Update README.md

#2
by debaa - opened

Models & services
LayerModel / ServiceRoleGame logicCodex / GPT-5.5 (OpenAI)Stat block + behavior code generationConcept artFLUX.2 Klein 9B (Black Forest Labs)Multi-angle reference image from prompt3D generationHunyuan3D-2.1 (Tencent, 32B)PBR mesh from image-conditioned inputComputeModal (serverless GPU)Autoscaling inference β€” no cold-start painFrontendThree.js + GradioBrowser game loop + real-time 3D viewerSandboxModal SandboxesSafe execution of Codex-generated game code
Pipeline diagram
User Prompt
β”‚
β”œβ”€β”€β–Ί [Codex / GPT-5.5]
β”‚ β”‚
β”‚ └──► Stat Block JSON ──────────────────────┐
β”‚ Behavior Code ──► Modal Sandbox ──── β”‚
β”‚ β”‚
β”œβ”€β”€β–Ί [FLUX.2 Klein 9B] ──► Concept Image (512Γ—512) β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ └──────────────────────► [Hunyuan3D-2.1] β”‚
β”‚ β”‚ β”‚
β”‚ PBR Mesh (GLB) β”‚
β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
Three.js Game Scene
(orbit / inspect / play)

Architecture deep-dive

  1. Prompt ingestion & enrichment
    The raw user prompt is passed through a lightweight preprocessing step that:

Injects RPG-specific context tokens (ITEM:, CHARACTER:, ENVIRONMENT:)
Detects asset type (weapon / character / prop / environment) via keyword classification
Expands vague descriptors using a small synonym/adjective bank (e.g. "magic sword" β†’ "enchanted longsword with runic inscriptions and faint blue aura")

No separate LLM call needed β€” this runs client-side in Python with a 200-line rule engine.

  1. Codex / GPT-5.5 β€” game logic generation
    Endpoint: POST https://api.openai.com/v1/responses (Codex agent mode)

Plugin: Hugging Face plugin for asset lookup; GitHub plugin for stat template retrieval
The prompt is structured as:
pythonsystem = """
You are a tabletop RPG game designer.
Given an item/character description, output ONLY valid JSON:
{
"name": str,
"type": "weapon" | "character" | "environment",
"stats": { "atk": int, "def": int, "spd": int, "mag": int },
"abilities": [{"name": str, "description": str, "cost": int}],
"lore": str (1 sentence),
"behavior_code": str (JavaScript, Three.js compatible)
}
"""
behavior_code is a self-contained JS function that defines how the asset animates or responds to player interaction in the Three.js scene. It is executed inside a Modal Sandbox (isolated container) before being injected into the browser β€” preventing arbitrary code execution on the client.

  1. FLUX.2 Klein 9B β€” concept art
    Model: black-forest-labs/FLUX.2-Klein-distilled-9B

Hosted on: Modal A10G GPU (cold start ~4s, inference ~8s)
python@modal.function(gpu="A10G", image=flux_image)
def generate_concept(prompt: str) -> bytes:
from diffusers import FluxPipeline
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.2-Klein-distilled-9B")
prompt_full = f"RPG game asset concept art, {prompt}, front view, clean white background, detailed, 4K"
image = pipe(prompt_full, num_inference_steps=20, guidance_scale=3.5).images[0]
return image_to_bytes(image)
The image is returned as a 512Γ—512 PNG and displayed in the Gradio UI immediately β€” so the user sees concept art while 3D generation runs in parallel.

  1. Hunyuan3D-2.1 β€” 3D mesh generation
    Model: tencent/Hunyuan3D-2.1 (32B, image-conditioned mode)

Hosted on: Modal A100 GPU (80GB) β€” image-conditioned path is faster than text-only

Output: GLB with PBR maps (albedo, roughness, metallic, normal)
python@modal.function(gpu="A100", image=hunyuan_image, timeout=120)
def generate_3d(concept_image_bytes: bytes, prompt: str) -> bytes:
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
from hy3dgen.texgen import Hunyuan3DPaintPipeline

shape_pipe = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained("tencent/Hunyuan3D-2.1")
tex_pipe   = Hunyuan3DPaintPipeline.from_pretrained("tencent/Hunyuan3D-2.1")

image = load_image(concept_image_bytes)
mesh  = shape_pipe(image=image, prompt=prompt, num_inference_steps=30)
mesh  = tex_pipe(mesh)          # applies PBR texture bake
return export_glb(mesh)         # returns GLB bytes

Using the concept image as conditioning (rather than raw text) consistently produces cleaner topology and better texture alignment β€” this is the key quality unlock vs. text-only 3D generation.

  1. Three.js browser scene
    The GLB is loaded via THREE.GLTFLoader into a minimal browser game loop:
    javascript// Injected into the Gradio HTML component
    const loader = new THREE.GLTFLoader();
    loader.load(assetUrl, (gltf) => {
    scene.add(gltf.scene);
    // Run sandboxed behavior code
    const behaviorFn = new Function('scene', 'asset', behaviorCode);
    behaviorFn(scene, gltf.scene);
    });
    The scene includes:

Orbit controls (rotate / zoom / pan)
PBR environment lighting (HDR studio preset)
Stat card overlay (HTML positioned over the canvas)
GLB download button
"Add to party" button β€” persists the asset to session state for multi-asset scenes

  1. Modal Sandboxes β€” safe code execution
    Codex-generated behavior_code is never executed directly in the browser. It runs through a Modal Sandbox first:
    python@modal.function()
    def validate_behavior_code(code: str) -> dict:
    sandbox = modal.Sandbox.create(
    "python:3.11-slim",
    timeout=10,
    network_access=False, # no outbound calls
    )

    Static analysis + dry-run

    result = sandbox.exec("python", "-c", f"import ast; ast.parse({repr(code)})")
    sandbox.terminate()
    return {"safe": result.returncode == 0, "code": code}
    Only validated code reaches the client. This keeps the sandbox prize track happy and prevents XSS via generated game logic.

Repository structure
promptforge-rpg/
β”œβ”€β”€ app.py # Gradio entrypoint
β”œβ”€β”€ pipeline/
β”‚ β”œβ”€β”€ prompt_enricher.py # Rule-based prompt preprocessing
β”‚ β”œβ”€β”€ codex_agent.py # GPT-5.5 stat block + code generation
β”‚ β”œβ”€β”€ flux_gen.py # FLUX.2 Klein concept art (Modal)
β”‚ β”œβ”€β”€ hunyuan_gen.py # Hunyuan3D-2.1 mesh generation (Modal)
β”‚ └── sandbox.py # Modal Sandbox behavior code validation
β”œβ”€β”€ frontend/
β”‚ β”œβ”€β”€ scene.js # Three.js game scene
β”‚ β”œβ”€β”€ stat_card.js # Stat block overlay component
β”‚ └── index.html # Injected into Gradio HTML block
β”œβ”€β”€ modal_stubs/
β”‚ β”œβ”€β”€ flux_stub.py # Modal function definitions (FLUX)
β”‚ └── hunyuan_stub.py # Modal function definitions (Hunyuan3D)
β”œβ”€β”€ tests/
β”‚ β”œβ”€β”€ test_pipeline.py
β”‚ └── test_sandbox.py
β”œβ”€β”€ requirements.txt
└── README.md # ← you are here

Quickstart

  1. Clone and install
    bashgit clone https://huggingface.co/spaces//promptforge-rpg
    cd promptforge-rpg
    pip install -r requirements.txt
  2. Configure secrets
    In your HF Space settings β†’ Repository secrets, add:
    SecretValueOPENAI_API_KEYYour OpenAI key (Codex / GPT-5.5)MODAL_TOKEN_IDModal token IDMODAL_TOKEN_SECRETModal token secret
  3. Deploy Modal functions
    bashmodal deploy modal_stubs/flux_stub.py
    modal deploy modal_stubs/hunyuan_stub.py
  4. Launch locally
    bashpython app.py

β†’ http://localhost:7860

  1. Push to HF Space
    bashgit add .
    git commit -m "initial deploy"
    git push

API reference
POST /generate β€” full pipeline
json{
"prompt": "a rusted mace with bone spikes dripping black ichor",
"asset_type": "weapon", // optional β€” auto-detected if omitted
"style": "dark fantasy", // optional β€” defaults to "fantasy"
"output_format": "glb" // "glb" | "obj" | "usdz"
}
Response:
json{
"name": "Bonecrusher's Blight",
"stats": { "atk": 18, "def": 4, "spd": 6, "mag": 2 },
"abilities": [
{ "name": "Ichor Burst", "description": "Poisons on hit for 3 turns", "cost": 2 }
],
"lore": "Forged in the marrow pits beneath the Ashfeld Fortress.",
"concept_art_url": "https://.../concept.png",
"model_url": "https://.../asset.glb",
"behavior_code": "function animate(scene, asset) { ... }"
}

Performance benchmarks
StepGPUTimePrompt enrichmentCPU0.1sCodex stat blockAPI2–4sFLUX.2 Klein concept artA10G8–12sHunyuan3D-2.1 meshA100 (80GB)35–55sThree.js scene loadBrowser1–2sEnd-to-endβ€”45–70s
FLUX and Hunyuan3D run in parallel after the stat block is returned, so the user sees the concept art at ~12s and the 3D model arrives ~40s later.

Prize eligibility
TrackPartnerQualifierBest Use of ModalModalInference + training + Sandboxes all usedCodex / OpenAI trackOpenAIGPT-5.5 Codex agent with HF + GitHub pluginsBest FLUX Build (if nominated)Black Forest LabsFLUX.2 Klein 9B for concept image generation

Known limitations & roadmap
Current limitations:

Characters with complex rigs (humanoids) produce lower-quality topology than props/weapons β€” image-conditioned Hunyuan3D works best on objects
Behavior code sandbox validation adds ~3s latency
Multi-asset party scenes (3+ meshes) can drop below 30fps in-browser on integrated GPU

Roadmap (post-hackathon):

Fine-tune FLUX.2 Klein on RPG concept art LoRA (ai-toolkit)
Add MiniCPM-V 4.6 for sketch-to-3D input path
Persist party to IndexedDB for multi-session campaigns
Export full scene as .zip (GLBs + stat JSONs + behavior scripts)
Multiplayer lobby via HF Spaces Persistent Storage

License
Apache 2.0 β€” models used are Apache 2.0 (FLUX.2 Klein, Hunyuan3D-2.1) or accessed via API (Codex/GPT-5.5, Modal).

debaa changed pull request status to merged

Sign up or log in to comment