Spaces:

debaa
/

Promptforge-Game

Configuration error

App Files Files Community

Update README.md

by debaa - opened 2 days ago

base: refs/heads/main

←

from: refs/pr/2

Discussion Files changed

-1

debaa

Owner 2 days ago

Models & services
LayerModel / ServiceRoleGame logicCodex / GPT-5.5 (OpenAI)Stat block + behavior code generationConcept artFLUX.2 Klein 9B (Black Forest Labs)Multi-angle reference image from prompt3D generationHunyuan3D-2.1 (Tencent, 32B)PBR mesh from image-conditioned inputComputeModal (serverless GPU)Autoscaling inference — no cold-start painFrontendThree.js + GradioBrowser game loop + real-time 3D viewerSandboxModal SandboxesSafe execution of Codex-generated game code
Pipeline diagram
User Prompt
│
├──► [Codex / GPT-5.5]
│ │
│ └──► Stat Block JSON ──────────────────────┐
│ Behavior Code ──► Modal Sandbox ──── │
│ │
├──► [FLUX.2 Klein 9B] ──► Concept Image (512×512) │
│ │ │ │
│ └──────────────────────► [Hunyuan3D-2.1] │
│ │ │
│ PBR Mesh (GLB) │
│ │ │
└────────────────────────────────────────┴───────────────┘
│
Three.js Game Scene
(orbit / inspect / play)

Architecture deep-dive

Prompt ingestion & enrichment
The raw user prompt is passed through a lightweight preprocessing step that:

Injects RPG-specific context tokens (ITEM:, CHARACTER:, ENVIRONMENT:)
Detects asset type (weapon / character / prop / environment) via keyword classification
Expands vague descriptors using a small synonym/adjective bank (e.g. "magic sword" → "enchanted longsword with runic inscriptions and faint blue aura")

No separate LLM call needed — this runs client-side in Python with a 200-line rule engine.

Codex / GPT-5.5 — game logic generation
Endpoint: POST https://api.openai.com/v1/responses (Codex agent mode)

Plugin: Hugging Face plugin for asset lookup; GitHub plugin for stat template retrieval
The prompt is structured as:
pythonsystem = """
You are a tabletop RPG game designer.
Given an item/character description, output ONLY valid JSON:
{
"name": str,
"type": "weapon" | "character" | "environment",
"stats": { "atk": int, "def": int, "spd": int, "mag": int },
"abilities": [{"name": str, "description": str, "cost": int}],
"lore": str (1 sentence),
"behavior_code": str (JavaScript, Three.js compatible)
}
"""
behavior_code is a self-contained JS function that defines how the asset animates or responds to player interaction in the Three.js scene. It is executed inside a Modal Sandbox (isolated container) before being injected into the browser — preventing arbitrary code execution on the client.

FLUX.2 Klein 9B — concept art
Model: black-forest-labs/FLUX.2-Klein-distilled-9B

Hosted on: Modal A10G GPU (cold start ~4s, inference ~8s)
python@modal.function(gpu="A10G", image=flux_image)
def generate_concept(prompt: str) -> bytes:
from diffusers import FluxPipeline
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.2-Klein-distilled-9B")
prompt_full = f"RPG game asset concept art, {prompt}, front view, clean white background, detailed, 4K"
image = pipe(prompt_full, num_inference_steps=20, guidance_scale=3.5).images[0]
return image_to_bytes(image)
The image is returned as a 512×512 PNG and displayed in the Gradio UI immediately — so the user sees concept art while 3D generation runs in parallel.

Hunyuan3D-2.1 — 3D mesh generation
Model: tencent/Hunyuan3D-2.1 (32B, image-conditioned mode)

Hosted on: Modal A100 GPU (80GB) — image-conditioned path is faster than text-only

Output: GLB with PBR maps (albedo, roughness, metallic, normal)
python@modal.function(gpu="A100", image=hunyuan_image, timeout=120)
def generate_3d(concept_image_bytes: bytes, prompt: str) -> bytes:
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
from hy3dgen.texgen import Hunyuan3DPaintPipeline

shape_pipe = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained("tencent/Hunyuan3D-2.1")
tex_pipe   = Hunyuan3DPaintPipeline.from_pretrained("tencent/Hunyuan3D-2.1")

image = load_image(concept_image_bytes)
mesh  = shape_pipe(image=image, prompt=prompt, num_inference_steps=30)
mesh  = tex_pipe(mesh)          # applies PBR texture bake
return export_glb(mesh)         # returns GLB bytes

Using the concept image as conditioning (rather than raw text) consistently produces cleaner topology and better texture alignment — this is the key quality unlock vs. text-only 3D generation.

Three.js browser scene
The GLB is loaded via THREE.GLTFLoader into a minimal browser game loop:
javascript// Injected into the Gradio HTML component
const loader = new THREE.GLTFLoader();
loader.load(assetUrl, (gltf) => {
scene.add(gltf.scene);
// Run sandboxed behavior code
const behaviorFn = new Function('scene', 'asset', behaviorCode);
behaviorFn(scene, gltf.scene);
});
The scene includes:

Orbit controls (rotate / zoom / pan)
PBR environment lighting (HDR studio preset)
Stat card overlay (HTML positioned over the canvas)
GLB download button
"Add to party" button — persists the asset to session state for multi-asset scenes

Modal Sandboxes — safe code execution
Codex-generated behavior_code is never executed directly in the browser. It runs through a Modal Sandbox first:
python@modal.function()
def validate_behavior_code(code: str) -> dict:
sandbox = modal.Sandbox.create(
"python:3.11-slim",
timeout=10,
network_access=False, # no outbound calls
)
Static analysis + dry-run
result = sandbox.exec("python", "-c", f"import ast; ast.parse({repr(code)})")
sandbox.terminate()
return {"safe": result.returncode == 0, "code": code}
Only validated code reaches the client. This keeps the sandbox prize track happy and prevents XSS via generated game logic.

Repository structure
promptforge-rpg/
├── app.py # Gradio entrypoint
├── pipeline/
│ ├── prompt_enricher.py # Rule-based prompt preprocessing
│ ├── codex_agent.py # GPT-5.5 stat block + code generation
│ ├── flux_gen.py # FLUX.2 Klein concept art (Modal)
│ ├── hunyuan_gen.py # Hunyuan3D-2.1 mesh generation (Modal)
│ └── sandbox.py # Modal Sandbox behavior code validation
├── frontend/
│ ├── scene.js # Three.js game scene
│ ├── stat_card.js # Stat block overlay component
│ └── index.html # Injected into Gradio HTML block
├── modal_stubs/
│ ├── flux_stub.py # Modal function definitions (FLUX)
│ └── hunyuan_stub.py # Modal function definitions (Hunyuan3D)
├── tests/
│ ├── test_pipeline.py
│ └── test_sandbox.py
├── requirements.txt
└── README.md # ← you are here

Quickstart

Clone and install
bashgit clone https://huggingface.co/spaces//promptforge-rpg
cd promptforge-rpg
pip install -r requirements.txt
Configure secrets
In your HF Space settings → Repository secrets, add:
SecretValueOPENAI_API_KEYYour OpenAI key (Codex / GPT-5.5)MODAL_TOKEN_IDModal token IDMODAL_TOKEN_SECRETModal token secret
Deploy Modal functions
bashmodal deploy modal_stubs/flux_stub.py
modal deploy modal_stubs/hunyuan_stub.py
Launch locally
bashpython app.py

→ http://localhost:7860

Push to HF Space
bashgit add .
git commit -m "initial deploy"
git push

API reference
POST /generate — full pipeline
json{
"prompt": "a rusted mace with bone spikes dripping black ichor",
"asset_type": "weapon", // optional — auto-detected if omitted
"style": "dark fantasy", // optional — defaults to "fantasy"
"output_format": "glb" // "glb" | "obj" | "usdz"
}
Response:
json{
"name": "Bonecrusher's Blight",
"stats": { "atk": 18, "def": 4, "spd": 6, "mag": 2 },
"abilities": [
{ "name": "Ichor Burst", "description": "Poisons on hit for 3 turns", "cost": 2 }
],
"lore": "Forged in the marrow pits beneath the Ashfeld Fortress.",
"concept_art_url": "https://.../concept.png",
"model_url": "https://.../asset.glb",
"behavior_code": "function animate(scene, asset) { ... }"
}

Performance benchmarks
StepGPUTimePrompt enrichmentCPU~~0.1sCodex stat blockAPI~~2–4sFLUX.2 Klein concept artA10G~~8–12sHunyuan3D-2.1 meshA100 (80GB)~~35–55sThree.js scene loadBrowser~~1–2sEnd-to-end—~~45–70s
FLUX and Hunyuan3D run in parallel after the stat block is returned, so the user sees the concept art at ~12s and the 3D model arrives ~40s later.

Prize eligibility
TrackPartnerQualifierBest Use of ModalModalInference + training + Sandboxes all usedCodex / OpenAI trackOpenAIGPT-5.5 Codex agent with HF + GitHub pluginsBest FLUX Build (if nominated)Black Forest LabsFLUX.2 Klein 9B for concept image generation

Known limitations & roadmap
Current limitations:

Characters with complex rigs (humanoids) produce lower-quality topology than props/weapons — image-conditioned Hunyuan3D works best on objects
Behavior code sandbox validation adds ~3s latency
Multi-asset party scenes (3+ meshes) can drop below 30fps in-browser on integrated GPU

Roadmap (post-hackathon):

Fine-tune FLUX.2 Klein on RPG concept art LoRA (ai-toolkit)
Add MiniCPM-V 4.6 for sketch-to-3D input path
Persist party to IndexedDB for multi-session campaigns
Export full scene as .zip (GLBs + stat JSONs + behavior scripts)
Multiplayer lobby via HF Spaces Persistent Storage

License
Apache 2.0 — models used are Apache 2.0 (FLUX.2 Klein, Hunyuan3D-2.1) or accessed via API (Codex/GPT-5.5, Modal).

Update README.md94f0f271

debaa changed pull request status to merged 2 days ago

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Update README.md

Static analysis + dry-run

→ http://localhost:7860