Instructions for using GD-ML/Code2World with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use GD-ML/Code2World with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="GD-ML/Code2World")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("GD-ML/Code2World")
model = AutoModelForImageTextToText.from_pretrained("GD-ML/Code2World")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use GD-ML/Code2World with vLLM:
Install from pip and serve the model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "GD-ML/Code2World"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "GD-ML/Code2World",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
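Because vLLM serves an OpenAI-compatible API, the server can also be called from Python. A minimal sketch using the official `openai` client; the `api_key="EMPTY"` placeholder is an assumption for an unauthenticated local server:

```python
# Query the local vLLM server through the OpenAI-compatible API.
# Assumes the server started above is running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="GD-ML/Code2World",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```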
- SGLang
How to use GD-ML/Code2World with SGLang:
Install from pip and serve the model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "GD-ML/Code2World" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "GD-ML/Code2World",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
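SGLang's endpoint is OpenAI-compatible as well, so the same `openai` client works against it. The sketch below streams the reply token by token, again assuming an unauthenticated local server, here on port 30000:

```python
# Stream tokens from the local SGLang server (OpenAI-compatible API).
# Assumes the server started above is running on localhost:30000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="GD-ML/Code2World",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```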
Use Docker images

```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "GD-ML/Code2World" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "GD-ML/Code2World",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use GD-ML/Code2World with Docker Model Runner:
```sh
docker model run hf.co/GD-ML/Code2World
```
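The Python module below defines the prompting scheme for Code2World: a system prompt that frames the model as a UI state transition simulator, a user prompt template, and helpers that turn a structured UI action record into the textual description the template expects.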
````python
import json

SYSTEM_PROMPT = """You are an expert **UI State Transition Simulator** and **Frontend Developer**.
Your task is to predict the **NEXT UI STATE** based on a screenshot of the current state and a user interaction.
### 1. IMAGE INTERPRETATION RULES
The input image contains visual cues denoting the user's action. You must interpret them as follows:
* **Red Circle**: Indicates a **Click** or **Long Press** target at that location.
* **Red Arrow**: Indicates a **Scroll** or **Swipe**.
    * The arrow points in the direction of finger movement.
    * *Example*: An arrow pointing UP means the finger slides up, pushing content up (Scrolling Down).
* **Note**: These cues exist ONLY to show the action. **DO NOT render these red circles or arrows in your output HTML.**
### 2. CRITICAL STRUCTURAL RULES (MUST FOLLOW)
* **Format**: Output ONLY raw HTML. Start with `<!DOCTYPE html>` and end with `</html>`.
* **Root Element**: All visible content MUST be wrapped in:
    `<div id="render-target"> ... </div>`
* **Container Style**: `#render-target` must have:
    `width: 1080px; height: 2400px; position: relative; overflow: hidden;`
    (Apply background colors and shadows here, NOT on the body).
* **Body Style**: The `<body>` tag must have `margin: 0; padding: 0; background: transparent;`.
* **Layout**: Do NOT center the body. Let `#render-target` sit at (0,0).
### 3. CONTENT GENERATION LOGIC
* **Transition**: Analyze the action. If the user clicks a button, show the *result* (e.g., a menu opens, a checkbox checks, page navigates).
* **Images**: Use semantic text placeholders. DO NOT use real URLs.
    * Format: `<div style="...">[IMG: description]</div>`
* **Icons**: Use simple inline SVG paths or Unicode.
### 4. OUTPUT REQUIREMENT
* Do NOT generate Markdown blocks (```html).
* Do NOT provide explanations or conversational text.
* Output the code directly.
"""

USER_PROMPT_TEMPLATE = """<image>
### INPUT CONTEXT
1. **User Intent**: "{instruction_str}"
2. **Interaction Details**:
    * **Description**: {semantic_desc}
    * **Action Data**: {action_json}
### COMMAND
Based on the visual cues in the image and the interaction data above, generate the **HTML for the RESULTING UI STATE** (what the screen looks like *after* this action).
"""


def get_action_semantic_description(action):
    """Map a structured action record to a natural-language description for the prompt."""
    action_type = action.get("action_type")
    if action_type == "click":
        x, y = action.get("x"), action.get("y")
        return (
            f"User performed a CLICK at coordinates ({x}, {y}). "
            f"Expect the button/element at this location to trigger."
        )
    if action_type == "long_press":
        x, y = action.get("x"), action.get("y")
        return (
            f"User performed a LONG PRESS at coordinates ({x}, {y}). "
            f"Expect a context menu or selection state."
        )
    if action_type in ["scroll", "swipe"]:
        direction = action.get("direction", "down")
        return (
            f"User SCROLLED {direction.upper()}. "
            f"The content should move, revealing new items from the {direction} direction."
        )
    if action_type == "input_text":
        text = action.get("text", "")
        return (
            f"User is TYPING the text: '{text}'. "
            f"The focused input field MUST now contain this text."
        )
    if action_type == "open_app":
        app_name = action.get("app_name", "app")
        return (
            f"System Context Switch: The user opened the app '{app_name}'. "
            f"Show the home screen of this app."
        )
    if action_type == "navigate_back":
        return "System Navigation: The user pressed BACK. Return to the previous screen."
    if action_type == "navigate_home":
        return "System Navigation: The user pressed HOME. Show the Desktop."
    if action_type == "wait":
        return "Action: WAIT. Keep the UI mostly unchanged unless loading completes."
    return f"Perform action: {action_type}."


def build_user_prompt(instruction_str, action, semantic_desc=None):
    """Fill the user prompt template with the instruction, description, and raw action JSON."""
    if semantic_desc is None:
        semantic_desc = get_action_semantic_description(action)
    action_json = json.dumps(action, ensure_ascii=False)
    return USER_PROMPT_TEMPLATE.format(
        instruction_str=instruction_str,
        semantic_desc=semantic_desc,
        action_json=action_json,
    )
````
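As a quick usage sketch (the instruction string and action dictionaries here are made-up examples), `build_user_prompt` assembles the complete user turn, with the `<image>` placeholder marking where the annotated screenshot is injected; `SYSTEM_PROMPT` would go in the system turn of the chat request.

```python
# Illustrative example: the instruction and action payloads are hypothetical.
click = {"action_type": "click", "x": 540, "y": 1210}
scroll = {"action_type": "scroll", "direction": "up"}

# Helper output for a scroll action:
print(get_action_semantic_description(scroll))
# User SCROLLED UP. The content should move, revealing new items from the up direction.

# Full user prompt for a click action:
print(build_user_prompt("Open the settings menu", click))
# <image>
# ### INPUT CONTEXT
# 1. **User Intent**: "Open the settings menu"
# 2. **Interaction Details**:
#     * **Description**: User performed a CLICK at coordinates (540, 1210). ...
#     * **Action Data**: {"action_type": "click", "x": 540, "y": 1210}
# ...
```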