Instructions to use GSAI-ML/LLaDA-1.5 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers
How to use GSAI-ML/LLaDA-1.5 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="GSAI-ML/LLaDA-1.5", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("GSAI-ML/LLaDA-1.5", trust_remote_code=True, dtype="auto")
```

- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - vLLM
How to use GSAI-ML/LLaDA-1.5 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "GSAI-ML/LLaDA-1.5"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "GSAI-ML/LLaDA-1.5",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/GSAI-ML/LLaDA-1.5
```
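The curl call above can also be made from Python. A minimal sketch using only the standard library, assuming the vLLM server from the previous step is running on localhost:8000 (the helper names are illustrative, not part of any library):

```python
import json
import urllib.request


def build_chat_request(model: str, user_content: str) -> dict:
    # Same JSON body as the curl example above.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }


def chat(url: str, payload: dict) -> dict:
    # POST to an OpenAI-compatible /v1/chat/completions endpoint.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# With the server running:
# reply = chat("http://localhost:8000/v1/chat/completions",
#              build_chat_request("GSAI-ML/LLaDA-1.5", "What is the capital of France?"))
# print(reply["choices"][0]["message"]["content"])
```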
  - SGLang
How to use GSAI-ML/LLaDA-1.5 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "GSAI-ML/LLaDA-1.5" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "GSAI-ML/LLaDA-1.5",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "GSAI-ML/LLaDA-1.5" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "GSAI-ML/LLaDA-1.5",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

  - Docker Model Runner
How to use GSAI-ML/LLaDA-1.5 with Docker Model Runner:
```shell
docker model run hf.co/GSAI-ML/LLaDA-1.5
```
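The vLLM and SGLang servers above both speak the OpenAI chat-completions format, so their responses can be handled the same way. A small sketch of pulling the assistant's reply out of the response JSON (the sample dict below is illustrative of the response shape, not captured from a real server):

```python
def extract_reply(response: dict) -> str:
    # OpenAI-compatible servers put the text at choices[0].message.content.
    return response["choices"][0]["message"]["content"]


# Illustrative response shape (not real server output):
sample = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant",
                     "content": "The capital of France is Paris."}}
    ]
}
print(extract_reply(sample))  # The capital of France is Paris.
```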
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
We introduce LLaDA 1.5, a competitive large diffusion language model trained with Variance-Reduced Preference Optimization (VRPO), as presented in the paper LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models.
Compared with LLaDA-8B-Instruct, LLaDA 1.5 achieves better performance on a wide range of tasks, including Math, Code, and Alignment tasks.
Inference
The LLaDA 1.5 model is available on Hugging Face. Use the transformers library to load it:
```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('GSAI-ML/LLaDA-1.5', trust_remote_code=True)
model = AutoModel.from_pretrained('GSAI-ML/LLaDA-1.5', trust_remote_code=True, torch_dtype=torch.bfloat16)
```
The model is based on LLaDA-8B-Instruct, so you can use the LLaDA-8B-Instruct inference code for this model as well.
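For intuition, LLaDA does not decode left-to-right: generation starts from a fully masked response and iteratively unmasks it, keeping the model's most confident predictions at each step. The dependency-free toy sketch below mimics only that unmasking schedule, with random scores standing in for model confidences; the actual sampler ships with the model's remote code and the LLaDA-8B-Instruct repository.

```python
import random

MASK = None  # stands in for LLaDA's mask token


def toy_unmask(length: int = 16, steps: int = 4, seed: int = 0) -> list:
    """Reveal `length` positions over `steps` rounds, most 'confident' first."""
    rng = random.Random(seed)
    seq = [MASK] * length
    per_step = (length + steps - 1) // steps  # positions revealed per round
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t is MASK]
        # Pretend the model scored every masked position; keep the top ones.
        scores = {i: rng.random() for i in masked}
        for i in sorted(masked, key=scores.get, reverse=True)[:per_step]:
            seq[i] = f"tok{i}"  # placeholder for a sampled token
    return seq


print(toy_unmask(length=8, steps=4))
```

After the final round every position has been revealed; in the real model the "scores" are per-token confidences from the diffusion network rather than random numbers.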
Citation
Please consider citing:
```bibtex
@article{zhu2025llada,
  title={LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models},
  author={Zhu, Fengqi and Wang, Rongzhen and Nie, Shen and Zhang, Xiaolu and Wu, Chunwei and Hu, Jun and Zhou, Jun and Chen, Jianfei and Lin, Yankai and Wen, Ji-Rong and others},
  journal={arXiv preprint arXiv:2505.19223},
  year={2025}
}
```