Diffusers-ready checkpoints for Scalable Interpolant Transformers (SiT), converted for local/offline use.
This root folder is a model collection that contains:

- SiT-S-2-256
- SiT-B-2-256
- SiT-L-2-256
- SiT-XL-2-256
- SiT-XL-2-512

Each subfolder is a self-contained Diffusers model repo with:

- pipeline.py
- transformer/transformer_sit.py
- scheduler/scheduler_config.json (FlowMatchEulerDiscreteScheduler)
- transformer/diffusion_pytorch_model.safetensors
- vae/diffusion_pytorch_model.safetensors

Each variant embeds English id2label directly in model_index.json (DiT-style), so class labels can be passed as ImageNet ids or English synonym strings.
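The label lookup described above can be sketched without loading any weights. The example below is a minimal illustration, not the pipeline's actual implementation: it assumes model_index.json stores the mapping under a top-level "id2label" key and that each value is a comma-separated list of English synonyms (the DiT convention). The helper names `load_id2label` and `resolve_class_label` are hypothetical.

```python
import json
from pathlib import Path


def load_id2label(model_dir):
    """Read the id2label mapping from <model_dir>/model_index.json.

    Assumption: the mapping lives under a top-level "id2label" key.
    """
    config_path = Path(model_dir) / "model_index.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # JSON object keys are always strings, so convert them back to int ids.
    return {int(key): value for key, value in config["id2label"].items()}


def resolve_class_label(id2label, label):
    """Return the integer class id for an ImageNet id or an English synonym."""
    if isinstance(label, int):
        if label not in id2label:
            raise KeyError(f"unknown class id: {label}")
        return label
    wanted = label.strip().lower()
    for class_id, names in id2label.items():
        # Assumption: synonyms are separated by commas, e.g. "tench, Tinca tinca".
        synonyms = [name.strip().lower() for name in names.split(",")]
        if wanted in synonyms:
            return class_id
    raise KeyError(f"unknown class label: {label!r}")
```

With the real checkpoints, the pipeline's own `id2label` and `get_label_ids` shown in the usage example are the methods to rely on; this sketch only illustrates why both `207` and `"golden retriever"` can select the same class.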
Class-conditional sample (ImageNet class 207, golden retriever), SiT-XL/2 at 512×512, 250 steps, CFG 4.0, seed 0.
Use paths relative to this root README:
| Model | Resolution | Local path |
|---|---|---|
| SiT-S/2 | 256x256 | ./SiT-S-2-256 |
| SiT-B/2 | 256x256 | ./SiT-B-2-256 |
| SiT-L/2 | 256x256 | ./SiT-L-2-256 |
| SiT-XL/2 | 256x256 | ./SiT-XL-2-256 |
| SiT-XL/2 | 512x512 | ./SiT-XL-2-512 |
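Before loading a checkpoint offline, it can help to confirm that the chosen subfolder is complete. The following is a small sketch under stated assumptions: the file list is taken from the layout described in this README, and the names `VARIANTS`, `EXPECTED_FILES`, `variant_path`, and `missing_files` are hypothetical helpers, not part of Diffusers or of this repo.

```python
from pathlib import Path

# (model name, resolution) -> subfolder, mirroring the table above.
VARIANTS = {
    ("SiT-S/2", 256): "SiT-S-2-256",
    ("SiT-B/2", 256): "SiT-B-2-256",
    ("SiT-L/2", 256): "SiT-L-2-256",
    ("SiT-XL/2", 256): "SiT-XL-2-256",
    ("SiT-XL/2", 512): "SiT-XL-2-512",
}

# Files each subfolder is described as containing in this README.
EXPECTED_FILES = [
    "model_index.json",
    "pipeline.py",
    "transformer/transformer_sit.py",
    "scheduler/scheduler_config.json",
    "transformer/diffusion_pytorch_model.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
]


def variant_path(root, model, resolution):
    """Return the local folder for a model/resolution pair from the table."""
    return Path(root) / VARIANTS[(model, resolution)]


def missing_files(model_dir):
    """List the expected files that are absent from model_dir."""
    return [rel for rel in EXPECTED_FILES if not (Path(model_dir) / rel).is_file()]
```

For example, `missing_files(variant_path(".", "SiT-XL/2", 512))` returns an empty list when the 512x512 checkpoint folder is fully downloaded.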
import torch
from diffusers import DiffusionPipeline
model_path = "./SiT-XL-2-512" # change to any path in the table above
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,
).to(device)
generator = torch.Generator(device=device).manual_seed(0)
# ImageNet class example: 207 = golden retriever
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever")) # [207]
result = pipe(
    class_labels="golden retriever",
    height=512,
    width=512,
    num_inference_steps=250,  # official SiT comparisons commonly use 250 steps
    guidance_scale=4.0,
    generator=generator,
)
image = result.images[0]
image.save("sit_xl_512_demo.png")
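# 256x256 variants: reuses `torch`, `DiffusionPipeline`, and `device` from the snippet above
model_path = "./SiT-S-2-256"
# model_path = "./SiT-B-2-256"
# model_path = "./SiT-L-2-256"
# model_path = "./SiT-XL-2-256"
pipe = DiffusionPipeline.from_pretrained(model_path, trust_remote_code=True).to(device)
# Re-seed so the result does not depend on how far the generator was advanced earlier
generator = torch.Generator(device=device).manual_seed(0)
image = pipe(
    class_labels=207,  # ImageNet id for golden retriever
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=4.0,
    generator=generator,
).images[0]
image.save("sit_256_demo.png")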
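The two snippets above differ only in the checkpoint path and the matching `height`/`width`. One way to avoid mismatching them is to derive the size from the folder name. This is a sketch that assumes the folder names keep the `...-<resolution>` suffix used in the table; `sampling_kwargs` is a hypothetical helper, and its defaults simply repeat the step count and guidance scale used in the examples.

```python
import re


def sampling_kwargs(model_path, num_inference_steps=250, guidance_scale=4.0):
    """Build pipeline call arguments whose size matches the checkpoint folder name."""
    # Folder names in the table end in the resolution, e.g. "SiT-XL-2-512".
    match = re.search(r"-(\d+)/?$", model_path)
    if match is None:
        raise ValueError(f"cannot infer resolution from {model_path!r}")
    size = int(match.group(1))
    return {
        "height": size,
        "width": size,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }


# Usage with a loaded pipeline (not executed here):
#   image = pipe(class_labels=207, generator=generator,
#                **sampling_kwargs(model_path)).images[0]
```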