---
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- sit
- image-generation
- class-conditional
- imagenet
license: mit
inference: true
widget:
- output:
    url: SiT-XL-2-512/demo.png
language:
- en
---

# SiT-diffusers

Diffusers-ready checkpoints for **Scalable Interpolant Transformers (SiT)**, converted for local/offline use.

This root folder is a model collection that contains:

- `SiT-S-2-256`
- `SiT-B-2-256`
- `SiT-L-2-256`
- `SiT-XL-2-256`
- `SiT-XL-2-512`

Each subfolder is a self-contained Diffusers model repo with:

- `pipeline.py`
- `transformer/transformer_sit.py`
- `scheduler/scheduler_config.json` (`FlowMatchEulerDiscreteScheduler`)
- `transformer/diffusion_pytorch_model.safetensors`
- `vae/diffusion_pytorch_model.safetensors`

Each variant embeds an English `id2label` mapping directly in `model_index.json` (DiT-style), so class labels can be passed either as integer ImageNet class ids or as English label strings (synonyms included).
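
As a rough illustration of how string labels might resolve to ids, the sketch below builds a reverse lookup from an `id2label` mapping. It assumes the DiT convention in which each entry is a comma-separated list of synonyms. The two-entry `id2label` excerpt and the helper names `build_label2id` and `to_label_ids` are hypothetical, and the pipeline's own `get_label_ids` may behave differently.

```python
from typing import Dict, List, Union

# Hypothetical excerpt of an id2label mapping (JSON keys are strings).
id2label: Dict[str, str] = {
    "0": "tench, Tinca tinca",
    "207": "golden retriever",
}


def build_label2id(id2label: Dict[str, str]) -> Dict[str, int]:
    """Map every comma-separated synonym to its integer class id."""
    label2id: Dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            label2id[name.strip()] = int(class_id)
    return label2id


def to_label_ids(
    labels: Union[int, str, List[Union[int, str]]],
    label2id: Dict[str, int],
) -> List[int]:
    """Accept ids or synonym strings (single value or list) and return ids."""
    if isinstance(labels, (int, str)):
        labels = [labels]
    ids: List[int] = []
    for label in labels:
        if isinstance(label, int):
            ids.append(label)  # already an ImageNet id
        elif label in label2id:
            ids.append(label2id[label])
        else:
            raise ValueError(f"Unknown label: {label!r}")
    return ids


label2id = build_label2id(id2label)
print(to_label_ids("golden retriever", label2id))  # [207]
print(to_label_ids(["tench", 207], label2id))      # [0, 207]
```

Raising on unknown strings, rather than silently skipping them, keeps a typo from producing an image of the wrong class.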

## Demo



Class-conditional sample (ImageNet class **207**, golden retriever), `SiT-XL/2` at 512×512, 250 steps, classifier-free guidance (CFG) scale 4.0, seed 0.

## Model Paths

Use paths relative to this root README:

| Model | Resolution | Local path |
| --- | ---: | --- |
| SiT-S/2 | 256×256 | `./SiT-S-2-256` |
| SiT-B/2 | 256×256 | `./SiT-B-2-256` |
| SiT-L/2 | 256×256 | `./SiT-L-2-256` |
| SiT-XL/2 | 256×256 | `./SiT-XL-2-256` |
| SiT-XL/2 | 512×512 | `./SiT-XL-2-512` |

## Inference Demo (Diffusers)

### 1) Load a local subfolder checkpoint

```python
import torch
from diffusers import DiffusionPipeline

model_path = "./SiT-XL-2-512"  # change to any path in the table above
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True allows the custom code shipped in the checkpoint
# folder (pipeline.py, transformer/transformer_sit.py) to be executed.
pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,
).to(device)

# Fixed seed for reproducible samples.
generator = torch.Generator(device=device).manual_seed(0)

# ImageNet class example: 207 = golden retriever
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))  # [207]

result = pipe(
    class_labels="golden retriever",
    height=512,
    width=512,
    num_inference_steps=250,  # official SiT comparisons commonly use 250 steps
    guidance_scale=4.0,
    generator=generator,
)
image = result.images[0]
image.save("sit_xl_512_demo.png")
```
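
To make `guidance_scale` and `num_inference_steps` more concrete, here is a toy scalar sketch of classifier-free guidance combined with Euler integration of a flow-matching ODE. It assumes the update rule used by diffusers' `FlowMatchEulerDiscreteScheduler` (`sample + (sigma_next - sigma) * model_output`, with sigma decreasing from 1 to 0), a plain linear sigma schedule, and the standard CFG formula. `toy_velocity`, `cfg_combine`, and `euler_sample` are hypothetical names. The real pipeline operates on latent tensors with a learned transformer and may use a different schedule.

```python
from typing import List


def cfg_combine(v_uncond: float, v_cond: float, guidance_scale: float) -> float:
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the conditional one."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def toy_velocity(x: float, sigma: float, target: float) -> float:
    """Toy stand-in for the transformer (assumption, not the real model).

    For the linear path x_sigma = (1 - sigma) * target + sigma * noise,
    the exact velocity noise - target equals (x - target) / sigma.
    """
    return (x - target) / sigma


def euler_sample(
    noise: float,
    cond_target: float,
    uncond_target: float,
    guidance_scale: float,
    num_inference_steps: int,
) -> float:
    # Linear schedule from sigma = 1 (pure noise) down to sigma = 0 (sample).
    sigmas: List[float] = [
        1.0 - i / num_inference_steps for i in range(num_inference_steps + 1)
    ]
    x = noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v_cond = toy_velocity(x, sigma, cond_target)
        v_uncond = toy_velocity(x, sigma, uncond_target)
        v = cfg_combine(v_uncond, v_cond, guidance_scale)
        x = x + (sigma_next - sigma) * v  # one Euler step
    return x


# A scale of 1.0 recovers the conditional target; larger scales extrapolate past it.
for scale in (1.0, 4.0):
    sample = euler_sample(0.3, cond_target=1.0, uncond_target=0.0,
                          guidance_scale=scale, num_inference_steps=250)
    print(f"guidance_scale={scale}: {sample:.4f}")
```

In this toy setting the guided sample lands at `uncond_target + guidance_scale * (cond_target - uncond_target)`, which shows why high guidance values tend to exaggerate class-specific features.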

### 2) Quick variant switch (256 models)

```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pick one of the 256x256 variants.
model_path = "./SiT-S-2-256"
# model_path = "./SiT-B-2-256"
# model_path = "./SiT-L-2-256"
# model_path = "./SiT-XL-2-256"

pipe = DiffusionPipeline.from_pretrained(model_path, trust_remote_code=True).to(device)
generator = torch.Generator(device=device).manual_seed(0)

image = pipe(
    class_labels=207,  # integer ImageNet id (golden retriever)
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=4.0,
    generator=generator,
).images[0]
image.save("sit_256_demo.png")
```