---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- image-generation
- class-conditional
- imagenet
- pixnerd
widget:
- output:
    url: PixNerd-XL-16-512/demo.png
language:
- en
---
# BiliSakura/PixNerd-diffusers

Self-contained PixNerd-XL/16 checkpoints for Hugging Face diffusers. **No external code repo is required**: each subfolder ships its own `pipeline.py`, component modules, and weights.

This repo is derived from the development bundle in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection), but inference only needs:

- This model repo (`BiliSakura/PixNerd-diffusers`)
- PyPI `diffusers`, `torch`, `huggingface_hub`

This Hugging Face repo hosts **multiple self-contained checkpoints as subfolders**. Each subfolder includes its own `pipeline.py`, `model_index.json`, weights, and component code (`transformer/`, `scheduler/`).
## Available checkpoints

| Subfolder | Resolution | Source checkpoint |
| --- | --- | --- |
| [`PixNerd-XL-16-256/`](PixNerd-XL-16-256/) | 256×256 | `epoch%3D319-step%3D1600000_emainit.ckpt` |
| [`PixNerd-XL-16-512/`](PixNerd-XL-16-512/) | 512×512 | `res512_ft200k_epoch%3D325-step%3D1800000_emainit.ckpt` |

Both checkpoints are ImageNet class-conditional PixNerd-XL/16 exports with flow-matching sampling.
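
The source checkpoint names in the table are URL-encoded (`%3D` is `=`); the underlying `epoch=…-step=…` pattern resembles PyTorch Lightning's default checkpoint naming. As a minimal sketch, the standard library can decode them and extract the epoch and step counts. `parse_checkpoint_name` is a hypothetical helper for reading the names, not part of this repo.

```python
import re
from urllib.parse import unquote


def parse_checkpoint_name(encoded: str) -> dict:
    """Decode a URL-encoded checkpoint filename and extract epoch/step (hypothetical helper)."""
    name = unquote(encoded)  # "%3D" -> "="
    match = re.search(r"epoch=(\d+)-step=(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return {"name": name, "epoch": int(match.group(1)), "step": int(match.group(2))}


for encoded in (
    "epoch%3D319-step%3D1600000_emainit.ckpt",
    "res512_ft200k_epoch%3D325-step%3D1800000_emainit.ckpt",
):
    info = parse_checkpoint_name(encoded)
    print(info["name"], info["epoch"], info["step"])
# epoch=319-step=1600000_emainit.ckpt 319 1600000
# res512_ft200k_epoch=325-step=1800000_emainit.ckpt 325 1800000
```

Read this way, the 512 checkpoint is 200,000 steps past the 256 one (1,800,000 vs 1,600,000), which is consistent with its `res512_ft200k` prefix.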
## Demo

![PixNerd-XL-16-512 demo](PixNerd-XL-16-512/demo.png)

Class 207 (golden retriever), 512×512, 25 steps.
## ImageNet class labels

Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).

- `pipe.id2label`: the id → English label mapping
- `pipe.labels`: reverse maps (English synonym → id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`: returns the matching ids, e.g. `[207]`
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved to ids automatically
- `pipe(prompt="golden retriever", ...)`: deprecated alias for `class_labels`

Chinese labels are preserved in the main source repo under `src/labels/id2label_cn.json` for reference.
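
How strings are actually resolved is defined in each variant's `pipeline.py`. As a rough mental model of what `pipe.labels` and `pipe.get_label_ids` do, the sketch below builds a synonym → id reverse map from an `id2label` dict and turns a mix of ints and strings into ids. It assumes, hypothetically, that a label may hold comma-separated synonyms as in common ImageNet class-name lists; `build_label_index` and `resolve_class_labels` are illustrative names, not the pipeline's API.

```python
from typing import Dict, List, Sequence, Union

Label = Union[int, str]


def build_label_index(id2label: Dict[int, str]) -> Dict[str, int]:
    """Map each lower-cased synonym to its class id (hypothetical helper)."""
    index: Dict[str, int] = {}
    for class_id, label in id2label.items():
        for synonym in label.split(","):
            key = synonym.strip().lower()
            if key:
                index.setdefault(key, int(class_id))  # first id wins on duplicates
    return dict(sorted(index.items()))  # sorted for browsing


def resolve_class_labels(labels: Union[Label, Sequence[Label]], index: Dict[str, int]) -> List[int]:
    """Turn ints and/or English synonyms into a list of class ids (hypothetical helper)."""
    if isinstance(labels, (int, str)):
        labels = [labels]
    ids: List[int] = []
    for label in labels:
        if isinstance(label, int):
            ids.append(label)
        else:
            key = label.strip().lower()
            if key not in index:
                raise KeyError(f"unknown ImageNet label: {label!r}")
            ids.append(index[key])
    return ids


# Tiny excerpt for illustration; the real map lives in each variant's model_index.json.
id2label = {0: "tench, Tinca tinca", 207: "golden retriever"}
index = build_label_index(id2label)
print(resolve_class_labels("golden retriever", index))  # [207]
print(resolve_class_labels(["Tinca tinca", 207], index))  # [0, 207]
```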
## Load from Hugging Face

Each checkpoint lives in a subfolder, so download only the variant you need and load it from the local snapshot:

```python
import os

import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

variant = "PixNerd-XL-16-256"  # or "PixNerd-XL-16-512"
resolution = 256 if variant.endswith("256") else 512

# Hub repo ids cannot include a subfolder path, so fetch just this variant's files.
local_dir = snapshot_download("BiliSakura/PixNerd-diffusers", allow_patterns=[f"{variant}/*"])

pipe = DiffusionPipeline.from_pretrained(
    os.path.join(local_dir, variant),
    trust_remote_code=True,  # uses the variant's own pipeline.py
    torch_dtype=torch.bfloat16,
).to("cuda")

# Scheduler defaults: timeshift=3.0, order=2 (see scheduler/scheduler_config.json)
images = pipe(
    class_labels="golden retriever",
    height=resolution,
    width=resolution,
    num_inference_steps=25,
    guidance_scale=4.0,
).images

print(pipe.id2label[207])  # "golden retriever"
pipe.get_label_ids("golden retriever")  # [207]

# Minimal call relying on the pipeline's default steps and guidance
images = pipe(class_labels="golden retriever", height=resolution, width=resolution).images
```
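
The call above leaves sampling to the bundled scheduler (`timeshift=3.0`, `order=2`) and sets `guidance_scale=4.0`. For intuition only, the sketch below shows two ingredients common in flow-matching samplers: an SD3-style timestep shift and classifier-free guidance on the model's prediction. It is an assumption that PixNerd uses exactly these forms (its codebase may, for example, use the opposite time direction); the checkpoint's `scheduler/` code and `pipeline.py` are authoritative, and `shift_timestep` / `apply_cfg` are hypothetical names.

```python
from typing import List


def shift_timestep(t: float, shift: float = 3.0) -> float:
    """SD3-style timestep shift (assumed form). With t=1 as pure noise and t=0 as data,
    shift > 1 pushes a uniform grid toward t=1, so more steps are spent at high noise."""
    return shift * t / (1.0 + (shift - 1.0) * t)


def apply_cfg(v_uncond: List[float], v_cond: List[float], guidance_scale: float) -> List[float]:
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional prediction."""
    return [u + guidance_scale * (c - u) for u, c in zip(v_uncond, v_cond)]


steps = 25
grid = [1.0 - i / steps for i in range(steps)]  # 1.00, 0.96, ..., 0.04
shifted = [shift_timestep(t) for t in grid]
print([round(t, 3) for t in grid[:3]], "->", [round(t, 3) for t in shifted[:3]])
print(apply_cfg([0.0, 1.0], [1.0, 1.0], guidance_scale=4.0))  # [4.0, 1.0]
```

With `guidance_scale=1.0` the guided prediction reduces to the conditional one; larger values generally trade sample diversity for closer adherence to the class.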
## Load from a local clone

```python
import torch
from diffusers import DiffusionPipeline

repo = "models/BiliSakura/PixNerd-diffusers"
variant = "PixNerd-XL-16-256"

pipe = DiffusionPipeline.from_pretrained(
    f"{repo}/{variant}",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

images = pipe(class_labels="golden retriever", height=256, width=256).images
```
## Repo layout

```text
BiliSakura/PixNerd-diffusers/
├── README.md
├── PixNerd-XL-16-256/
│   ├── README.md
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── transformer/
│   └── scheduler/
└── PixNerd-XL-16-512/
    ├── README.md
    ├── pipeline.py
    ├── model_index.json
    ├── conversion_metadata.json
    ├── transformer/
    └── scheduler/
```
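
To sanity-check a local clone against this layout before loading it, a small standard-library script can confirm that each variant folder has the expected entries. The `REQUIRED` list simply mirrors the tree above, `check_variant` is a hypothetical helper, and the contents of `transformer/` and `scheduler/` are not inspected.

```python
from pathlib import Path
from typing import List

# Entries each variant folder should contain, mirroring the repo layout.
REQUIRED = ["README.md", "pipeline.py", "model_index.json", "conversion_metadata.json", "transformer", "scheduler"]
VARIANTS = ["PixNerd-XL-16-256", "PixNerd-XL-16-512"]


def check_variant(repo_root: str, variant: str) -> List[str]:
    """Return the required entries missing from one variant folder (empty list means OK)."""
    folder = Path(repo_root) / variant
    return [name for name in REQUIRED if not (folder / name).exists()]


if __name__ == "__main__":
    repo_root = "models/BiliSakura/PixNerd-diffusers"  # local clone path used in the example above
    for variant in VARIANTS:
        missing = check_variant(repo_root, variant)
        print(variant, "OK" if not missing else f"missing: {missing}")
```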
## Interface notes

- The pipeline uses `class_labels` for ImageNet class conditioning (`prompt` remains a deprecated alias).
- Pass integer ImageNet ids (`class_labels=207`) or human-readable synonyms (`class_labels="golden retriever"`).
- `height` and `width` should match the checkpoint's resolution (256 or 512); other sizes work as long as both are divisible by the patch size (16).
- Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`.
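
The size rule amounts to a simple divisibility check. The sketch below, built around a hypothetical `patch_grid` helper, validates a requested size against the patch size of 16 and reports the resulting patch grid; it does not reproduce any validation the pipeline itself performs.

```python
from typing import Tuple

PATCH_SIZE = 16  # PixNerd-XL/16 patch size, per the interface notes


def patch_grid(height: int, width: int, patch_size: int = PATCH_SIZE) -> Tuple[int, int]:
    """Return (rows, cols) of patches, or raise if the size is not patch-aligned."""
    if height % patch_size or width % patch_size:
        raise ValueError(f"{height}x{width} is not divisible by patch size {patch_size}")
    return height // patch_size, width // patch_size


for h, w in [(256, 256), (512, 512), (384, 640)]:
    rows, cols = patch_grid(h, w)
    print(f"{h}x{w}: {rows}x{cols} = {rows * cols} patches")
# 256x256: 16x16 = 256 patches
# 512x512: 32x32 = 1024 patches
# 384x640: 24x40 = 960 patches
```

So the 512 checkpoint works over four times as many patch tokens as the 256 one (1,024 vs 256).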
## Limitations

- Intended for ImageNet class-conditional generation.
- No text encoder is included, so free-form text prompts are not supported; string inputs must match an ImageNet label.
- Output quality depends on scheduler settings and inference step count.
## Citation

Source paper (ICLR 2026):

- [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268)
- [Hugging Face Papers page](https://huggingface.co/papers/2507.23268)

Source code:

- Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd)
- Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers)

```bibtex
@article{2507.23268,
  Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
  Title = {PixNerd: Pixel Neural Field Diffusion},
  Year = {2025},
  Eprint = {arXiv:2507.23268},
}
```