---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- image-generation
- class-conditional
- imagenet
- pixnerd
widget:
- output:
    url: PixNerd-XL-16-512/demo.png
language:
- en
---
# BiliSakura/PixNerd-diffusers
Self-contained PixNerd-XL/16 checkpoints for Hugging Face diffusers. **No external code repo is required**: each subfolder ships its own `pipeline.py`, component modules, and weights.
This repo is derived from the development bundle in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection), but inference only needs:
- This model repo (`BiliSakura/PixNerd-diffusers`)
- PyPI `diffusers`, `torch`, `huggingface_hub`
This Hugging Face repo hosts **multiple self-contained checkpoints as subfolders**. Each subfolder includes its own `pipeline.py`, `model_index.json`, weights, and component code (`transformer/`, `scheduler/`).
## Available checkpoints
| Subfolder | Resolution | Source checkpoint |
| --- | --- | --- |
| [`PixNerd-XL-16-256/`](PixNerd-XL-16-256/) | 256×256 | `epoch%3D319-step%3D1600000_emainit.ckpt` |
| [`PixNerd-XL-16-512/`](PixNerd-XL-16-512/) | 512×512 | `res512_ft200k_epoch%3D325-step%3D1800000_emainit.ckpt` |
Both checkpoints are ImageNet class-conditional PixNerd-XL/16 exports with flow-matching sampling.
## Demo
![PixNerd-XL-16-512 demo](PixNerd-XL-16-512/demo.png)
Class 207 (golden retriever), 512×512, 25 steps.
## ImageNet class labels
Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).
- `pipe.id2label`: inspect the id → English label mapping
- `pipe.labels`: reverse map (English synonym → id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`: look up the ids for a label
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically
- `pipe(prompt="golden retriever", ...)`: deprecated alias for `class_labels`
Chinese labels are preserved in the main source repo under `src/labels/id2label_cn.json` for reference.
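To make the synonym lookup concrete, here is a minimal standalone sketch of how a reverse map could be built from an `id2label` dictionary. It is not the pipeline's own code: the function names `build_label_index` and `resolve_label_ids` are hypothetical, the comma-separated synonym format follows the DiT-style convention, and the sample dictionary is a tiny illustrative excerpt.

```python
def build_label_index(id2label):
    """Map each lower-cased synonym to the sorted list of ids that use it."""
    index = {}
    for class_id in sorted(id2label, key=int):
        for name in id2label[class_id].split(","):
            index.setdefault(name.strip().lower(), []).append(int(class_id))
    return index


def resolve_label_ids(label, index):
    """Accept an integer id or a synonym string and return a list of ids."""
    if isinstance(label, int):
        return [label]
    key = label.strip().lower()
    if key not in index:
        raise KeyError(f"unknown label: {label!r}")
    return index[key]


# Illustrative excerpt only; the real map lives in each model_index.json.
sample_id2label = {
    "1": "goldfish, Carassius auratus",
    "134": "crane",
    "207": "golden retriever",
    "517": "crane",
}
label_index = build_label_index(sample_id2label)
print(resolve_label_ids("golden retriever", label_index))
```

Returning a list rather than a single id accounts for synonyms shared by more than one class, such as "crane" in the sample above.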
## Load from Hugging Face
```python
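import os

import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

variant = "PixNerd-XL-16-256"  # or "PixNerd-XL-16-512"
resolution = 256 if variant.endswith("256") else 512

# Hub repo ids take the form "namespace/name", so fetch only the chosen
# subfolder and load the pipeline from its local path.
repo_dir = snapshot_download(
    "BiliSakura/PixNerd-diffusers",
    allow_patterns=[f"{variant}/*"],
)
pipe = DiffusionPipeline.from_pretrained(
    os.path.join(repo_dir, variant),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Scheduler defaults: timeshift=3.0, order=2 (see scheduler/scheduler_config.json)
images = pipe(
    class_labels="golden retriever",
    height=resolution,
    width=resolution,
    num_inference_steps=25,
    guidance_scale=4.0,
).images

# Label lookups
print(pipe.id2label[207])  # "golden retriever"
pipe.get_label_ids("golden retriever")  # [207]

# Minimal call that relies on the default step count and guidance scale
images = pipe(class_labels="golden retriever", height=resolution, width=resolution).images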
```
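For intuition about the `timeshift=3.0` default mentioned in the snippet above, the sketch below shows one common form of timestep shifting used by flow-matching samplers, `sigma' = shift * sigma / (1 + (shift - 1) * sigma)`. Whether the bundled scheduler uses exactly this formula or this time convention is an assumption; check the code under `scheduler/` for the actual behavior. The function names are hypothetical.

```python
def shift_sigma(sigma, shift=3.0):
    """Warp a noise level in [0, 1]; endpoints 0 and 1 stay fixed."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps, shift=3.0):
    """Return num_steps + 1 noise levels from 1.0 (noise) down to 0.0 (image)."""
    uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_sigma(s, shift) for s in uniform]


if __name__ == "__main__":
    for sigma in shifted_schedule(5):
        print(f"{sigma:.3f}")
```

With `shift > 1`, the warped levels sit above the uniform ones, so more of the step budget is spent at high noise.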
## Load from a local clone
```python
import torch
from diffusers import DiffusionPipeline

repo = "models/BiliSakura/PixNerd-diffusers"
variant = "PixNerd-XL-16-256"

pipe = DiffusionPipeline.from_pretrained(
    f"{repo}/{variant}",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

images = pipe(class_labels="golden retriever", height=256, width=256).images
```
## Repo layout
```text
BiliSakura/PixNerd-diffusers/
├── README.md
├── PixNerd-XL-16-256/
│   ├── README.md
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── transformer/
│   └── scheduler/
└── PixNerd-XL-16-512/
    ├── README.md
    ├── pipeline.py
    ├── model_index.json
    ├── conversion_metadata.json
    ├── transformer/
    └── scheduler/
```
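Before loading from a local clone, it can help to confirm that a variant folder matches the layout above. The following is a small sketch using only the standard library; the helper name `missing_entries` and the choice of which entries to treat as required are assumptions for illustration.

```python
from pathlib import Path

# Entries listed in the layout above that loading plausibly depends on (assumption).
EXPECTED_ENTRIES = ("pipeline.py", "model_index.json", "transformer", "scheduler")


def missing_entries(variant_dir):
    """Return the expected files or folders that are absent from variant_dir."""
    root = Path(variant_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    gaps = missing_entries("models/BiliSakura/PixNerd-diffusers/PixNerd-XL-16-256")
    print("missing:", gaps if gaps else "nothing")
```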
## Interface notes
- The pipeline uses `class_labels` for ImageNet class conditioning (`prompt` remains a deprecated alias).
- Pass integer ImageNet ids (`class_labels=207`) or human-readable synonyms (`class_labels="golden retriever"`).
- `height` and `width` should match the checkpoint's intended resolution (256 or 512); other sizes work if both are divisible by the patch size (16).
- Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`.
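The size constraint above can be checked before calling the pipeline. This is a minimal sketch with a hypothetical helper, `patch_grid`, that validates a requested size against the patch size of 16 and reports the resulting patch grid; it does not call into the pipeline itself.

```python
PATCH_SIZE = 16  # patch size stated in the interface notes


def patch_grid(height, width, patch_size=PATCH_SIZE):
    """Return (rows, cols) of patches, or raise if the size is not divisible."""
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"height and width must be multiples of {patch_size}, "
            f"got {height}x{width}"
        )
    return height // patch_size, width // patch_size


if __name__ == "__main__":
    for size in (256, 512):
        rows, cols = patch_grid(size, size)
        print(f"{size}x{size}: {rows}x{cols} grid, {rows * cols} patches")
```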
## Limitations
- Intended for ImageNet class-conditional generation.
- No text encoder is included.
- Output quality depends on scheduler settings and inference step count.
## Citation
Source paper (ICLR 2026):
- [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268)
- [Hugging Face Papers page](https://huggingface.co/papers/2507.23268)
Source code:
- Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd)
- Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers)
```bibtex
@article{2507.23268,
Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
Title = {PixNerd: Pixel Neural Field Diffusion},
Year = {2025},
Eprint = {arXiv:2507.23268},
}
```