	---
	license: mit
	library_name: diffusers
	pipeline_tag: text-to-image
	tags:
	- diffusers
	- image-generation
	- class-conditional
	- imagenet
	- pixnerd
	widget:
	- output:
	    url: PixNerd-XL-16-512/demo.png
	language:
	- en
	---

	# BiliSakura/PixNerd-diffusers

	Self-contained PixNerd-XL/16 checkpoints for Hugging Face diffusers. No external code repository is required.

	This repo is derived from the development bundle in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection), but inference only needs:

	- This model repo (`BiliSakura/PixNerd-diffusers`)
	- The PyPI packages `diffusers`, `torch`, and `huggingface_hub`

	This Hugging Face repo hosts multiple self-contained checkpoints as subfolders. Each subfolder includes its own `pipeline.py`, `model_index.json`, weights, and component code (`transformer/`, `scheduler/`).

	## Available checkpoints

	| Subfolder | Resolution | Source checkpoint |
	| --- | --- | --- |
	| [`PixNerd-XL-16-256/`](PixNerd-XL-16-256/) | 256×256 | `epoch%3D319-step%3D1600000_emainit.ckpt` |
	| [`PixNerd-XL-16-512/`](PixNerd-XL-16-512/) | 512×512 | `res512_ft200k_epoch%3D325-step%3D1800000_emainit.ckpt` |

	Both checkpoints are ImageNet class-conditional PixNerd-XL/16 exports with flow-matching sampling.

	## Demo

	![PixNerd-XL-16-512 demo](PixNerd-XL-16-512/demo.png)

	Class 207 — golden retriever, 512×512, 25 steps.

	## ImageNet class labels

	Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).

	- `pipe.id2label`: maps each class id to its English label
	- `pipe.labels`: the reverse map (English synonym → id), sorted for browsing
	- `pipe.get_label_ids("golden retriever")`: returns the matching class ids (here `[207]`)
	- `pipe(class_labels="golden retriever", ...)`: string labels are resolved to ids automatically
	- `pipe(prompt="golden retriever", ...)`: deprecated alias for `class_labels`
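
The lookups above can be pictured with a small standalone sketch. It assumes DiT-style `id2label` entries, where one id maps to a comma-separated list of synonyms. The three-entry map and the helpers `build_label_index` and `lookup_label_ids` are hypothetical and are not the pipeline's own code.

```python
# Hypothetical three-entry excerpt of an ImageNet id2label map (DiT style:
# comma-separated synonyms per class id).
ID2LABEL = {
    0: "tench, Tinca tinca",
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
}


def build_label_index(id2label):
    """Return a synonym -> id map with keys sorted alphabetically."""
    index = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            index[name.strip().lower()] = int(class_id)
    return dict(sorted(index.items()))


def lookup_label_ids(index, labels):
    """Resolve one label or a list of labels to a list of class ids."""
    if isinstance(labels, str):
        labels = [labels]
    return [index[label.strip().lower()] for label in labels]


LABELS = build_label_index(ID2LABEL)
print(lookup_label_ids(LABELS, "golden retriever"))  # [207]
print(lookup_label_ids(LABELS, ["goldfish", "Tinca tinca"]))  # [1, 0]
```

Matching here is case-insensitive and exact. How the real pipeline treats unknown or partial labels is not covered by this sketch.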

	Chinese labels are preserved in the main source repo under `src/labels/id2label_cn.json` for reference.

	## Load from Hugging Face

	```python
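	import torch
	from diffusers import DiffusionPipeline
	from huggingface_hub import snapshot_download

	variant = "PixNerd-XL-16-256"  # or "PixNerd-XL-16-512"
	resolution = 256 if variant.endswith("256") else 512

	# Hub repo ids have the form namespace/name with no subfolder, so download
	# only the chosen variant and load it as a local directory.
	local_root = snapshot_download(
	    "BiliSakura/PixNerd-diffusers",
	    allow_patterns=[f"{variant}/*"],
	)

	pipe = DiffusionPipeline.from_pretrained(
	    f"{local_root}/{variant}",
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	).to("cuda")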

	# Scheduler defaults: timeshift=3.0, order=2 (see scheduler/scheduler_config.json)

	images = pipe(
	    class_labels="golden retriever",
	    height=resolution,
	    width=resolution,
	    num_inference_steps=25,
	    guidance_scale=4.0,
	).images

	print(pipe.id2label[207]) # "golden retriever"
	pipe.get_label_ids("golden retriever") # [207]
	images = pipe(class_labels="golden retriever", height=resolution, width=resolution).images
	```
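
For intuition about the `timeshift=3.0` default, the sketch below applies the static shift used by diffusers' `FlowMatchEulerDiscreteScheduler`, `sigma' = s * sigma / (1 + (s - 1) * sigma)`. That this repo's custom scheduler uses the same formula is an assumption, and `shift_sigmas` is a hypothetical helper, not part of the pipeline.

```python
def shift_sigmas(sigmas, shift=3.0):
    """Remap noise levels in [0, 1]; shift > 1 places more steps at high noise.

    Assumption: the same static shift as diffusers'
    FlowMatchEulerDiscreteScheduler, not necessarily this repo's scheduler.
    """
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


num_steps = 5
# Uniform noise levels from 1.0 (pure noise) down to 0.0 (clean image).
uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]
shifted = shift_sigmas(uniform, shift=3.0)

for u, s in zip(uniform, shifted):
    print(f"{u:.2f} -> {s:.3f}")
```

The endpoints stay fixed at 1 and 0, while intermediate levels move toward 1. A midpoint of 0.5, for example, maps to 0.75 when the shift is 3.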

	## Load from a local clone

	```python
	import torch
	from diffusers import DiffusionPipeline

	repo = "models/BiliSakura/PixNerd-diffusers"
	variant = "PixNerd-XL-16-256"

	pipe = DiffusionPipeline.from_pretrained(
	    f"{repo}/{variant}",
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	).to("cuda")

	images = pipe(class_labels="golden retriever", height=256, width=256).images
	```

	## Repo layout

	```text
	BiliSakura/PixNerd-diffusers/
	├── README.md
	├── PixNerd-XL-16-256/
	│   ├── README.md
	│   ├── pipeline.py
	│   ├── model_index.json
	│   ├── conversion_metadata.json
	│   ├── transformer/
	│   └── scheduler/
	└── PixNerd-XL-16-512/
	    ├── README.md
	    ├── pipeline.py
	    ├── model_index.json
	    ├── conversion_metadata.json
	    ├── transformer/
	    └── scheduler/
	```
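
Before loading a local copy, it can help to confirm that a variant folder matches the layout above. The following is a minimal sketch. `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names, and the check only looks for the listed files and directories, not for their contents.

```python
from pathlib import Path

# Entries from the layout above (README.md omitted); a trailing "/" marks
# a directory.
EXPECTED_ENTRIES = (
    "pipeline.py",
    "model_index.json",
    "conversion_metadata.json",
    "transformer/",
    "scheduler/",
)


def missing_entries(variant_dir):
    """Return the expected entries that are absent from variant_dir."""
    root = Path(variant_dir)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


# Path taken from the local-clone example; adjust it to your checkout.
print(missing_entries("models/BiliSakura/PixNerd-diffusers/PixNerd-XL-16-256"))
```

An empty list means every expected entry was found. A folder that does not exist reports all entries as missing.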

	## Interface notes

	- The pipeline uses `class_labels` for ImageNet class conditioning (`prompt` remains a deprecated alias).
	- Pass integer ImageNet ids (`class_labels=207`) or human-readable synonyms (`class_labels="golden retriever"`).
	- `height` and `width` should match the checkpoint's resolution (256 or 512); other sizes work if both values are divisible by the patch size (16).
	- Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`.
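
The divisibility rule for custom sizes can be checked up front. This is a sketch only: `patch_grid` is a hypothetical helper, and the pipeline may validate sizes differently or not at all.

```python
PATCH_SIZE = 16  # patch size stated in the interface notes


def patch_grid(height, width, patch_size=PATCH_SIZE):
    """Return (rows, cols) of the patch grid, or raise if a side does not divide."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"height={height} and width={width} must both be divisible by {patch_size}"
        )
    return height // patch_size, width // patch_size


for size in (256, 512):
    rows, cols = patch_grid(size, size)
    print(size, rows, cols, rows * cols)
```

At 256×256 the grid is 16×16 (256 patches), and at 512×512 it is 32×32 (1024 patches), so the token count grows fourfold between the two checkpoints.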

	## Limitations

	- Intended for ImageNet class-conditional generation.
	- No text encoder is included.
	- Output quality depends on scheduler settings and inference step count.

	## Citation

	Source paper (ICLR 2026):

	- [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268)
	- [Hugging Face Papers page](https://huggingface.co/papers/2507.23268)

	Source code:

	- Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd)
	- Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers)

	```bibtex
	@article{2507.23268,
	  Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
	  Title = {PixNerd: Pixel Neural Field Diffusion},
	  Year = {2025},
	  Eprint = {arXiv:2507.23268},
	}
	```