How to use BiliSakura/PixelFlow-diffusers with Diffusers:

```bash
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("BiliSakura/PixelFlow-diffusers", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```

Self-contained PixelFlow checkpoints for Hugging Face diffusers. Each subfolder ships its own `pipeline.py`, component modules, and weights.

| Subfolder | Task | Resolution | Params |
|---|---|---|---|
| `PixelFlow-256/` | class-to-image | 256×256 | 677M |
| `PixelFlow-T2I/` | text-to-image | 1024×1024 | 882M |

For the class-conditional `PixelFlow-256/`, the ImageNet-1k labels live in a shared `labels/` directory at the repo root:

| File | Direction | Value format |
|---|---|---|
| `labels/id2label_en.json` | id → English | comma-separated synonyms, e.g. `"207": "golden retriever"` |
| `labels/id2label_cn.json` | id → Chinese | comma-separated synonyms, e.g. `"207": "金毛猎犬"` |

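The value format above (one string of comma-separated synonyms per id) suggests how a reverse map from synonym to id could be derived. The following is a minimal sketch under stated assumptions, not the repo's own loader: `SAMPLE_JSON`, `load_id2label`, and `build_reverse_map` are hypothetical names, the goldfish entry is an illustrative stand-in for the real file contents, and lower-casing the synonyms is a choice made here for convenience.

```python
import json

# Tiny stand-in for labels/id2label_en.json (assumed shape: {"<id>": "syn1, syn2"}).
SAMPLE_JSON = '{"207": "golden retriever", "1": "goldfish, Carassius auratus"}'


def load_id2label(text):
    """Parse the JSON text into {int id: label string}."""
    return {int(key): value for key, value in json.loads(text).items()}


def build_reverse_map(id2label):
    """Map each lower-cased synonym to its class id."""
    reverse = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            key = name.strip().lower()
            if key:
                # If two ids share a synonym, keep the first one seen.
                reverse.setdefault(key, class_id)
    return reverse


id2label = load_id2label(SAMPLE_JSON)
labels = build_reverse_map(id2label)
print(labels["golden retriever"])  # 207
```

To try this on the real file, replace `SAMPLE_JSON` with the text read from `labels/id2label_en.json`.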
After PixelFlowPipeline.from_pretrained(...), the pipeline exposes:
- `pipe.id2label` / `pipe.id2label_cn`: inspect the id ↔ label correspondence
- `pipe.labels` / `pipe.labels_cn`: reverse maps (synonym → id)
- `pipe.get_label_ids("golden retriever")` or `pipe.get_label_ids("金毛猎犬", lang="cn")`: look up the ids for a label
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically

Class-to-image (PixelFlow-256)

```python
import sys
from pathlib import Path

# Local copy of the repo; each variant folder ships its own pipeline.py
repo = Path("BiliSakura/PixelFlow-diffusers").resolve()
variant = "PixelFlow-256"
sys.path.insert(0, str(repo / variant))  # make the variant's pipeline.py importable

from pipeline import PixelFlowPipeline

# Load from the variant folder itself so the call does not depend on the working directory
pipe = PixelFlowPipeline.from_pretrained(str(repo / variant))
pipe.to("cuda")

images = pipe(
    class_labels=207,
    num_inference_steps=[10, 10, 10, 10],
    guidance_scale=4.0,
).images

# Human-readable ImageNet labels (English or Chinese)
print(pipe.id2label[207])     # "golden retriever"
print(pipe.id2label_cn[207])  # "金毛猎犬"

pipe.get_label_ids("golden retriever")      # [207]
pipe.get_label_ids("金毛猎犬", lang="cn")    # [207]

# String labels can be passed directly
images = pipe(class_labels="golden retriever", num_inference_steps=[10, 10, 10, 10]).images
```

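Since `class_labels` accepts both an integer id and a string label, it can help to picture the normalization step that turns either form into a list of ids. The sketch below is only an illustration of that idea, not the pipeline's actual implementation: `resolve_class_labels` and the one-entry `reverse_map` are hypothetical, and the case-insensitive matching is an assumption.

```python
def resolve_class_labels(class_labels, reverse_map):
    """Return a list of int class ids for an int, a str, or a list of either."""
    if isinstance(class_labels, (list, tuple)):
        items = list(class_labels)
    else:
        items = [class_labels]

    ids = []
    for item in items:
        if isinstance(item, int):
            # Already a class id: pass it through unchanged.
            ids.append(item)
        else:
            key = str(item).strip().lower()
            if key not in reverse_map:
                raise KeyError(f"unknown class label: {item!r}")
            ids.append(reverse_map[key])
    return ids


# Hypothetical one-entry reverse map (synonym -> id).
reverse_map = {"golden retriever": 207}

print(resolve_class_labels("golden retriever", reverse_map))         # [207]
print(resolve_class_labels([207, "Golden Retriever"], reverse_map))  # [207, 207]
```

Raising on an unknown label, rather than silently skipping it, keeps a typo from producing an image of the wrong class.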
Text-to-image (PixelFlow-T2I)

Uses google/flan-t5-xl as the text encoder (loaded from Hugging Face at runtime, not bundled in the repo).
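
```python
import sys
from pathlib import Path

# Local copy of the repo; each variant folder ships its own pipeline.py
repo = Path("BiliSakura/PixelFlow-diffusers").resolve()
variant = "PixelFlow-T2I"
sys.path.insert(0, str(repo / variant))  # make the variant's pipeline.py importable

from pipeline import PixelFlowPipeline

# Load from the variant folder itself so the call does not depend on the working directory
pipe = PixelFlowPipeline.from_pretrained(str(repo / variant))
pipe.to("cuda")

images = pipe(
    prompt="A golden retriever playing in a sunny garden",
    num_inference_steps=[10, 10, 10, 10],
    guidance_scale=4.0,
).images
```
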
To convert the original PixelFlow checkpoints into the two subfolders:

```bash
# Class-to-image checkpoint
python scripts/convert_pixelflow_to_diffusers.py \
  --checkpoint models/raw/PixelFlow/c2i/model.pt \
  --config models/raw/PixelFlow/c2i/config.yaml \
  --output models/BiliSakura/PixelFlow-diffusers/PixelFlow-256

# Text-to-image checkpoint (the text encoder is not bundled)
python scripts/convert_pixelflow_to_diffusers.py \
  --checkpoint models/raw/PixelFlow/t2i/model.pt \
  --config models/raw/PixelFlow/t2i/config.yaml \
  --output models/BiliSakura/PixelFlow-diffusers/PixelFlow-T2I \
  --skip-text-encoder
```
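
The two conversion commands differ only in the raw subfolder, the output subfolder, and the extra `--skip-text-encoder` flag, so they could be assembled from a small table. This is a rough sketch that only builds and prints the argument lists; `SCRIPT`, `JOBS`, and `build_command` are hypothetical names, while the paths and flags are copied from the commands above.

```python
import shlex

SCRIPT = "scripts/convert_pixelflow_to_diffusers.py"

# (raw subfolder, output subfolder, extra flags), taken from the commands above.
JOBS = [
    ("c2i", "PixelFlow-256", []),
    ("t2i", "PixelFlow-T2I", ["--skip-text-encoder"]),
]


def build_command(raw_name, out_name, extra):
    """Assemble the argument list for one conversion run."""
    return [
        "python", SCRIPT,
        "--checkpoint", f"models/raw/PixelFlow/{raw_name}/model.pt",
        "--config", f"models/raw/PixelFlow/{raw_name}/config.yaml",
        "--output", f"models/BiliSakura/PixelFlow-diffusers/{out_name}",
        *extra,
    ]


commands = [build_command(*job) for job in JOBS]
for cmd in commands:
    # Print the shell-quoted command; nothing is executed here.
    print(shlex.join(cmd))
```

To execute the commands instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.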