license: mit
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
  - diffusers
  - jit
  - image-generation
  - class-conditional
widget:
  - output:
      url: demo.png
language:
  - en

JiT-diffusers

Native diffusers implementation of JiT (Just image Transformer). Each variant folder is self-contained:

  • pipeline.py — JiTPipeline
  • scheduler/scheduler_config.json — FlowMatchHeunDiscreteScheduler config (default shift=4.0)
  • transformer/jit_transformer_2d.py — JiTTransformer2DModel
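As a rough illustration of what the scheduler's shift setting does, the sketch below assumes the time-shift formula commonly used by flow-matching schedulers, sigma' = s·sigma / (1 + (s − 1)·sigma). The function names and the evenly spaced grid are hypothetical and are not taken from this repo's scheduler code; check scheduler_config.json and the diffusers source for the exact schedule.

```python
from typing import List


def shift_sigma(sigma: float, shift: float = 4.0) -> float:
    """Apply the assumed time shift to a noise level in [0, 1] (1.0 = pure noise)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps: int, shift: float = 4.0) -> List[float]:
    """Evenly spaced noise levels from 1.0 downward, passed through the shift.

    The uniform grid here is a simplification for illustration only.
    """
    uniform = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift_sigma(s, shift) for s in uniform]


if __name__ == "__main__":
    # With shift=4.0 the midpoint 0.5 moves to 0.8, so more steps land at high noise.
    print(shift_sigma(0.5, 4.0))
    print([round(s, 3) for s in shifted_schedule(4)])
```

With shift=1.0 the mapping is the identity; values above 1 push intermediate levels toward the noisy end of the schedule.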

The pipeline's __call__ accepts height and width, so inference is not tied to the checkpoint's native resolution; positional embeddings are interpolated to fit the requested size.
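To make the resolution arithmetic concrete, here is a small sketch that computes the patch-token grid for a requested size. It assumes the "/16" and "/32" suffixes in the checkpoint names are patch sizes and that height and width must be divisible by the patch size; token_grid is a hypothetical helper, not part of the pipeline.

```python
from typing import Tuple


def token_grid(height: int, width: int, patch_size: int) -> Tuple[int, int]:
    """Return (rows, cols) of patch tokens for an image of the given size."""
    if height % patch_size or width % patch_size:
        # Assumption: sizes that do not divide evenly are rejected.
        raise ValueError("height and width must be multiples of patch_size")
    return height // patch_size, width // patch_size


if __name__ == "__main__":
    print(token_grid(256, 256, 16))  # native grid of a /16 variant
    print(token_grid(512, 512, 32))  # native grid of a /32 variant
    print(token_grid(384, 512, 16))  # non-native size gives a different grid
```

Under these assumptions both families use a 16×16 token grid at their native resolution, and any other grid shape is where interpolated positions would come into play.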

There is no separate jit_diffusers package to install: you only need diffusers from PyPI plus the custom code bundled in each variant directory.

Available checkpoints

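| Checkpoint | Path       | Resolution | Recommended CFG |
|------------|------------|------------|-----------------|
| JiT-B/16   | ./JiT-B-16 | 256×256    | 3.0             |
| JiT-L/16   | ./JiT-L-16 | 256×256    | 2.4             |
| JiT-H/16   | ./JiT-H-16 | 256×256    | 2.2             |
| JiT-B/32   | ./JiT-B-32 | 512×512    | 3.0             |
| JiT-L/32   | ./JiT-L-32 | 512×512    | 2.5             |
| JiT-H/32   | ./JiT-H-32 | 512×512    | 2.3             |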
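If you switch between variants in a script, the listing above can be kept as a small lookup so that the path and guidance scale stay in sync. This is only a convenience sketch: RECOMMENDED and recommended() are hypothetical names, and the values are copied from the checkpoint list.

```python
# Path, native resolution, and recommended CFG scale per checkpoint.
RECOMMENDED = {
    "JiT-B/16": {"path": "./JiT-B-16", "resolution": 256, "guidance_scale": 3.0},
    "JiT-L/16": {"path": "./JiT-L-16", "resolution": 256, "guidance_scale": 2.4},
    "JiT-H/16": {"path": "./JiT-H-16", "resolution": 256, "guidance_scale": 2.2},
    "JiT-B/32": {"path": "./JiT-B-32", "resolution": 512, "guidance_scale": 3.0},
    "JiT-L/32": {"path": "./JiT-L-32", "resolution": 512, "guidance_scale": 2.5},
    "JiT-H/32": {"path": "./JiT-H-32", "resolution": 512, "guidance_scale": 2.3},
}


def recommended(name: str) -> dict:
    """Return a copy of the settings for a checkpoint, with a readable error."""
    try:
        return dict(RECOMMENDED[name])
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED))
        raise KeyError(f"unknown checkpoint {name!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(recommended("JiT-H/32"))
```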

ImageNet class labels

Each variant keeps an English id2label map directly in its own model_index.json (DiT-style).

  • pipe.id2label — inspect id → English label correspondence
  • pipe.labels — reverse map (English synonym → id), sorted for browsing
  • pipe.get_label_ids("golden retriever")
  • pipe(class_labels="golden retriever", ...) — string labels resolved automatically

Chinese labels are preserved in the main source repo under src/labels/id2label_cn.json for reference.
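For intuition about how a reverse map like pipe.labels can be derived from id2label, here is a minimal sketch. It assumes DiT-style entries in which one class id maps to a comma-separated list of English synonyms; build_label_index and the lowercase normalization are hypothetical and may differ from what the pipeline actually does.

```python
from typing import Dict, Mapping, Union


def build_label_index(id2label: Mapping[Union[int, str], str]) -> Dict[str, int]:
    """Map each synonym to its class id, sorted alphabetically for browsing."""
    index: Dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            key = name.strip().lower()
            if key:
                # Keep the first id seen if two classes share a synonym.
                index.setdefault(key, int(class_id))
    return dict(sorted(index.items()))


if __name__ == "__main__":
    # JSON object keys are strings, so string ids are accepted as well as ints.
    sample = {"1": "goldfish, Carassius auratus", "207": "golden retriever"}
    index = build_label_index(sample)
    print(index["golden retriever"])
    print(list(index))
```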

Inference

Run the bundled demo script from the repo root:

python demo_inference.py

This writes demo.png using JiT-H-32 with the settings below.

from pathlib import Path
from diffusers import DiffusionPipeline, FlowMatchHeunDiscreteScheduler
import torch

model_dir = Path("./JiT-H-32")
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
)
# Rebuild the scheduler from the bundled config; shift=4.0 matches the shipped default
pipe.scheduler = FlowMatchHeunDiscreteScheduler.from_config(pipe.scheduler.config, shift=4.0)
pipe.to("cuda")

# Numeric or human-readable labels
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.3,
    generator=generator,
).images[0]
image.save("demo.png")

height and width default to the checkpoint's native resolution when omitted.
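One way to keep that default behavior while still allowing an override is to pass height and width only when they are set. The helper below is a hypothetical sketch, not part of the repo; it builds keyword arguments using only the parameter names that appear in this README.

```python
from typing import Any, Dict, Optional


def build_call_kwargs(
    label: str,
    guidance_scale: float,
    steps: int = 50,
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect pipeline arguments, leaving out height/width when not requested."""
    kwargs: Dict[str, Any] = {
        "class_labels": label,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    # Omitting the keys lets the pipeline fall back to the native resolution.
    if height is not None:
        kwargs["height"] = height
    if width is not None:
        kwargs["width"] = width
    return kwargs


if __name__ == "__main__":
    print(build_call_kwargs("golden retriever", 2.3))
    print(build_call_kwargs("golden retriever", 2.3, height=768, width=768))
```

With a loaded pipeline this would be used as pipe(**build_call_kwargs("golden retriever", 2.3, height=768, width=768)).images[0]. The 768×768 size is only an illustrative value, and output quality away from the native resolution is not documented here.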

Load a variant subfolder (e.g. ./JiT-H-32), not the repo root.