| | --- |
| | pipeline_tag: text-to-image |
| | inference: false |
| | library_name: tensorrt |
| | license: other |
| | license_name: stabilityai-nc-research-community |
| | license_link: LICENSE |
| | tags: |
| | - tensorrt |
| | - sd3 |
| | - sd3-medium |
| | - text-to-image |
| | - onnx |
| | extra_gated_prompt: >- |
| | By clicking "Agree", you agree to the [License |
| | Agreement](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE) |
| | and acknowledge Stability AI's [Privacy |
| | Policy](https://stability.ai/privacy-policy). |
| | extra_gated_fields: |
| | Name: text |
| | Email: text |
| | Country: country |
| | Organization or Affiliation: text |
| | Receive email updates and promotions on Stability AI products, services, and research?: |
| | type: select |
| | options: |
| | - 'Yes' |
| | - 'No' |
| | I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Stability AI: checkbox |
| | language: |
| | - en |
| | --- |
| | |
| | # Stable Diffusion 3 Medium TensorRT |
| | ## Introduction |
| |
|
| | This repository hosts the TensorRT version of **Stable Diffusion 3 Medium** created in collaboration with [NVIDIA](https://huggingface.co/nvidia). The optimized versions give substantial improvements in speed and efficiency. |
| |
|
| | Stable Diffusion 3 Medium is a fast generative text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities. |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| | Stable Diffusion 3 Medium combines a diffusion transformer architecture and flow matching. |
| |
|
| | - **Developed by:** Stability AI |
| | - **Model type:** MMDiT text-to-image model |
| | - **Model Description:** This is a TensorRT conversion of the [Stable Diffusion 3 Medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium) model. |
| |
|
| |
|
| | ## Performance using TensorRT 10.1 |
| | ### Timings for 50 steps at 1024x1024 |
| |
|
| | | Accelerator | CLIP-G | CLIP-L | T5XXL | MMDiT | VAE Decoder | Total | |
| | |-------------|-------------|--------------|---------------|-----------------------|---------------------|------------------------| |
| | | A100 | 11.95 ms | 5.04 ms | 21.39 ms | 5468.17 ms | 72.25 ms | 5622.47 ms | |
| |
|
| | ### Timings for 30 steps at 1024x1024 with input image conditioning |
| |
|
| | | Accelerator | VAE Encoder | CLIP-G | CLIP-L | T5XXL | MMDiT | VAE Decoder | Total | |
| | |-------------|----------------|-------------|--------------|---------------|-----------------------|---------------------|----------------| |
| | | A100 | 37.04 ms | 12.07 ms | 5.07 ms | 21.49 ms | 3340.69 ms | 72.02 ms | 3531.49 ms | |
| |
|
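
To put the two tables in perspective, the short Python sketch below derives the per-step MMDiT latency and the MMDiT share of the end-to-end time. The numbers are copied from the A100 rows above; the dictionary layout and function names are illustrative and not part of the demo code.

```python
# Timings (ms) copied from the A100 rows of the two tables above.
RUNS = {
    "txt2img, 50 steps": {"steps": 50, "mmdit_ms": 5468.17, "total_ms": 5622.47},
    "img2img, 30 steps": {"steps": 30, "mmdit_ms": 3340.69, "total_ms": 3531.49},
}

def mmdit_ms_per_step(run):
    """Average MMDiT time per denoising step, in milliseconds."""
    return run["mmdit_ms"] / run["steps"]

def mmdit_share(run):
    """Fraction of the end-to-end time spent in the MMDiT."""
    return run["mmdit_ms"] / run["total_ms"]

for name, run in RUNS.items():
    print(f"{name}: {mmdit_ms_per_step(run):.2f} ms/step, "
          f"{mmdit_share(run):.1%} of total")
```

Both runs work out to roughly 109 to 111 ms per MMDiT step, and the MMDiT accounts for well over 90% of the total time in each case.
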
| |
|
| | ## INT8 quantization with [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) |
| | The MMDiT in Stable Diffusion 3 Medium can be further optimized with INT8 quantization using TensorRT Model Optimizer. The estimated end-to-end speedup of TensorRT INT8 over TensorRT FP16 is 1.2x to 1.4x across various NVIDIA GPUs. The INT8 MMDiT engine uses about half the memory of its FP16 counterpart, with minimal to negligible degradation in image quality. |
| |
|
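
As a rough illustration of what the quoted speedup range would mean in practice, the sketch below applies it to the 5622.47 ms A100 total from the 50-step table. This is back-of-the-envelope arithmetic only: it assumes that table figure is the FP16 baseline, the speedup range is an estimate across various GPUs rather than a measurement for this configuration, and the function name is hypothetical.

```python
# Assumption: the A100 50-step total from the performance table is the FP16 baseline.
FP16_TOTAL_MS = 5622.47
# Estimated end-to-end FP16 -> INT8 speedup range quoted in this section.
SPEEDUP_RANGE = (1.2, 1.4)

def projected_latency_ms(baseline_ms, speedup):
    """Latency after applying an end-to-end speedup factor."""
    return baseline_ms / speedup

slowest, fastest = (projected_latency_ms(FP16_TOTAL_MS, s) for s in SPEEDUP_RANGE)
print(f"Projected INT8 total: {fastest:.0f} ms to {slowest:.0f} ms")
```

Under these assumptions the projected end-to-end time lands at roughly 4.0 to 4.7 seconds per image.
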
| | ## Usage Example |
| | 1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd3/demo/Diffusion/README.md) to launch a TensorRT NGC container: |
| | ```shell |
| | git clone https://github.com/NVIDIA/TensorRT.git |
| | cd TensorRT |
| | git checkout release/sd3 |
| | docker run --rm -it --gpus all -v "$PWD":/workspace nvcr.io/nvidia/pytorch:24.05-py3 /bin/bash |
| | ``` |
| |
|
| | 2. Download the Stable Diffusion 3 Medium TensorRT files from this repository: |
| | ```shell |
| | git lfs install |
| | git clone https://huggingface.co/stabilityai/stable-diffusion-3-medium-tensorrt |
| | cd stable-diffusion-3-medium-tensorrt |
| | git lfs pull |
| | cd .. |
| | ``` |
| |
|
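
If Git LFS did not fetch the large files, the clone contains small text stubs instead of the real model files. The sketch below is one way to spot them, using the fact that a Git LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`. The function name is hypothetical and the script is not part of the demo.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def find_lfs_pointers(repo_dir):
    """Return files under repo_dir that are still Git LFS pointer stubs."""
    repo_dir = Path(repo_dir)
    stubs = []
    for path in sorted(repo_dir.rglob("*")):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.relative_to(repo_dir).parts:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(path)
    return stubs

if __name__ == "__main__":
    repo = Path("stable-diffusion-3-medium-tensorrt")
    if repo.is_dir():
        for stub in find_lfs_pointers(repo):
            print(f"not downloaded: {stub}")
```

An empty result suggests the files were fetched; any listed path can be retried with `git lfs pull` inside the repository.
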
| | 3. Install libraries and requirements: |
| | ```shell |
| | cd demo/Diffusion |
| | python3 -m pip install --upgrade pip |
| | pip3 install -r requirements.txt |
| | python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12 |
| | ``` |
| |
|
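
After installation, a quick check that the `tensorrt` Python package imports cleanly can save a failed engine build later. The sketch below is a minimal version of such a check. The helper name is hypothetical, and comparing the reported version against the TensorRT 10.1 used for the performance figures is left to the reader.

```python
import importlib

def tensorrt_version():
    """Return the installed TensorRT version string, or None if it cannot be imported."""
    try:
        trt = importlib.import_module("tensorrt")
    except ImportError:
        return None
    return getattr(trt, "__version__", None)

version = tensorrt_version()
print("TensorRT not importable" if version is None else f"TensorRT {version}")
```
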
| |
|
| | 4. Perform TensorRT optimized inference: |
| |
|
| | - **Stable Diffusion 3 Medium** |
| | |
| | Works best for 1024x1024 images. The first invocation builds plan files in `--engine-dir` that are specific to the accelerator in use; later invocations reuse them. |
| | ```shell |
| | python3 demo_txt2img_sd3.py \ |
| | "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \ |
| | --version=sd3 \ |
| | --onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \ |
| | --engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \ |
| | --seed 42 \ |
| | --width 1024 \ |
| | --height 1024 \ |
| | --build-static-batch \ |
| | --use-cuda-graph |
| | ``` |
| | |
| | - **Stable Diffusion 3 Medium with input image conditioning** |
| | |
| | Provide an input image for conditioning using the commands below. Works best at 1024x1024 but may also work at 512x512. |
| | ```shell |
| | wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png -O dog-on-bench.png |
| | |
| | python3 demo_txt2img_sd3.py \ |
| | "dog wearing a sweater and a blue collar" \ |
| | --version=sd3 \ |
| | --onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \ |
| | --engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \ |
| | --seed 42 \ |
| | --width 1024 \ |
| | --height 1024 \ |
| | --input-image dog-on-bench.png \ |
| | --build-static-batch \ |
| | --use-cuda-graph |
| | ``` |
| | |
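
Because results are reported to be best at 1024x1024, it can help to check the dimensions of an input image before passing it to `--input-image`. The sketch below reads the width and height straight from the PNG header using only the standard library. Whether the demo script resizes the input itself is not stated here, so treat this as an optional sanity check; the function name is hypothetical.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(path):
    """Return (width, height) read from a PNG file's IHDR chunk."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    return struct.unpack(">II", header[16:24])

if os.path.exists("dog-on-bench.png"):
    width, height = png_size("dog-on-bench.png")
    print(f"dog-on-bench.png is {width}x{height}")
```
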