---
pipeline_tag: text-to-image
library_name: tensorrt
inference: false
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-large
- text-to-image
- depth
- canny
- blur
- controlnet
- onnx
- fp8
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  What do you intend to use the model for?:
    type: select
    options:
      - Research
      - Personal use
      - Creative Professional
      - Startup
      - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
- en
---

# Stable Diffusion 3.5 Large ControlNet TensorRT

## Introduction

This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Large ControlNets**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.

Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved image quality, typography, complex-prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.

The following control types are available:

- Canny - Use a Canny edge map to guide the structure of the generated image. This is especially useful for illustrations, but works with all styles.

- Depth - Use a depth map, generated by DepthFM, to guide generation. Example use cases include generating architectural renderings or texturing 3D assets.

- Blur - Can be used to perform extremely high-fidelity upscaling. A common use case is to tile an input image, apply the ControlNet to each tile, and merge the tiles to produce a higher-resolution image.
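The tile-and-merge workflow for the Blur ControlNet reduces to simple index math. The helper below is a hypothetical sketch (not part of this repository) showing how overlapping 1024x1024 crop boxes might be laid out before each tile is upscaled and the results blended back together:

```python
def tile_grid(width, height, tile=1024, overlap=128):
    """Yield (left, top, right, bottom) crop boxes that cover an image.

    Adjacent tiles overlap so the upscaled results can be blended,
    hiding seams when the tiles are merged back together.
    """
    stride = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), stride):
        for left in range(0, max(width - overlap, 1), stride):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes

# A 2048x2048 input splits into a 3x3 grid of overlapping 1024x1024 tiles.
print(len(tile_grid(2048, 2048)))  # → 9
```

The tile size and overlap here are illustrative; in practice the tile resolution should match the resolution the ControlNet engine was built for.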

## Model Details

### Model Description

This repository holds the ONNX exports of the Depth, Canny, and Blur ControlNet models in BF16 precision. FP8-quantized models are also available for the Depth and Canny ControlNets.

## Performance using TensorRT 10.13

#### Depth ControlNet: Timings for 40 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 40 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 74.97 ms | 11.87 ms | 4.90 ms | 8.82 ms | 18839.01 ms | 117.38 ms | 19097.19 ms |
| H100 | FP8 | 31.24 ms | 11.99 ms | 4.96 ms | 8.39 ms | 9175.53 ms | 36.36 ms | 9308.86 ms |

#### Canny ControlNet: Timings for 60 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 78.50 ms | 12.29 ms | 5.08 ms | 8.65 ms | 28057.08 ms | 106.49 ms | 28306.20 ms |
| H100 | FP8 | 31.21 ms | 12.17 ms | 4.96 ms | 8.35 ms | 13936.82 ms | 36.63 ms | 14068.32 ms |

#### Blur ControlNet: Timings for 60 steps at 1024x1024

| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total |
|-------------|-----------|-------------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 74.48 ms | 11.71 ms | 4.86 ms | 8.80 ms | 28604.26 ms | 113.24 ms | 28859.06 ms |
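The end-to-end FP8 speedup over BF16 follows directly from the total latencies in the Depth and Canny tables above (H100 rows; the Blur ControlNet is published in BF16 only):

```python
# Total latencies (ms) taken from the H100 rows of the tables above.
totals = {
    "depth": {"bf16": 19097.19, "fp8": 9308.86},
    "canny": {"bf16": 28306.20, "fp8": 14068.32},
}

for name, t in totals.items():
    speedup = t["bf16"] / t["fp8"]
    print(f"{name}: {speedup:.2f}x")  # depth: 2.05x, canny: 2.01x
```

In both cases FP8 roughly halves the end-to-end latency, driven almost entirely by the MMDiT denoising loop, which dominates the total.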

## Usage Example

1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) on launching a TensorRT NGC container.

```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.08-py3 /bin/bash
```

2. Install libraries and requirements:

```shell
cd demo/Diffusion
source setup.sh
```

3. Generate a HuggingFace user access token

To download the Stable Diffusion 3.5 model checkpoints, please request access on the [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), [Stable Diffusion 3.5 Large Depth ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-depth), [Stable Diffusion 3.5 Large Canny ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-canny), and [Stable Diffusion 3.5 Large Blur ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-blur) pages.
You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See [instructions](https://huggingface.co/docs/hub/security-tokens).

```bash
export HF_TOKEN=<your access token>
```
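As a quick sanity check before launching the demos, you can confirm the token is actually visible to child processes. This is a minimal, optional sketch; the demo script itself receives the token via the `--hf-token` flag:

```python
import os

def hf_token_exported():
    """Return True if HF_TOKEN is set to a non-empty value in the environment."""
    return bool(os.environ.get("HF_TOKEN", "").strip())

print("HF_TOKEN exported:", hf_token_exported())
```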

4. Perform TensorRT optimized inference:

- **Stable Diffusion 3.5 Large Depth ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "a photo of a man" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type depth \
  --download-onnx-models \
  --denoising-steps=40 \
  --guidance-scale 4.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large Depth ControlNet in FP8 precision**

```shell
python3 demo_controlnet_sd35.py \
  "a photo of a man" \
  --version=3.5-large \
  --fp8 \
  --controlnet-type depth \
  --download-onnx-models \
  --denoising-steps=40 \
  --guidance-scale 4.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large Canny ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type canny \
  --download-onnx-models \
  --denoising-steps=60 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large Canny ControlNet in FP8 precision**

```shell
python3 demo_controlnet_sd35.py \
  "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms" \
  --version=3.5-large \
  --fp8 \
  --controlnet-type canny \
  --download-onnx-models \
  --denoising-steps=60 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large Blur ControlNet in BF16 precision**

```shell
python3 demo_controlnet_sd35.py \
  "generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater" \
  --version=3.5-large \
  --bf16 \
  --controlnet-type blur \
  --download-onnx-models \
  --denoising-steps=60 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```
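The invocations above differ only in the prompt, ControlNet type, precision flag, step count, and guidance scale. A hypothetical launcher (not part of the demo) could assemble each command programmatically; the flag names below are taken verbatim from the examples above:

```python
import os

def build_command(prompt, controlnet, precision, steps, guidance):
    """Assemble a demo_controlnet_sd35.py invocation as an argv list."""
    return [
        "python3", "demo_controlnet_sd35.py", prompt,
        "--version=3.5-large",
        f"--{precision}",                 # "bf16" or "fp8"
        "--controlnet-type", controlnet,  # "depth", "canny", or "blur"
        "--download-onnx-models",
        f"--denoising-steps={steps}",
        "--guidance-scale", str(guidance),
        "--build-static-batch",
        "--use-cuda-graph",
        f"--hf-token={os.environ.get('HF_TOKEN', '')}",
    ]

cmd = build_command("a photo of a man", "depth", "bf16", 40, 4.5)
print(" ".join(cmd[:4]))
```

The resulting list can be passed directly to `subprocess.run(cmd)` inside the container, which avoids shell-quoting issues with multi-word prompts.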