---
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
base_model:
- facebook/sam3
quantized_from:
- facebook/sam3
tags:
- segmentation
- sam
- sam3
- quantization
- onnx
- tensorrt
- edge
- embedl
gated: true
extra_gated_heading: "Access Embedl SAM3 (Quantized)"
extra_gated_description: >-
  To access this model, please review and accept the terms below.
  Your contact information is collected solely to manage access and,
  with your explicit consent, to notify you about updated or new
  optimized models from Embedl. You can withdraw consent at any time
  by contacting us (see Contact section below). See our license for full terms.
extra_gated_button_content: "Agree and request access"
extra_gated_prompt: "By requesting access you agree to the Embedl Models Community Licence and the upstream SAM License"
extra_gated_fields:
  Company: text
  I agree to the Embedl Models Community Licence and upstream SAM License: checkbox
  I consent to being contacted by Embedl about products and services (optional): checkbox
---

# Embedl SAM3 (Quantized)

Deployable version of [facebook/sam3](https://huggingface.co/facebook/sam3).
Mixed-precision INT8/FP16 quantization with hardware-aware optimizations.

<table style="width: 100%; border-collapse: collapse; border: none;">
  <tr style="border: none;">
    <td style="width: 100%; border: none; padding: 10px;">
      <p align="center"><b>NVIDIA Jetson AGX Orin</b></p>
      <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__agx_orin.svg" style="width: 100%;">
    </td>
  </tr>
  <tr style="border: none;">
    <td style="width: 100%; border: none; padding: 10px;">
      <p align="center"><b>NVIDIA Jetson Thor</b></p>
      <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__agx_thor.svg" style="width: 100%;">
    </td>
  </tr>
  <tr style="border: none;">
    <td style="width: 100%; border: none; padding: 10px;">
      <p align="center"><b>NVIDIA L4</b></p>
      <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__l4.svg" style="width: 100%;">
    </td>
  </tr>
  <tr style="border: none;">
    <td style="width: 100%; border: none; padding: 10px;">
      <p align="center"><b>AMD MI300X</b></p>
      <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__mi300x.svg" style="width: 100%;">
    </td>
  </tr>
</table>

<a href="https://hfviewer.com/facebook/sam3?utm_source=huggingface&utm_medium=embedded_model_card&utm_campaign=facebook__sam3_card" target="_blank" rel="noopener">
  <img
    src="https://hfviewer.com/api/card.svg?source=facebook%2Fsam3&v=20260501clipcard"
    alt="Open facebook/sam3 in hfviewer"
    width="100%"
  />
</a>

## Highlights

- **Format:** ONNX with external weights (`embedl_sam3_quant.onnx` + `.onnx.data`)
- **Precision:** INT8, with sensitive layers kept in FP16
- **Runtime:** TensorRT (engine built with both FP16 and INT8 enabled)
- **Hardware:** NVIDIA Jetson AGX Orin and Jetson Thor, desktop/server NVIDIA GPUs with TensorRT, and AMD GPUs

## Quick Start

### 1. Download the model

```bash
hf download embedl/sam3 embedl_sam3_quant.onnx embedl_sam3_quant.onnx.data infer_trt.py --local-dir .
```

### 2. Build the TensorRT engine

> **WARNING: Validated with TensorRT 10.1 and 10.3 only.** Newer TensorRT versions produce incorrect segmentation masks for this model.

```bash
/usr/src/tensorrt/bin/trtexec --onnx=embedl_sam3_quant.onnx \
  --fp16 --int8 \
  --builderOptimizationLevel=5 \
  --memPoolSize=workspace:4294967296 \
  --timingCacheFile=embedl_sam3_timing_cache.bin \
  --saveEngine=embedl_sam3_quant.engine
```
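
Because only two TensorRT releases are validated, it can help to check the installed version before spending time on an engine build. The following is a minimal sketch, not part of `infer_trt.py`: the helper name `is_validated_trt` and the constant `VALIDATED_TRT_VERSIONS` are hypothetical, and the check assumes that matching on `major.minor` is sufficient.

```python
import re

# Major.minor releases named as validated in the warning above.
VALIDATED_TRT_VERSIONS = {(10, 1), (10, 3)}


def is_validated_trt(version: str) -> bool:
    """Return True if `version` (e.g. "10.3.0.26") is a validated major.minor release."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) in VALIDATED_TRT_VERSIONS


if __name__ == "__main__":
    try:
        import tensorrt as trt  # system TensorRT Python bindings, if installed
    except ImportError:
        print("TensorRT Python bindings not found; cannot check the version.")
    else:
        status = "validated" if is_validated_trt(trt.__version__) else "NOT validated"
        print(f"TensorRT {trt.__version__}: {status} for this model")
```

Running the script on the target device before `trtexec` gives an early signal that the resulting masks may not be trustworthy.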

### 3. Run inference

See [`infer_trt.py`](infer_trt.py) for a complete example that runs
text-prompted video segmentation, measures latency, and saves an output video
with mask overlays.

```bash
python3 -m venv venv --system-site-packages  # Use system TensorRT
source venv/bin/activate
pip install opencv-python transformers av
python infer_trt.py
```
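
For readers who want to time their own pipeline, the sketch below shows one common way to measure latency: discard a few warm-up runs, then report the median over timed runs. It is an illustration only and is not taken from `infer_trt.py`; `measure_latency_ms` and `run_once` are hypothetical names, and `run_once` is assumed to block until the GPU work has finished (for example by synchronizing the CUDA stream), otherwise the timings will be too low.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(
    run_once: Callable[[], object], warmup: int = 5, iters: int = 20
) -> Dict[str, float]:
    """Time `run_once` and return the median and mean latency in milliseconds."""
    for _ in range(warmup):  # warm-up runs are executed but not timed
        run_once()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "mean_ms": statistics.fmean(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with one synchronous engine execution.
    stats = measure_latency_ms(lambda: time.sleep(0.01), warmup=2, iters=10)
    print(f"median {stats['median_ms']:.1f} ms, mean {stats['mean_ms']:.1f} ms")
```

The median is used as the headline number because it is less sensitive than the mean to occasional slow iterations.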

## Files

| File | Description |
|---|---|
| `embedl_sam3_quant.onnx` | Quantized ONNX model with precalibrated Q/DQ (QuantizeLinear/DequantizeLinear) nodes |
| `embedl_sam3_quant.onnx.data` | External weights (~3.1 GB) |
| `infer_trt.py` | TensorRT inference example |

## Performance

The input resolution is reduced from the model default to 924 to enable TensorRT layer fusions that are not possible at the original size. All benchmarks below use this resolution.

### NVIDIA L4 GPU

> **Environment:** NVIDIA L4, Driver 570.211.01, CUDA 12.8, TensorRT 10.3



| Configuration | Latency | Speedup |
|---|---|---|
| `torch.compile` (FP16) | 137 ms | 1.0x |
| **Embedl Deploy (this model)** | **104 ms** | **1.32x** |

### NVIDIA Jetson AGX Orin

| Configuration | Latency | Throughput | Speedup |
|---|---|---|---|
| Baseline (FP16, resized to 924) | 763 ms | 1.31 qps | 1.0x |
| **Embedl Deploy (this model)** | **462 ms** | **2.17 qps** | **1.65x** |
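
The speedup and throughput columns follow from the latencies by simple arithmetic. The sketch below reproduces them, assuming speedup is the ratio of baseline to optimized latency and throughput is the reciprocal of latency for one query at a time; the function names are hypothetical and the latencies are copied from the tables above.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms


def throughput_qps(latency_ms: float) -> float:
    """Queries per second when queries run one at a time."""
    return 1000.0 / latency_ms


# Latencies in milliseconds, copied from the tables above.
print(f"L4 speedup:    {speedup(137, 104):.2f}x")         # 1.32x
print(f"Orin speedup:  {speedup(763, 462):.2f}x")         # 1.65x
print(f"Orin baseline: {throughput_qps(763):.2f} qps")    # 1.31 qps
print(f"Orin Embedl:   {throughput_qps(462):.2f} qps")    # 2.16 qps
```

The last value comes out at 2.16 qps rather than the 2.17 qps in the table; one possible explanation is that the reported throughput was measured directly or derived from an unrounded latency.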

### Accuracy (SA-Co/Gold)

Evaluated on the SA-Co/Gold instance segmentation benchmark ([Table 30 in the SAM3 paper](https://arxiv.org/pdf/2511.16719)). On average, the quantized model stays within about 1.8 cgF1 points of the FP32 ONNX model (53.77 vs. 55.56); the largest per-subset drop is on WikiCommon (45.57 to 41.85).

**Average across all subsets:**

| Model | cgF1 | IL_MCC | pos_µF1 |
|---|---|---|---|
| SAM3 (paper, Table 30) | 54.1 | 0.82 | 66.1 |
| SAM3 ONNX FP32 (ours) | 55.56 | 0.823 | 67.45 |
| **Embedl SAM3 INT8 (this model)** | **53.77** | **0.809** | **66.36** |

**Per-subset breakdown:**

| Subset | cgF1 (FP32) | cgF1 (INT8) | pos_µF1 (FP32) | pos_µF1 (INT8) |
|---|---|---|---|---|
| Metaclip | 47.92 | 47.07 | 59.24 | 58.54 |
| SA-1B | 53.44 | 52.33 | 61.70 | 61.31 |
| Crowded | 60.28 | 59.09 | 67.54 | 67.25 |
| FG Food | 58.76 | 56.28 | 72.01 | 70.02 |
| Sports Equipment | 67.85 | 65.61 | 75.15 | 73.91 |
| Attributes | 55.11 | 54.12 | 73.08 | 72.57 |
| WikiCommon | 45.57 | 41.85 | 63.46 | 60.88 |
| **Average** | **55.56** | **53.77** | **67.45** | **66.36** |
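
To see where quantization costs the most, the per-subset cgF1 numbers can be turned into absolute drops and retained percentages. This is a small illustrative sketch over values copied from the table above; the helper name `drops` and the "retained" ratio (INT8 divided by FP32) are assumptions made here, not metrics defined by the benchmark.

```python
# (subset, cgF1 FP32, cgF1 INT8), copied from the per-subset table above.
CGF1 = [
    ("Metaclip", 47.92, 47.07),
    ("SA-1B", 53.44, 52.33),
    ("Crowded", 60.28, 59.09),
    ("FG Food", 58.76, 56.28),
    ("Sports Equipment", 67.85, 65.61),
    ("Attributes", 55.11, 54.12),
    ("WikiCommon", 45.57, 41.85),
]


def drops(rows):
    """Return {subset: (absolute drop in points, INT8/FP32 ratio in percent)}."""
    return {name: (fp32 - int8, 100.0 * int8 / fp32) for name, fp32, int8 in rows}


per_subset = drops(CGF1)
for name, (drop, retained) in per_subset.items():
    print(f"{name:<17} -{drop:.2f} pts  {retained:.1f}% retained")

# Subset with the largest absolute cgF1 drop, and the recomputed averages.
worst = max(per_subset, key=lambda n: per_subset[n][0])
mean_fp32 = sum(row[1] for row in CGF1) / len(CGF1)
mean_int8 = sum(row[2] for row in CGF1) / len(CGF1)
print(f"largest drop: {worst}")
print(f"mean cgF1: FP32 {mean_fp32:.2f}, INT8 {mean_int8:.2f}")
```

Averages recomputed from the rounded table entries can differ from the reported averages in the last digit.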

## Creating Your Own Optimized Models

Deployment-ready models can be created from any supported base model using [embedl-deploy](https://deploy.embedl.com), available on PyPI. Detailed
tutorials will follow.

## License

This model is a derivative of **facebook/sam3**.

| Component | License |
|---|---|
| **Upstream (Meta SAM3)** | [SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE) |
| **Optimized components** | [Embedl Models Community Licence v1.0](https://github.com/embedl/embedl-models/blob/main/LICENSE) *(no redistribution as a hosted service)* |

## Contact

- **Enterprise & commercial inquiries:** [models@embedl.com](mailto:models@embedl.com)
- **Technical issues & early access:** [github.com/embedl/embedl-deploy](https://github.com/embedl/embedl-deploy/)

We offer engineering support for on-prem/edge deployments and partner co-marketing opportunities.