---
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
base_model:
- facebook/sam3
quantized_from:
- facebook/sam3
tags:
- segmentation
- sam
- sam3
- quantization
- onnx
- tensorrt
- edge
- embedl
gated: true
extra_gated_heading: "Access Embedl SAM3 (Quantized)"
extra_gated_description: >-
To access this model, please review and accept the terms below.
Your contact information is collected solely to manage access and,
with your explicit consent, to notify you about updated or new
optimized models from Embedl. You can withdraw consent at any time
by contacting us (see Contact section below). See our license for full terms.
extra_gated_button_content: "Agree and request access"
extra_gated_prompt: "By requesting access you agree to the Embedl Models Community Licence and the upstream SAM License"
extra_gated_fields:
Company: text
I agree to the Embedl Models Community Licence and upstream SAM License: checkbox
I consent to being contacted by Embedl about products and services (optional): checkbox
---
# Embedl SAM3 (Quantized)
A deployable version of [facebook/sam3](https://huggingface.co/facebook/sam3), quantized to mixed INT8/FP16 precision with hardware-aware optimizations.
<table style="width: 100%; border-collapse: collapse; border: none;">
<tr style="border: none;">
<td style="width: 100%; border: none; padding: 10px;">
<p align="center"><b>NVIDIA Jetson AGX Orin</b></p>
<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__agx_orin.svg" style="width: 100%;">
</td>
</tr>
<tr style="border: none;">
<td style="width: 100%; border: none; padding: 10px;">
<p align="center"><b>NVIDIA Jetson Thor</b></p>
<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__agx_thor.svg" style="width: 100%;">
</td>
</tr>
<tr style="border: none;">
<td style="width: 100%; border: none; padding: 10px;">
<p align="center"><b>NVIDIA L4</b></p>
<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__l4.svg" style="width: 100%;">
</td>
</tr>
<tr style="border: none;">
<td style="width: 100%; border: none; padding: 10px;">
<p align="center"><b>AMD MI300X</b></p>
<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__mi300x.svg" style="width: 100%;">
</td>
</tr>
</table>
<a href="https://hfviewer.com/facebook/sam3?utm_source=huggingface&amp;utm_medium=embedded_model_card&amp;utm_campaign=facebook__sam3_card" target="_blank" rel="noopener">
<img
src="https://hfviewer.com/api/card.svg?source=facebook%2Fsam3&amp;v=20260501clipcard"
alt="Open facebook/sam3 in hfviewer"
width="100%"
/>
</a>
## Highlights
- **Format:** ONNX with external weights (`embedl_sam3_quant.onnx` + `.onnx.data`)
- **Precision:** INT8 with sensitive layers kept in FP16
- **Runtime:** TensorRT (FP16 + INT8 mode)
- **Hardware:** NVIDIA Jetson AGX Orin and Jetson Thor, desktop/server NVIDIA GPUs with TensorRT, and AMD GPUs
## Quick Start
### 1. Download the model
```bash
hf download embedl/sam3 embedl_sam3_quant.onnx embedl_sam3_quant.onnx.data infer_trt.py --local-dir .
```
### 2. Build the TensorRT engine
> **WARNING: Validated with TensorRT 10.1 and 10.3 only.** Later versions of TensorRT produce incorrect segmentation masks for this model.
```bash
/usr/src/tensorrt/bin/trtexec --onnx=embedl_sam3_quant.onnx \
--fp16 --int8 \
--builderOptimizationLevel=5 \
--memPoolSize=workspace:4294967296 \
--timingCacheFile=embedl_sam3_timing_cache.bin \
--saveEngine=embedl_sam3_quant.engine
```
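Because the warning above limits validated builds to TensorRT 10.1 and 10.3, it can help to check the installed version before building the engine. The following is a minimal sketch and not part of this repository: the helper name `is_validated_trt_version` is hypothetical, and the script assumes the TensorRT Python bindings are importable and expose `tensorrt.__version__`.

```python
import re

# Validated (major, minor) series, taken from the warning above.
VALIDATED_TRT_SERIES = {(10, 1), (10, 3)}


def is_validated_trt_version(version: str) -> bool:
    """Return True if a version string such as '10.3.0.26' is in a validated series."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) in VALIDATED_TRT_SERIES


if __name__ == "__main__":
    try:
        import tensorrt as trt  # system TensorRT Python bindings
    except ImportError:
        print("TensorRT Python bindings not found; cannot check the version.")
    else:
        if is_validated_trt_version(trt.__version__):
            print(f"TensorRT {trt.__version__} is in a validated series.")
        else:
            print(f"TensorRT {trt.__version__} is not validated for this model.")
```

Running this inside the same environment that provides `trtexec` gives an early signal before a long engine build.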
### 3. Run inference
See [`infer_trt.py`](infer_trt.py) for a complete example that runs
text-prompted video segmentation, measures latency, and saves an output video
with mask overlays.
```bash
python3 -m venv venv --system-site-packages # Use system TensorRT
source venv/bin/activate
pip install opencv-python transformers av
python infer_trt.py
```
## Files
| File | Description |
|---|---|
| `embedl_sam3_quant.onnx` | Quantized ONNX model with precalibrated Q/DQ (QuantizeLinear/DequantizeLinear) nodes |
| `embedl_sam3_quant.onnx.data` | External weights (~3.1 GB) |
| `infer_trt.py` | TensorRT inference example |
## Performance
The input resolution is reduced from the SAM3 default to 924 so that TensorRT can apply layer fusions that are not possible at the original size. All benchmarks below use this resolution.
### NVIDIA L4 GPU
> **Environment:** NVIDIA L4, Driver 570.211.01, CUDA 12.8, TensorRT 10.3
![Text-prompted video segmentation on NVIDIA L4 GPU](https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/output_embedl_quant_with_masks.gif)
| Configuration | Latency | Speedup |
|---|---|---|
| `torch.compile` (FP16) | 137 ms | 1.0x |
| **Embedl Deploy (this model)** | **104 ms** | **1.32x** |
### NVIDIA Jetson AGX Orin
| Configuration | Latency | Throughput | Speedup |
|---|---|---|---|
| Baseline (FP16, resized to 924) | 763 ms | 1.31 qps | 1.0x |
| **Embedl Deploy (this model)** | **462 ms** | **2.17 qps** | **1.65x** |
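The speedup columns in the tables above are the ratio of baseline latency to optimized latency. As a quick worked check, the sketch below recomputes them from the reported latencies; the function name `speedup` is illustrative only.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Speedup factor: how many times faster the optimized model is."""
    return baseline_ms / optimized_ms


# Latencies reported in the tables above (milliseconds).
reported = {
    "NVIDIA L4": (137.0, 104.0),
    "NVIDIA Jetson AGX Orin": (763.0, 462.0),
}

for device, (baseline_ms, optimized_ms) in reported.items():
    print(f"{device}: {speedup(baseline_ms, optimized_ms):.2f}x")
```

Rounded to two decimals, this gives 1.32x for the L4 and 1.65x for the Jetson AGX Orin, matching the tables.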
### Accuracy (SA-Co/Gold)
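Evaluated on the SA-Co/Gold instance segmentation benchmark ([Table 30 in the SAM3 paper](https://arxiv.org/pdf/2511.16719)). Relative to our FP32 ONNX baseline, the quantized model loses 1.79 points of average cgF1 (55.56 to 53.77) and 1.09 points of average pos_µF1 (67.45 to 66.36), while IL_MCC moves from 0.823 to 0.809.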
**Average across all subsets:**
| Model | cgF1 | IL_MCC | pos_µF1 |
|---|---|---|---|
| SAM3 (paper, Table 30) | 54.1 | 0.82 | 66.1 |
| SAM3 ONNX FP32 (ours) | 55.56 | 0.823 | 67.45 |
| **Embedl SAM3 INT8 (this model)** | **53.77** | **0.809** | **66.36** |
**Per-subset breakdown:**
| Subset | cgF1 (FP32) | cgF1 (INT8) | pos_µF1 (FP32) | pos_µF1 (INT8) |
|---|---|---|---|---|
| Metaclip | 47.92 | 47.07 | 59.24 | 58.54 |
| SA-1B | 53.44 | 52.33 | 61.70 | 61.31 |
| Crowded | 60.28 | 59.09 | 67.54 | 67.25 |
| FG Food | 58.76 | 56.28 | 72.01 | 70.02 |
| Sports Equipment | 67.85 | 65.61 | 75.15 | 73.91 |
| Attributes | 55.11 | 54.12 | 73.08 | 72.57 |
| WikiCommon | 45.57 | 41.85 | 63.46 | 60.88 |
| **Average** | **55.56** | **53.77** | **67.45** | **66.36** |
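To see where quantization costs the most, the per-subset cgF1 values from the table above can be turned into FP32-to-INT8 drops. This is an illustrative sketch over the numbers already reported here, not a re-evaluation; the variable names are arbitrary.

```python
# Per-subset cgF1 from the table above: (FP32, INT8).
cgf1 = {
    "Metaclip": (47.92, 47.07),
    "SA-1B": (53.44, 52.33),
    "Crowded": (60.28, 59.09),
    "FG Food": (58.76, 56.28),
    "Sports Equipment": (67.85, 65.61),
    "Attributes": (55.11, 54.12),
    "WikiCommon": (45.57, 41.85),
}

# Absolute drop in cgF1 points for each subset.
drops = {name: fp32 - int8 for name, (fp32, int8) in cgf1.items()}

# Subsets with the largest and smallest degradation.
worst = max(drops, key=drops.get)
best = min(drops, key=drops.get)

for name, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<18} -{drop:.2f}")
print(f"Largest drop: {worst}; smallest drop: {best}")
```

On these numbers, WikiCommon shows the largest drop (3.72 points) and Metaclip the smallest (0.85 points).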
## Creating Your Own Optimized Models
Deployment-ready models can be created from any supported base model using [embedl-deploy](https://deploy.embedl.com), available on PyPI. Detailed
tutorials will follow.
## License
This model is a derivative of **facebook/sam3**.
| Component | License |
|---|---|
| **Upstream (Meta SAM3)** | [SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE) |
| **Optimized components** | [Embedl Models Community Licence v1.0](https://github.com/embedl/embedl-models/blob/main/LICENSE) *(no redistribution as a hosted service)* |
## Contact
- **Enterprise & commercial inquiries:** [models@embedl.com](mailto:models@embedl.com)
- **Technical issues & early access:** [github.com/embedl/embedl-deploy](https://github.com/embedl/embedl-deploy/)
We offer engineering support for on-prem/edge deployments and partner co-marketing opportunities.