	---
	license: other
	license_name: embedl-models-community-licence-1.0
	license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
	base_model:
	- facebook/sam3
	quantized_from:
	- facebook/sam3
	tags:
	- segmentation
	- sam
	- sam3
	- quantization
	- onnx
	- tensorrt
	- edge
	- embedl
	gated: true
	extra_gated_heading: "Access Embedl SAM3 (Quantized)"
	extra_gated_description: >-
	  To access this model, please review and accept the terms below.
	  Your contact information is collected solely to manage access and,
	  with your explicit consent, to notify you about updated or new
	  optimized models from Embedl. You can withdraw consent at any time
	  by contacting us (see the Contact section below). See our license for full terms.
	extra_gated_button_content: "Agree and request access"
	extra_gated_prompt: "By requesting access you agree to the Embedl Models Community Licence and the upstream SAM License"
	extra_gated_fields:
	  Company: text
	  I agree to the Embedl Models Community Licence and upstream SAM License: checkbox
	  I consent to being contacted by Embedl about products and services (optional): checkbox
	---

	# Embedl SAM3 (Quantized)

	Deployable version of [facebook/sam3](https://huggingface.co/facebook/sam3).
	Mixed-precision INT8/FP16 quantization with hardware-aware optimizations.

	<table style="width: 100%; border-collapse: collapse; border: none;">
	<tr style="border: none;">
	<td style="width: 100%; border: none; padding: 10px;">
	<p align="center"><b>NVIDIA Jetson AGX Orin</b></p>
	<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__agx_orin.svg" style="width: 100%;">
	</td>
	</tr>
	<tr style="border: none;">
	<td style="width: 100%; border: none; padding: 10px;">
	<p align="center"><b>NVIDIA Jetson Thor</b></p>
	<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__agx_thor.svg" style="width: 100%;">
	</td>
	</tr>
	<tr style="border: none;">
	<td style="width: 100%; border: none; padding: 10px;">
	<p align="center"><b>NVIDIA L4</b></p>
	<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__l4.svg" style="width: 100%;">
	</td>
	</tr>
	<tr style="border: none;">
	<td style="width: 100%; border: none; padding: 10px;">
	<p align="center"><b>AMD MI300X</b></p>
	<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/SAM3__mi300x.svg" style="width: 100%;">
	</td>
	</tr>
	</table>


	## Highlights

	- Format: ONNX with external weights (`embedl_sam3_quant.onnx` + `.onnx.data`)
	- Precision: INT8 with sensitive layers kept in FP16
	- Runtime: TensorRT (FP16 + INT8 mode)
	- Hardware: NVIDIA Jetson AGX Orin and Jetson Thor, NVIDIA desktop/server GPUs with TensorRT, and AMD GPUs

	## Quick Start

	### 1. Download the model

	```bash
	hf download embedl/sam3 embedl_sam3_quant.onnx embedl_sam3_quant.onnx.data infer_trt.py --local-dir .
	```
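
The ONNX graph keeps its weights as external data in `embedl_sam3_quant.onnx.data`, and external-data locations are resolved relative to the directory containing the `.onnx` file, so both files need to sit side by side. Below is a minimal sketch for checking the download before building the engine. It assumes the file names from the Files table and treats an empty file as an interrupted transfer; `missing_model_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# File names taken from the Files table of this model card.
REQUIRED_FILES = (
    "embedl_sam3_quant.onnx",
    "embedl_sam3_quant.onnx.data",  # external weights, resolved relative to the .onnx
    "infer_trt.py",
)


def missing_model_files(directory: str = ".") -> list[str]:
    """Return the required files that are absent or empty in `directory`."""
    root = Path(directory)
    missing = []
    for name in REQUIRED_FILES:
        path = root / name
        # A zero-byte file usually means the download was interrupted.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_model_files(".")` in the download directory should return an empty list before you move on to the engine build.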

	### 2. Build the TensorRT engine

	> WARNING: Validated with TensorRT 10.1 and 10.3 only. Newer TensorRT versions produce incorrect segmentation masks for this model.

	```bash
	/usr/src/tensorrt/bin/trtexec --onnx=embedl_sam3_quant.onnx \
	--fp16 --int8 \
	--builderOptimizationLevel=5 \
	--memPoolSize=workspace:4294967296 \
	--timingCacheFile=embedl_sam3_timing_cache.bin \
	--saveEngine=embedl_sam3_quant.engine
	```
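
To guard against the TensorRT version issue in the warning above, you could wrap the build in a small script that refuses to run on an unvalidated release. This is a sketch under stated assumptions: the caller supplies the version string (for example `tensorrt.__version__` from the TensorRT Python bindings), only the major and minor numbers are compared, and the trtexec flags are copied unchanged from the command above. `is_validated_trt_version`, `trtexec_command`, and `build_engine` are hypothetical names.

```python
import subprocess

# Versions the warning above reports as validated: TensorRT 10.1 and 10.3.
VALIDATED_TRT_VERSIONS = {(10, 1), (10, 3)}


def is_validated_trt_version(version: str) -> bool:
    """Return True for version strings such as "10.3.0.26" on a validated release."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False  # unparseable strings are treated as unvalidated
    return (major, minor) in VALIDATED_TRT_VERSIONS


def trtexec_command(trtexec: str = "/usr/src/tensorrt/bin/trtexec") -> list[str]:
    """The trtexec invocation from this model card, as an argument list."""
    return [
        trtexec,
        "--onnx=embedl_sam3_quant.onnx",
        "--fp16",
        "--int8",
        "--builderOptimizationLevel=5",
        "--memPoolSize=workspace:4294967296",
        "--timingCacheFile=embedl_sam3_timing_cache.bin",
        "--saveEngine=embedl_sam3_quant.engine",
    ]


def build_engine(trt_version: str) -> None:
    """Run trtexec only when the given TensorRT version has been validated."""
    if not is_validated_trt_version(trt_version):
        raise RuntimeError(
            f"TensorRT {trt_version} is not validated for this model (use 10.1 or 10.3)"
        )
    # check=True raises CalledProcessError if trtexec exits with a non-zero status.
    subprocess.run(trtexec_command(), check=True)
```

For example, after `import tensorrt`, call `build_engine(tensorrt.__version__)`. Keeping `--timingCacheFile` lets later builds with the same GPU and TensorRT version reuse the saved tactic timings instead of profiling them again.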

	### 3. Run inference

	See [`infer_trt.py`](infer_trt.py) for a complete example that runs
	text-prompted video segmentation, measures latency, and saves an output video
	with mask overlays.

	```bash
	python3 -m venv venv --system-site-packages # Use system TensorRT
	source venv/bin/activate
	pip install opencv-python transformers av
	python infer_trt.py
	```
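
`infer_trt.py` reports latency, but its exact measurement method is not described here. As a hedged illustration of how such a number is commonly obtained, the helper below runs a few untimed warm-up iterations and then reports the median wall-clock time. `run_once` is a hypothetical callable that must block until the GPU work has finished (for example by synchronizing the CUDA stream); otherwise only the asynchronous launch gets timed.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(
    run_once: Callable[[], object], warmup: int = 5, iters: int = 20
) -> float:
    """Median wall-clock latency of `run_once`, in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up runs are excluded from the timing
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to occasional slow outliers.
    return statistics.median(samples)
```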

	## Files

	| File | Description |
	|---|---|
	| `embedl_sam3_quant.onnx` | Quantized ONNX model with pre-calibrated QDQ (QuantizeLinear/DequantizeLinear) nodes |
	| `embedl_sam3_quant.onnx.data` | External weights (~3.1 GB) |
	| `infer_trt.py` | TensorRT inference example |
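
The QDQ operations in `embedl_sam3_quant.onnx` are ONNX `QuantizeLinear`/`DequantizeLinear` nodes, which TensorRT treats as explicit quantization points. Below is a minimal sketch for counting them. It assumes the `onnx` Python package is installed (it is not part of the pip command in the Quick Start), and it inspects only the top-level graph, not nested subgraphs. `qdq_summary` and `load_op_types` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable

# ONNX operator types that make up a QDQ pair.
QDQ_OPS = ("QuantizeLinear", "DequantizeLinear")


def qdq_summary(op_types: Iterable[str]) -> dict[str, int]:
    """Count QDQ nodes versus all other nodes in a sequence of operator types."""
    counts = Counter(op_types)
    qdq = sum(counts[op] for op in QDQ_OPS)
    return {"qdq_nodes": qdq, "other_nodes": sum(counts.values()) - qdq}


def load_op_types(path: str = "embedl_sam3_quant.onnx") -> list[str]:
    """Operator types of the top-level graph (nested subgraphs are not visited)."""
    import onnx  # assumption: installed separately with `pip install onnx`

    # Skip the ~3.1 GB external weights; the graph structure is enough here.
    model = onnx.load(path, load_external_data=False)
    return [node.op_type for node in model.graph.node]
```

`qdq_summary(load_op_types())` then returns both counts for the downloaded model.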

	## Performance

	The input resolution is reduced from the default to 924 to enable TensorRT layer fusions that are not possible at the original size. All benchmarks use this
	resolution.

	### NVIDIA L4 GPU

	> Environment: NVIDIA L4, Driver 570.211.01, CUDA 12.8, TensorRT 10.3

	![Text-prompted video segmentation on NVIDIA L4 GPU](https://huggingface.co/datasets/embedl/documentation-images/resolve/main/SAM3/output_embedl_quant_with_masks.gif)

	| Configuration | Latency | Speedup |
	|---|---|---|
	| `torch.compile` (FP16) | 137 ms | 1.0x |
	| Embedl Deploy (this model) | 104 ms | 1.32x |

	### NVIDIA Jetson AGX Orin

	| Configuration | Latency | Throughput | Speedup |
	|---|---|---|---|
	| Baseline (FP16, resized to 924) | 763 ms | 1.31 qps | 1.0x |
	| Embedl Deploy (this model) | 462 ms | 2.17 qps | 1.65x |
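
The Speedup columns are consistent with dividing the baseline latency by the optimized latency. Here is a small worked check using the latencies from the NVIDIA L4 and Jetson AGX Orin tables (`speedup` is only an illustrative helper):

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Latency speedup: how many times faster the optimized model runs."""
    return baseline_ms / optimized_ms


# Latencies in milliseconds, copied from the NVIDIA L4 and Jetson AGX Orin tables.
l4 = speedup(137, 104)
orin = speedup(763, 462)
print(f"L4: {l4:.2f}x, AGX Orin: {orin:.2f}x")  # L4: 1.32x, AGX Orin: 1.65x
```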

	### Accuracy (SA-Co/Gold)

	Evaluated on the SA-Co/Gold instance segmentation benchmark ([Table 30 in the SAM3 paper](https://arxiv.org/pdf/2511.16719)). The quantized model stays close to our FP32 ONNX baseline: the average cgF1 drops from 55.56 to 53.77 and the average pos_µF1 from 67.45 to 66.36.

	Average across all subsets:

	| Model | cgF1 | IL_MCC | pos_µF1 |
	|---|---|---|---|
	| SAM3 (paper, Table 30) | 54.1 | 0.82 | 66.1 |
	| SAM3 ONNX FP32 (ours) | 55.56 | 0.823 | 67.45 |
	| Embedl SAM3 INT8 (this model) | 53.77 | 0.809 | 66.36 |
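
For reading this table: as we understand the SAM3 paper, cgF1 (classification-gated F1) is the positive micro F1 (pmF1, reported here as pos_µF1 in percent) gated by the image-level Matthews correlation coefficient IL_MCC. With the INT8 averages this gives, approximately:

$$
\mathrm{cgF1} = 100 \cdot \mathrm{pmF1} \cdot \mathrm{IL\_MCC},
\qquad
0.809 \times 66.36 \approx 53.69
$$

The reported average (53.77) matches only approximately because each column is averaged over the subsets separately, and the mean of products is not the product of means.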

	Per-subset breakdown:

	| Subset | cgF1 (FP32) | cgF1 (INT8) | pos_µF1 (FP32) | pos_µF1 (INT8) |
	|---|---|---|---|---|
	| Metaclip | 47.92 | 47.07 | 59.24 | 58.54 |
	| SA-1B | 53.44 | 52.33 | 61.70 | 61.31 |
	| Crowded | 60.28 | 59.09 | 67.54 | 67.25 |
	| FG Food | 58.76 | 56.28 | 72.01 | 70.02 |
	| Sports Equipment | 67.85 | 65.61 | 75.15 | 73.91 |
	| Attributes | 55.11 | 54.12 | 73.08 | 72.57 |
	| WikiCommon | 45.57 | 41.85 | 63.46 | 60.88 |
	| Average | 55.56 | 53.77 | 67.45 | 66.36 |
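
The per-subset columns show where the INT8 model gives up the most accuracy. A short worked example that computes the cgF1 drop for each subset from the per-subset table (the Average row is left out; `cgf1_drops` is an illustrative helper):

```python
# (FP32, INT8) cgF1 per subset, copied from the per-subset table.
CGF1 = {
    "Metaclip": (47.92, 47.07),
    "SA-1B": (53.44, 52.33),
    "Crowded": (60.28, 59.09),
    "FG Food": (58.76, 56.28),
    "Sports Equipment": (67.85, 65.61),
    "Attributes": (55.11, 54.12),
    "WikiCommon": (45.57, 41.85),
}


def cgf1_drops(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Absolute cgF1 drop (FP32 minus INT8) per subset, in points."""
    return {name: round(fp32 - int8, 2) for name, (fp32, int8) in table.items()}


drops = cgf1_drops(CGF1)
worst = max(drops, key=drops.get)
print(worst, drops[worst])  # WikiCommon 3.72
```

WikiCommon (3.72 points) and FG Food (2.48 points) lose the most cgF1, while Metaclip loses the least (0.85 points).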

	## Creating Your Own Optimized Models

	Deployment-ready models can be created from any supported base model using [embedl-deploy](https://deploy.embedl.com), available on PyPI. Detailed
	tutorials will follow.

	## License

	This model is a derivative of facebook/sam3.

	| Component | License |
	|---|---|
	| Upstream (Meta SAM3) | [SAM License](https://github.com/facebookresearch/sam3/blob/main/LICENSE) |
	| Optimized components | [Embedl Models Community Licence v1.0](https://github.com/embedl/embedl-models/blob/main/LICENSE) (no redistribution as a hosted service) |

	## Contact

	- Enterprise & commercial inquiries: [models@embedl.com](mailto:models@embedl.com)
	- Technical issues & early access: [github.com/embedl/embedl-deploy](https://github.com/embedl/embedl-deploy/)

	We offer engineering support for on-prem/edge deployments and partner co-marketing opportunities.