| Model | Task | Params | Downloads |
|---|---|---|---|
| inference-optimization/granite-4.0-h-tiny-FP8-block | Text Generation | 7B | 109 |
| inference-optimization/granite-4.0-h-tiny-quantized.w8a8 | | 7B | 55 |
| inference-optimization/granite-4.0-h-tiny-NVFP4 | | | |
| inference-optimization/granite-4.0-h-tiny-quantized.w4a16 | | | |
| inference-optimization/Qwen3-30B-A3B-Instruct-2507.w8a8 | | 31B | 27 |
| inference-optimization/Qwen3-30B-A3B-Thinking-2507.w8a8 | | 31B | 28 |
| inference-optimization/Qwen3-4B-Thinking-2507.w8a8 | | 4B | 39 |
| inference-optimization/Qwen3-4B-Instruct-2507.w8a8 | | 4B | 31 |
| inference-optimization/granite-4.0-h-small-quantized.w8a8 | | | |
| inference-optimization/granite-4.0-h-small-NVFP4 | | | |
| inference-optimization/granite-4.0-h-small-quantized.w4a16 | | | |
| inference-optimization/granite-4.0-h-small-FP8-dynamic | | | |
| inference-optimization/granite-4.0-h-small-FP8-block | | | |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | | 18B | 174 |
| inference-optimization/GLM-4.6-quantized.w4a16 | | 48B | 75 |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 | Text Generation | 32B | 492 |
| inference-optimization/Qwen3-Next-80B-A3B-Thinking-FP8 | Text Generation | 81B | 7 |
| inference-optimization/Qwen3-Next-80B-A3B-Instruct-FP8 | Text Generation | 81B | 12 |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-quantized.w4a16 | | | |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-dynamic | | 32B | 149 |
| inference-optimization/Qwen3-Next-80B-A3B-Thinking-quantized.w8a8 | | | |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-block | | | |
| inference-optimization/GLM-4.6-quantized.w8a8 | | 353B | 18 |
| inference-optimization/Qwen3-30B-A3B-Thinking-2507.w4a16 | Text Generation | 5B | 3 |
| inference-optimization/Qwen3-4B-Instruct-2507.w4a16 | Text Generation | 1B | 3 |
| inference-optimization/Qwen3-4B-Thinking-2507.w4a16 | Text Generation | 1B | 7 |
| inference-optimization/GLM-4.6-FP8-dynamic | | 353B | 17 |
| inference-optimization/GLM-4.6-NVFP4 | | 199B | 48 |
| inference-optimization/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head | | 8B | 1 |
| inference-optimization/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head | | 8B | 3 |
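
The repo suffixes encode the quantization scheme: `w8a8` checkpoints appear to use INT8 weights and activations, `w4a16` INT4 weights with 16-bit activations, `FP8-dynamic` and `FP8-block` 8-bit floating-point weights with dynamic per-token or block-wise scales, and `NVFP4` NVIDIA's 4-bit floating-point format. A minimal offline-inference sketch, assuming these repositories ship checkpoints in a format vLLM loads directly (e.g. compressed-tensors); the repo id is taken from the listing above, while the prompt and sampling settings are illustrative:

```python
# Hedged sketch: offline generation with a w8a8 checkpoint from the listing,
# assuming vLLM can load the quantized weights natively.
from vllm import LLM, SamplingParams

llm = LLM(model="inference-optimization/Qwen3-4B-Instruct-2507.w8a8")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What does w8a8 quantization mean?"], params)

print(outputs[0].outputs[0].text)
```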
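
The two Llama-3.1-8B `QKV-Cache-FP8-Per-Head` entries additionally carry per-head FP8 scales for the KV cache. A hedged sketch of serving one with vLLM's FP8 KV cache: `kv_cache_dtype="fp8"` is a documented vLLM engine argument, but whether this repo's per-head scales are consumed as-is is an assumption.

```python
# Sketch: enable FP8 KV cache in vLLM for a checkpoint that ships KV-cache
# scales. Storing K/V activations in FP8 roughly halves cache memory versus
# FP16, at the cost of some quantization error per head.
from vllm import LLM, SamplingParams

llm = LLM(
    model="inference-optimization/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head",
    kv_cache_dtype="fp8",  # assumption: calibrated per-head scales are picked up
)

out = llm.generate(
    ["Summarize FP8 KV caching in two sentences."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```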