# Qwen3.5-0.8B-f32-GGUF
Qwen3.5-0.8B, from Alibaba's Qwen team, is the smallest model in the Qwen3.5 small series (0.8B–9B): a 0.8B-parameter dense multimodal causal language model with a vision encoder. It features a hybrid Gated DeltaNet architecture (a 3:1 ratio of linear-attention to softmax-attention blocks, giving constant O(1) memory at its 262K native context, extensible to 1M+ tokens), multi-token prediction, and a 248K-token vocabulary covering 201 languages.

Designed for ultra-efficient edge deployment on phones, Raspberry Pi boards, or IoT devices (~1.6 GB of VRAM at BF16, ~0.5 GB at 4-bit quantization), it natively handles text, images, and video, with strong benchmark results: OCRBench 74.5, MathVista 62.2, VideoMME 63.8, RefCOCO 79.3. It outperforms larger prior models on vision tasks while fitting embedded systems, making it practical for OCR, document reading, screenshot analysis, and basic video understanding.

Released under Apache 2.0 with the base model available, seven quantized variants (including GGUF for llama.cpp CPU inference), and a toggleable thinking mode for trading reasoning depth against latency, it brings multimodal AI to high-throughput, low-resource applications such as mobile apps and offline automation.
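The VRAM figures above follow from simple arithmetic: weight memory is roughly parameter count times bits per weight. A back-of-envelope sketch (the function name is illustrative; real usage adds KV-cache, activation, and quantization-block overhead, which is why 4-bit lands nearer 0.5 GB than 0.4 GB):

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Back-of-envelope weight memory in decimal GB: params x bits / 8.
    Ignores KV cache, activations, and quantization block overhead."""
    return n_params * bits_per_weight / 8 / 1e9

print(approx_weight_gb(0.8e9, 16))  # BF16: 1.6 GB, matching the figure above
print(approx_weight_gb(0.8e9, 4))   # 4-bit: 0.4 GB; ~0.5 GB once overhead is added
```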
## Model Files
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Qwen3.5-0.8B.BF16.gguf | BF16 | 1.52 GB | Download |
| Qwen3.5-0.8B.F16.gguf | F16 | 1.52 GB | Download |
| Qwen3.5-0.8B.F32.gguf | F32 | 3.02 GB | Download |
| Qwen3.5-0.8B.Q8_0.gguf | Q8_0 | 812 MB | Download |
| Qwen3.5-0.8B.mmproj-bf16.gguf | mmproj-bf16 | 207 MB | Download |
| Qwen3.5-0.8B.mmproj-f16.gguf | mmproj-f16 | 207 MB | Download |
| Qwen3.5-0.8B.mmproj-f32.gguf | mmproj-f32 | 402 MB | Download |
| Qwen3.5-0.8B.mmproj-q8_0.gguf | mmproj-q8_0 | 116 MB | Download |
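The files above pair up for llama.cpp inference: one main model file plus, for vision input, a matching `mmproj` projector. A minimal sketch, assuming a local llama.cpp build in the current directory; `document.png` and the prompts are placeholders, and the multimodal CLI name and flags can differ across llama.cpp versions:

```shell
# Text-only chat with the Q8_0 quant (smallest full-model file)
./llama-cli -m Qwen3.5-0.8B.Q8_0.gguf \
    -p "Summarize GGUF in one sentence." -n 128

# Vision: pair the model with the matching mmproj projector file
./llama-mtmd-cli -m Qwen3.5-0.8B.Q8_0.gguf \
    --mmproj Qwen3.5-0.8B.mmproj-q8_0.gguf \
    --image document.png \
    -p "Transcribe the text in this image."
```

Matching precision between model and projector (e.g. `Q8_0` with `mmproj-q8_0`) keeps memory use consistent, though mixing levels also works.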
## Quants Usage
(Sorted by size, not necessarily by quality; IQ-quants are often preferable to similarly sized non-IQ quants.)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
