# Qwen3.5-0.8B-f32-GGUF
Qwen3.5-0.8B, from Alibaba's Qwen team, is the smallest model in the Qwen3.5 small series (0.8B–9B): a 0.8B-parameter dense multimodal causal language model with a vision encoder. It features a hybrid Gated DeltaNet architecture (a 3:1 ratio of linear-attention to softmax-attention blocks, giving constant O(1) memory at its 262K native context, extensible to 1M+ tokens), multi-token prediction, and a 248K-token vocabulary covering 201 languages.

Designed for ultra-efficient edge deployment on phones, Raspberry Pi boards, or IoT devices (~1.6 GB of VRAM at BF16, ~0.5 GB at 4-bit quantization), it natively handles text, images, and video, with strong benchmark results: OCRBench 74.5, MathVista 62.2, VideoMME 63.8, RefCOCO 79.3. It outperforms larger prior models on vision tasks while fitting embedded systems, making it practical for OCR, document reading, screenshot analysis, and basic video understanding.

Released under Apache 2.0 with the base model available, seven quantized variants (including GGUF for llama.cpp CPU inference), and a toggleable thinking mode for trading reasoning depth against latency, it brings multimodal AI to high-throughput, low-resource applications such as mobile apps and offline automation.
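The VRAM figures above follow from simple arithmetic: weight memory is roughly parameter count times bits per weight. A back-of-envelope sketch (the function name is illustrative; real usage adds KV-cache, activation, and quantization-block overhead, which is why 4-bit lands nearer 0.5 GB than 0.4 GB):

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Back-of-envelope weight memory in decimal GB: params x bits / 8.
    Ignores KV cache, activations, and quantization block overhead."""
    return n_params * bits_per_weight / 8 / 1e9

print(approx_weight_gb(0.8e9, 16))  # BF16: 1.6 GB, matching the figure above
print(approx_weight_gb(0.8e9, 4))   # 4-bit: 0.4 GB; ~0.5 GB once overhead is added
```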
## Model Files
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Qwen3.5-0.8B.BF16.gguf | BF16 | 1.52 GB | Download |
| Qwen3.5-0.8B.F16.gguf | F16 | 1.52 GB | Download |
| Qwen3.5-0.8B.F32.gguf | F32 | 3.02 GB | Download |
| Qwen3.5-0.8B.Q8_0.gguf | Q8_0 | 812 MB | Download |
| Qwen3.5-0.8B.mmproj-bf16.gguf | mmproj-bf16 | 207 MB | Download |
| Qwen3.5-0.8B.mmproj-f16.gguf | mmproj-f16 | 207 MB | Download |
| Qwen3.5-0.8B.mmproj-f32.gguf | mmproj-f32 | 402 MB | Download |
| Qwen3.5-0.8B.mmproj-q8_0.gguf | mmproj-q8_0 | 116 MB | Download |
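The files above pair up for llama.cpp inference: one main model file plus, for vision input, a matching `mmproj` projector. A minimal sketch, assuming a local llama.cpp build in the current directory; `document.png` and the prompts are placeholders, and the multimodal CLI name and flags can differ across llama.cpp versions:

```shell
# Text-only chat with the Q8_0 quant (smallest full-model file)
./llama-cli -m Qwen3.5-0.8B.Q8_0.gguf \
    -p "Summarize GGUF in one sentence." -n 128

# Vision: pair the model with the matching mmproj projector file
./llama-mtmd-cli -m Qwen3.5-0.8B.Q8_0.gguf \
    --mmproj Qwen3.5-0.8B.mmproj-q8_0.gguf \
    --image document.png \
    -p "Transcribe the text in this image."
```

Matching precision between model and projector (e.g. `Q8_0` with `mmproj-q8_0`) keeps memory use consistent, though mixing levels also works.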
## Quants Usage
(Sorted by size, not necessarily by quality; IQ-quants are often preferable to similarly sized non-IQ quants.)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
