Qwen3.5-0.8B-f32-GGUF

Qwen3.5-0.8B, from Alibaba's Qwen team, is the smallest model in the Qwen3.5 small series (0.8B-9B). It is a 0.8B-parameter dense multimodal causal language model with a vision encoder, featuring a hybrid Gated DeltaNet architecture (3:1 linear-attention to softmax blocks for constant O(1) memory at the 262K native context, extensible to 1M+ tokens), multi-token prediction, and a 248K-token vocabulary covering 201 languages. Designed for ultra-efficient edge deployment on phones, Raspberry Pi, or IoT devices (~1.6 GB VRAM at BF16, ~0.5 GB at 4-bit quantization), it natively handles text, images, and video, with strong benchmark results including OCRBench 74.5, MathVista 62.2, VideoMME 63.8, and RefCOCO 79.3. It outperforms prior, larger models on vision tasks while remaining small enough for embedded systems doing practical OCR, document reading, screenshot analysis, and basic video understanding. Apache 2.0-licensed, with the base model available, seven quantized variants (including GGUF for llama.cpp CPU inference), and a toggleable thinking mode for trading reasoning quality against latency, it brings multimodal AI to high-throughput, low-resource applications such as mobile apps and offline automation.
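
For a quick local test on CPU, a minimal sketch using the llama-cpp-python bindings is shown below. The file name, context size, and sampling settings are illustrative; vision input would additionally require the matching mmproj projector file and a multimodal-capable runtime, which is not shown here.

```python
# Minimal sketch: text-only chat with one of the GGUF files below, using
# llama-cpp-python (pip install llama-cpp-python). File name, context size,
# and sampling settings are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-0.8B.Q8_0.gguf",  # any quant from the Model Files table
    n_ctx=8192,    # working context window for this example
    n_threads=4,   # CPU threads to use
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what a GGUF file is in one sentence."},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```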

Model Files

| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Qwen3.5-0.8B.BF16.gguf | BF16 | 1.52 GB | Download |
| Qwen3.5-0.8B.F16.gguf | F16 | 1.52 GB | Download |
| Qwen3.5-0.8B.F32.gguf | F32 | 3.02 GB | Download |
| Qwen3.5-0.8B.Q8_0.gguf | Q8_0 | 812 MB | Download |
| Qwen3.5-0.8B.mmproj-bf16.gguf | mmproj-bf16 | 207 MB | Download |
| Qwen3.5-0.8B.mmproj-f16.gguf | mmproj-f16 | 207 MB | Download |
| Qwen3.5-0.8B.mmproj-f32.gguf | mmproj-f32 | 402 MB | Download |
| Qwen3.5-0.8B.mmproj-q8_0.gguf | mmproj-q8_0 | 116 MB | Download |
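
To fetch these files programmatically rather than through the download links, a small sketch with huggingface_hub could look like the following; the repo id matches this repository, and the chosen file names are just examples.

```python
# Sketch: download one quantized model file and its matching mmproj projector
# from this repository using huggingface_hub (pip install huggingface_hub).
from huggingface_hub import hf_hub_download

repo_id = "prithivMLmods/Qwen3.5-0.8B-f32-GGUF"

model_path = hf_hub_download(repo_id=repo_id, filename="Qwen3.5-0.8B.Q8_0.gguf")
mmproj_path = hf_hub_download(repo_id=repo_id, filename="Qwen3.5-0.8B.mmproj-q8_0.gguf")

print(model_path)   # local cache path of the language model
print(mmproj_path)  # local cache path of the vision projector
```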

Quants Usage

(Sorted by size, not necessarily by quality. IQ-quants are often preferable over similarly sized non-IQ quants.)

ikawrakow has published a handy graph comparing some lower-quality quant types (lower is better).
