SmolLM2-360M on WebGPU
HuggingFace's tiny 360M-parameter model, running in the browser on WebGPU.
369 MB Q8_0 quantization. Loads in under 2 seconds and starts generating almost immediately.
Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64GB unified memory).
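As a rough sanity check on the 369 MB figure, the sketch below estimates the size of a Q8_0 file from a parameter count. It relies on the GGUF Q8_0 layout (blocks of 32 int8 weights plus one 16-bit scale, 34 bytes per block). The parameter count is an assumption taken from the nominal "360M" in the model name. The function name is illustrative and not part of this package.

```javascript
// Q8_0 stores weights in blocks of 32: 32 int8 values plus one 16-bit scale.
const Q8_0_BLOCK_WEIGHTS = 32;
const Q8_0_BLOCK_BYTES = 32 + 2;

// Estimate the bytes needed to store `paramCount` weights in Q8_0.
// Ignores GGUF metadata, the tokenizer, and tensors kept at other precisions.
function estimateQ8_0Bytes(paramCount) {
  const blocks = Math.ceil(paramCount / Q8_0_BLOCK_WEIGHTS);
  return blocks * Q8_0_BLOCK_BYTES;
}

// Assumption: nominal parameter count from the model name, not an exact figure.
const approxParams = 360e6;
const approxBytes = estimateQ8_0Bytes(approxParams);
console.log((approxBytes / 2 ** 20).toFixed(1) + " MiB (weights only, estimate)");
```

The estimate lands a few MB below the published size. That is plausible given that the real file also carries metadata and the exact parameter count is not precisely 360M.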
Quick Start
- Download Q8_0 GGUF from bartowski
- Place in model_splits/ (no splitting needed; single file)
- Run node serve.js (port 8180)
- Open http://localhost:8180 in Chrome
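The contents of serve.js are not shown in this card. As a minimal sketch of what a local static server for these steps might look like, the following uses only Node's built-in modules. All function names here (createStaticServer, resolveSafePath, contentTypeFor) are hypothetical, and this is not the package's actual serve.js.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Small content-type table; anything unknown is served as binary.
const MIME_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".json": "application/json",
  ".wasm": "application/wasm",
  ".gguf": "application/octet-stream",
};

function contentTypeFor(filePath) {
  return MIME_TYPES[path.extname(filePath).toLowerCase()] || "application/octet-stream";
}

// Map a request URL to a file under rootDir.
// Returns null for malformed URLs or paths that escape rootDir.
function resolveSafePath(rootDir, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(new URL(requestUrl, "http://localhost").pathname);
  } catch (err) {
    return null;
  }
  const relative = pathname === "/" ? "index.html" : pathname.replace(/^\/+/, "");
  const root = path.resolve(rootDir);
  const resolved = path.resolve(root, relative);
  return resolved.startsWith(root + path.sep) ? resolved : null;
}

// Build (but do not start) an HTTP server that streams files from rootDir.
function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    const filePath = resolveSafePath(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.stat(filePath, (err, stats) => {
      if (err || !stats.isFile()) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, {
        "Content-Type": contentTypeFor(filePath),
        "Content-Length": stats.size,
      });
      // Stream the file so the 369 MB model is never held in memory at once.
      fs.createReadStream(filePath).pipe(res);
    });
  });
}
```

Calling `createStaticServer(__dirname).listen(8180)` would then serve the page and the model_splits/ directory on the port used above. Serving from localhost matters because browsers expose WebGPU only in secure contexts, and http://localhost counts as one.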
Use Cases
- Lightweight chat and Q&A
- Classification and summarization
- Edge/IoT inference
- Testing and prototyping
Hardware
Any WebGPU-capable device. Tested on AMD Strix Halo but works on much smaller hardware too. The model is only 369 MB, so it fits anywhere.
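To check whether a given device qualifies as WebGPU-capable, a page can probe the standard `navigator.gpu` API before loading the model. The helper below is a hedged sketch: `checkWebGPU` is a hypothetical name, and the navigator object is passed in as a parameter purely so the function can be exercised outside a browser.

```javascript
// Probe for WebGPU support and report a couple of adapter limits.
// `nav` defaults to the browser's navigator; it is injectable for testing.
async function checkWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) {
    return { supported: false, reason: "navigator.gpu is not available" };
  }
  // requestAdapter() resolves to null when no suitable GPU is found.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter found" };
  }
  return {
    supported: true,
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}
```

In the page, `checkWebGPU().then(console.log)` prints the result. The reported limits give a rough idea of how large a single GPU buffer the device allows.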
Why This Package
Part of a series that makes popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU bypasses the broken ROCm stack and routes through the gaming driver stack instead.
Credits
Built by Joshua (LJTSG) and Claude.
Co-Authored-By: Claude noreply@anthropic.com
Model tree for LJTSG/SmolLM2-360M-webgpu
Base model: HuggingFaceTB/SmolLM2-360M
Quantized: HuggingFaceTB/SmolLM2-360M-Instruct