SmolLM2-360M on WebGPU

Hugging Face's SmolLM2-360M, a small 360M-parameter language model, running in the browser on WebGPU.

Uses the 369 MB Q8_0 quantization. On the test machine it loads in under 2 seconds and starts generating with no noticeable delay.
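
As a rough sanity check on the file size, the sketch below estimates the Q8_0 footprint from the parameter count. It relies on the ggml Q8_0 block layout (32 int8 weights plus one fp16 scale per block) and assumes a nominal 360M parameters with every tensor stored as Q8_0; the real file also carries metadata and may keep some tensors at higher precision, so treat the result as a ballpark that lands in the same range as the 369 MB figure above, not an exact match.

```javascript
// Q8_0 block layout in ggml/GGUF: 32 int8 weights + one fp16 scale = 34 bytes.
const Q8_0_BLOCK_WEIGHTS = 32;
const Q8_0_BLOCK_BYTES = 32 + 2;

// Estimate the bytes needed to store `paramCount` weights as Q8_0.
// Ignores GGUF metadata and any tensors kept in another format.
function estimateQ8_0Bytes(paramCount) {
  return Math.ceil(paramCount / Q8_0_BLOCK_WEIGHTS) * Q8_0_BLOCK_BYTES;
}

// 360e6 is the nominal parameter count (an assumption, not the exact figure).
const estimatedBytes = estimateQ8_0Bytes(360e6);
console.log((estimatedBytes / 1e6).toFixed(1) + " MB");      // 382.5 MB
console.log((estimatedBytes / 2 ** 20).toFixed(1) + " MiB"); // 364.8 MiB
```

The effective cost is 8.5 bits per weight, which is why a Q8_0 file is slightly larger than one byte per parameter.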

Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64 GB unified memory).

Quick Start

Download the Q8_0 GGUF from bartowski
Place it in model_splits/ (no splitting needed; the model is a single file)
Run node serve.js (serves on port 8180)
Open http://localhost:8180 in Chrome
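
For readers curious what a local server for these steps involves, here is a minimal sketch of a static file server with HTTP Range support, so the browser can fetch the GGUF in pieces. It is a hypothetical stand-in, not the actual serve.js from this package: the function names (resolveSafePath, parseRange, createModelServer), the MIME table, and the index.html fallback are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for serve.js: static files plus single-range requests.
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

const MIME = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".gguf": "application/octet-stream",
};

// Map a URL path onto rootDir; return null if it would escape the root.
function resolveSafePath(rootDir, urlPath) {
  const root = path.resolve(rootDir);
  const decoded = decodeURIComponent(urlPath.split("?")[0]);
  const target = path.resolve(root, "." + decoded);
  return target === root || target.startsWith(root + path.sep) ? target : null;
}

// Parse a single "bytes=start-end" Range header; return null if absent or unusable.
function parseRange(header, size) {
  const m = /^bytes=(\d*)-(\d*)$/.exec(header || "");
  if (!m || (m[1] === "" && m[2] === "")) return null;
  let start, end;
  if (m[1] === "") {
    // Suffix form "bytes=-N": the last N bytes of the file.
    start = Math.max(size - Number(m[2]), 0);
    end = size - 1;
  } else {
    start = Number(m[1]);
    end = m[2] === "" ? size - 1 : Math.min(Number(m[2]), size - 1);
  }
  return start <= end && start < size ? { start, end } : null;
}

// Build (but do not start) the server; the caller picks the port.
function createModelServer(rootDir) {
  return http.createServer((req, res) => {
    let file;
    try { file = resolveSafePath(rootDir, req.url); } catch { file = null; }
    if (file && fs.existsSync(file) && fs.statSync(file).isDirectory()) {
      file = path.join(file, "index.html"); // assumed entry page
    }
    if (!file || !fs.existsSync(file)) {
      res.writeHead(404).end("Not found");
      return;
    }
    const size = fs.statSync(file).size;
    const headers = {
      "Content-Type": MIME[path.extname(file)] || "application/octet-stream",
      "Accept-Ranges": "bytes",
    };
    // Unusable ranges fall back to a full 200 response to keep the sketch short.
    const range = parseRange(req.headers.range, size);
    if (range) {
      headers["Content-Range"] = `bytes ${range.start}-${range.end}/${size}`;
      headers["Content-Length"] = range.end - range.start + 1;
      res.writeHead(206, headers);
      fs.createReadStream(file, range).pipe(res);
    } else {
      headers["Content-Length"] = size;
      res.writeHead(200, headers);
      fs.createReadStream(file).pipe(res);
    }
  });
}

// Usage, assuming the page and model_splits/ live next to this script:
// createModelServer(__dirname).listen(8180);
```

The server is returned without being started so the port stays a caller decision; 8180 in the usage comment simply mirrors the port named in the steps above.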

Use Cases

Lightweight chat and Q&A
Classification and summarization
Edge/IoT inference
Testing and prototyping

Hardware

Runs on any WebGPU-capable device. Tested on AMD Strix Halo, but it works on much smaller hardware too: the model is only 369 MB, so memory is rarely the limiting factor.
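
To check whether a given device qualifies, a page can probe the standard WebGPU entry points before loading the model. The sketch below uses navigator.gpu.requestAdapter() and the adapter's limits, both part of the WebGPU API; the function name describeWebGPU and the choice of which limits to report are assumptions for illustration and are not taken from this package.

```javascript
// Probe WebGPU support and report the buffer limits most relevant to model loading.
// `gpu` defaults to navigator.gpu in a browser; passing it in keeps this testable.
async function describeWebGPU(gpu = globalThis.navigator?.gpu) {
  if (!gpu) {
    return { available: false, reason: "navigator.gpu is undefined" };
  }
  // requestAdapter() resolves to null when no suitable adapter exists.
  const adapter = await gpu.requestAdapter();
  if (!adapter) {
    return { available: false, reason: "no suitable GPU adapter" };
  }
  const { maxBufferSize, maxStorageBufferBindingSize } = adapter.limits;
  return { available: true, maxBufferSize, maxStorageBufferBindingSize };
}

// Usage in a browser console:
// describeWebGPU().then(console.log);
```

The reported limits cap the size of a single GPU buffer and binding, so a loader may need to spread the weights over several buffers on devices with small limits; how this package handles that is not stated here.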

Why This Package

Part of a series making popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU bypasses the broken ROCm stack and runs through the gaming (graphics) driver stack instead.

Credits

Built by Joshua (LJTSG) and Claude.

Co-Authored-By: Claude <noreply@anthropic.com>

Model tree for LJTSG/SmolLM2-360M-webgpu

Base model: HuggingFaceTB/SmolLM2-360M
Finetuned: HuggingFaceTB/SmolLM2-360M-Instruct
Quantized (147): this model