SmolLM2-360M on WebGPU

Hugging Face's tiny 360M-parameter model, running in the browser on WebGPU.

The Q8_0 quantization is 369 MB. It loads in under 2 seconds and starts generating with no noticeable delay.

Built and tested on AMD Strix Halo (Radeon 8060S iGPU, 64GB unified memory).

Quick Start

  1. Download Q8_0 GGUF from bartowski
  2. Place in model_splits/ (no splitting needed; it is a single file)
  3. node serve.js (port 8180)
  4. Open http://localhost:8180 in Chrome
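
The serve.js shipped with this package is not reproduced here. As a rough illustration of what step 3 involves, the sketch below is a hypothetical stand-in: a minimal static file server that would serve the page and the GGUF file on port 8180. The names resolveFile, contentType, and createStaticServer and the MIME table are assumptions made for this example, not the package's actual code.

```javascript
// Hypothetical stand-in for serve.js (not the package's real implementation):
// a minimal static file server for the page and the GGUF model file.
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

const PORT = 8180; // port named in the Quick Start

// Assumed MIME table; anything unknown is served as a binary stream.
const MIME = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".json": "application/json",
  ".wasm": "application/wasm",
  ".gguf": "application/octet-stream",
};

// Map a URL path to a file under rootDir.
// Returns null if the path is malformed or would escape rootDir.
function resolveFile(rootDir, urlPath) {
  let clean;
  try {
    clean = decodeURIComponent(urlPath.split("?")[0]);
  } catch {
    return null; // malformed percent-encoding
  }
  const rel = clean === "/" ? "index.html" : clean.replace(/^\/+/, "");
  const root = path.resolve(rootDir);
  const full = path.resolve(root, rel);
  return full.startsWith(root + path.sep) ? full : null;
}

function contentType(filePath) {
  return MIME[path.extname(filePath).toLowerCase()] || "application/octet-stream";
}

function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    const file = resolveFile(rootDir, req.url);
    if (!file) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.stat(file, (err, stat) => {
      if (err || !stat.isFile()) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, {
        "Content-Type": contentType(file),
        "Content-Length": stat.size,
      });
      // Stream the file so the 369 MB model is never held in memory at once.
      fs.createReadStream(file).pipe(res);
    });
  });
}

// To start serving the current directory:
//   createStaticServer(__dirname).listen(PORT);
```

Serving from localhost matters here because browsers expose WebGPU only in secure contexts, and http://localhost counts as one.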

Use Cases

  • Lightweight chat and Q&A
  • Classification and summarization
  • Edge/IoT inference
  • Testing and prototyping

Hardware

Any WebGPU-capable device. Tested on AMD Strix Halo but works on much smaller hardware too. The model is only 369 MB, so it fits anywhere.
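
To find out whether a given device counts as WebGPU-capable, a check along the following lines can be pasted into the browser console. This is a minimal sketch using the standard navigator.gpu API; the function name probeWebGPU and the shape of the object it returns are assumptions made for this example and are not part of the package.

```javascript
// Sketch: probe for WebGPU support and report two relevant adapter limits.
// `nav` defaults to the browser's navigator; it is a parameter only so the
// function can be exercised outside a browser with a stand-in object.
async function probeWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) {
    return { supported: false, reason: "navigator.gpu is not available" };
  }
  // requestAdapter() resolves to null when no suitable GPU is found.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter found" };
  }
  return {
    supported: true,
    // Largest single buffer, and largest storage-buffer binding, in bytes.
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}

// In the browser console:
//   probeWebGPU().then(console.log);
```

The values reported are the adapter's maximums. A device only receives limits above the WebGPU defaults if they are requested through requiredLimits when calling requestDevice(), so on smaller hardware these numbers are worth comparing against the 369 MB model size.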

Why This Package

Part of a series that makes popular models available on WebGPU for AMD unified-memory AI PCs. WebGPU bypasses the broken ROCm stack and instead routes through the same graphics driver stack that games use.

Credits

Built by Joshua (LJTSG) and Claude.

Co-Authored-By: Claude <noreply@anthropic.com>
