nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 • Text Generation • 335B • Updated about 11 hours ago • 7.42k • 111
TurboQuant on Consumer GPUs: 100K Context on RTX 3090, 64K on RTX 4070 • Running • 10 • Extend LLM context to 100K tokens on consumer GPUs