Article How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day sionic-ai • Dec 8, 2025 • 57
Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel Paper • 2508.18224 • Published Aug 25, 2025 • 1
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12, 2025 • 71
Article Vocabulary is the most important element of Sparse Retrieval yjoonjang • Oct 4, 2025 • 10
Article Training and Finetuning Reranker Models with Sentence Transformers tomaarsen • Mar 26, 2025 • 194
Article ChatML vs Harmony: Understanding the new Format from OpenAI 🔍 kuotient • Aug 9, 2025 • 57
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Paper • 2411.13476 • Published Nov 20, 2024 • 16
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 38
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14, 2024 • 51
Gemma-APS Release Collection Gemma models for text-to-propositions segmentation. The models are distilled from a fine-tuned Gemini Pro model applied to multi-domain synthetic data. • 3 items • Updated Mar 12 • 26
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy medmekk, marcsun13, lvwerra, pcuenq, osanseviero, thomwolf • Sep 18, 2024 • 280
Article dstack: Your LLM Launchpad - From Fine-Tuning to Serving, Simplified chansung • Aug 22, 2024 • 13
Article Training and Finetuning Embedding Models with Sentence Transformers tomaarsen • May 28, 2024 • 274
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 83
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18, 2024 • 34