83.7 TFLOPS
David Golchinfar
PRO
DavidGF
65 followers · 47 following
https://vago-solutions.ai
DavidGFar
dgolchin
AI & ML interests
Fine-tuning LLMs; improving the German language understanding and text generation of LLMs
Recent Activity
reacted to anakin87's post with ❤️ about 1 hour ago
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took https://huggingface.co/LiquidAI/LFM2-2.6B and trained it through play. 🧑‍🍳

Here's how:
1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆

---

🎮 Play against the model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
🤗 Model: https://huggingface.co/anakin87/LFM2-2.6B-mr-tictactoe
📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
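
The post mentions training against opponents that make a given fraction of random moves (20-70%, later 0-25%). A minimal sketch of such an opponent follows. It is an illustration under assumptions, not the code used in the post: the names `opponent_move`, `best_move`, and `random_move_prob` are hypothetical, and the use of minimax for the non-random moves is an assumption, since the post does not say how those moves are chosen.

```python
import random
from functools import lru_cache

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of the empty cells of a 9-character board string."""
    return [i for i, cell in enumerate(board) if cell == ' ']


def play(board, move, player):
    """Return a new board string with `player` placed at `move`."""
    return board[:move] + player + board[move + 1:]


@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value for `player`, who is to move: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    other = 'O' if player == 'X' else 'X'
    # The value for the mover is the best negated value of the reply positions.
    return max(-minimax(play(board, m, player), other) for m in moves)


def best_move(board, player):
    """Pick a move with the highest minimax value (assumed 'strong' policy)."""
    other = 'O' if player == 'X' else 'X'
    return max(legal_moves(board),
               key=lambda m: -minimax(play(board, m, player), other))


def opponent_move(board, player, random_move_prob, rng):
    """With probability `random_move_prob` play a uniformly random legal move,
    otherwise play the minimax move."""
    if rng.random() < random_move_prob:
        return rng.choice(legal_moves(board))
    return best_move(board, player)
```

Under these assumptions, a curriculum like the one in the post would correspond to sampling `random_move_prob` from a high range first (for example 0.2 to 0.7) and from a lower range later (0.0 to 0.25), so the opponent becomes harder to beat as training progresses.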
liked a Space about 1 hour ago: anakin87/LFM2-2.6B-mr-tictactoe
liked a model 2 days ago: lightonai/DenseOn
Organizations
DavidGF's datasets
None public yet