Geyang
geyang627
AI & ML interests
None yet
Recent Activity
upvoted an article 3 days ago
Deriving the PPO Loss from First Principles
upvoted an article 3 days ago
A Guide to Reinforcement Learning Post-Training for LLMs: PPO, DPO, GRPO, and Beyond
new activity 3 months ago
QCRI/MultiNativQA: All is_reliable is True