Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
AI & ML interests
None yet
Recent Activity
new activity 10 days ago
Keven16/Qwen3-4B-Non-Thinking-RL-Math-Step500:What is the data source used for training this model? authored a paper about 1 month ago
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe upvoted a paper about 1 month ago
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and RecipeOrganizations
None yet