Amarjyoti
amar-bach
AI & ML interests
None yet
Organizations
RL-reasoning
- The Art of Efficient Reasoning: Data, Reward, and Optimization
  Paper • 2602.20945 • Published • 7
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 53
- Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning
  Paper • 2512.04359 • Published
- How Far Can Unsupervised RLVR Scale LLM Training?
  Paper • 2603.08660 • Published • 59
VLA
Pretrain
Agents
VLM
VLA