- Natural Language Reinforcement Learning
  Paper • 2411.14251 • Published • 31
- Benjamin-eecs/Llama-3.1-8B-Instruct-NLRL-TicTacToe-Value
  Feature Extraction • 8B • Updated • 1
- Benjamin-eecs/Llama-3.1-8B-Instruct-NLRL-TicTacToe-Policy
  Feature Extraction • 8B • Updated • 5
- Waterhorse/Llama-3.1-8B-Instruct-NLRL-Breakthrough-Value
  Feature Extraction • 8B • Updated • 20
Benjamin
Benjamin-eecs
Recent Activity
authored a paper 29 days ago
Reasoning over mathematical objects: on-policy reward modeling and test time aggregation
upvoted a paper about 1 month ago
Reasoning over mathematical objects: on-policy reward modeling and test time aggregation