RL Compositionality
Collection
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones. https://huggingface.co/papers/2509.25123
The model after Stage 1 RFT.
Base model: meta-llama/Llama-3.1-8B