SUSTech-NLP/UniRRM-SFT
Multilingual and multimodal LLMs, data synthesis, complex reasoning with LLMs
Bridging the Agent-World Gap: Text World Models for LLM-based Agents
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks