Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge • Paper • 2601.08808 • Published 9 days ago • 35
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models • Paper • 2601.10387 • Published 8 days ago • 10
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution • Paper • 2601.10657 • Published 7 days ago • 19
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering • Paper • 2601.10402 • Published 8 days ago • 36
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization • Paper • 2601.05242 • Published 14 days ago • 203
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence • Paper • 2512.22334 • Published 27 days ago • 35
Agentic Rubrics as Contextual Verifiers for SWE Agents • Paper • 2601.04171 • Published 15 days ago • 11
NitroGen: An Open Foundation Model for Generalist Gaming Agents • Paper • 2601.02427 • Published 18 days ago • 42
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models • Paper • 2512.22238 • Published about 1 month ago • 26
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture • Paper • 2512.21675 • Published 29 days ago • 25
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios • Paper • 2512.18470 • Published Dec 20, 2025 • 11
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows • Paper • 2512.16969 • Published Dec 18, 2025 • 116
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments • Paper • 2512.19432 • Published Dec 22, 2025 • 13
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models • Paper • 2512.19526 • Published Dec 22, 2025 • 12
Reinforcement Learning for Self-Improving Agent with Skill Library • Paper • 2512.17102 • Published Dec 18, 2025 • 33
Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision • Paper • 2512.15489 • Published Dec 17, 2025 • 9