Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields Paper • 2606.11042 • Published 4 days ago • 20
InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning Paper • 2606.12195 • Published 3 days ago • 20
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 4 days ago • 40
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 3 days ago • 102
Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling Paper • 2606.12370 • Published 3 days ago • 19
ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations Paper • 2606.11188 • Published 4 days ago • 24
WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark Paper • 2606.06538 • Published 9 days ago • 3
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations Paper • 2606.05563 • Published 9 days ago • 49
TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration Paper • 2606.04743 • Published 10 days ago • 44
Article The Open Source Community is backing OpenEnv for Agentic RL burtenshaw, spisakjo, lysandre, darktex, willcb, qjoy, pawalt, cwing-nv, danielhanchen, andrewzhou, shimmyshimmer, Hamid-Nazeri, Sanyam, zkwentz, emre0, lewtun, sergiopaniego • 5 days ago • 77
📝 Research & Long-Form Blog Posts Collection In-depth technical articles and research pieces published by Hugging Face • 18 items • Updated 15 days ago • 32
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills Paper • 2606.07412 • Published 8 days ago • 12
When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents Paper • 2606.05806 • Published 9 days ago • 22