Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 3 days ago • 122
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 7 days ago • 144
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 3 days ago • 122
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 7 days ago • 144
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published 7 days ago • 68
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision Paper • 2601.19798 • Published 9 days ago • 41
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published about 1 month ago • 46
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published about 1 month ago • 46
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published about 1 month ago • 46
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation Paper • 2512.21252 • Published Dec 24, 2025 • 35
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 119
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation Paper • 2512.03036 • Published Dec 2, 2025 • 22
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation Paper • 2512.01801 • Published Dec 1, 2025 • 24
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action Paper • 2511.22134 • Published Nov 27, 2025 • 22
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action Paper • 2511.22134 • Published Nov 27, 2025 • 22
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12, 2025 • 209
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation Paper • 2510.18701 • Published Oct 21, 2025 • 67
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment Paper • 2510.10201 • Published Oct 11, 2025 • 36