InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing Paper • 2603.09877 • Published 1 day ago • 28
Granite 4.0 1B Speech: Compact, Multilingual, and Built for the Edge Article • 3 days ago • 7
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs Paper • 2603.09095 • Published 2 days ago • 21
WildActor: Unconstrained Identity-Preserving Video Generation Paper • 2603.00586 • Published 12 days ago • 31
Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling Paper • 2603.04791 • Published 7 days ago • 16
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval Paper • 2603.04743 • Published 7 days ago • 46
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published 14 days ago • 39
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 9 days ago • 86
Helios: Real Real-Time Long Video Generation Model Paper • 2603.04379 • Published 8 days ago • 157
InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions Paper • 2603.03646 • Published 8 days ago • 8
DreamWorld: Unified World Modeling in Video Generation Paper • 2603.00466 • Published 12 days ago • 16
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 10 days ago • 57
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use Paper • 2603.03205 • Published 9 days ago • 11
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated 14 days ago • 87
LLaDA2.1: Speeding Up Text Diffusion via Token Editing Paper • 2602.08676 • Published about 1 month ago • 70