Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15 • 34
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model Paper • 2604.02097 • Published Apr 2 • 32
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model Paper • 2604.02097 • Published Apr 2 • 32
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders Paper • 2601.10332 • Published Jan 15 • 32