arxiv:2602.09082

UI-Venus-1.5 Technical Report

Published on Feb 9 · Submitted by Zhangxuan Gu on Feb 11 · #3 Paper of the day

Abstract

AI-generated summary: UI-Venus-1.5 is a unified GUI agent with improved performance through mid-training stages, online reinforcement learning, and model merging techniques.

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
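The abstract does not say which merging algorithm combines the grounding, web, and mobile models. As a rough illustration of the general idea only, the sketch below uses plain weighted parameter averaging, one of the simplest merging schemes and not necessarily the one used in the report. The function name `merge_checkpoints`, the toy parameter names, and all numeric values are hypothetical.

```python
from typing import Dict, List, Sequence

# A toy checkpoint: parameter name -> flattened list of weights.
Checkpoint = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[Checkpoint],
                      weights: Sequence[float]) -> Checkpoint:
    """Linearly combine checkpoints that share parameter names and shapes.

    This is simple weighted averaging, assumed here for illustration;
    it is not taken from the UI-Venus-1.5 report.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(weights):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    coeffs = [w / total for w in weights]  # normalise so coefficients sum to 1

    merged: Checkpoint = {}
    for name, ref_values in checkpoints[0].items():
        # Every checkpoint must provide this parameter with the same length.
        for ckpt in checkpoints:
            if name not in ckpt or len(ckpt[name]) != len(ref_values):
                raise ValueError(f"parameter {name!r} is missing or mismatched")
        merged[name] = [
            sum(c * ckpt[name][i] for c, ckpt in zip(coeffs, checkpoints))
            for i in range(len(ref_values))
        ]
    return merged


# Made-up stand-ins for the three domain-specific models.
grounding = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
web = {"layer.weight": [3.0, 4.0], "layer.bias": [3.0]}
mobile = {"layer.weight": [5.0, 6.0], "layer.bias": [6.0]}

unified = merge_checkpoints([grounding, web, mobile], weights=[1.0, 1.0, 1.0])
```

With equal weights, each merged parameter is the element-wise mean of the three inputs. Unequal weights would bias the unified checkpoint toward one domain, which is the main knob a scheme like this exposes.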

Community

Paper author Paper submitter

Is your GUI Agent ready for real work? 🔥

We’ve seen many great GUI agents, but building a stable assistant for phones and websites is still hard. There are three main problems:

1️⃣ Knowledge gap: models often miss less common icons and don't know how specialized apps work.
2️⃣ Reality gap: models that work well in tests often fail during real-life tasks.
3️⃣ Complexity: using a multi-agent framework usually costs too much.

Enter UI-Venus-1.5 🚀: the new high-performance, end-to-end GUI agent from Ant Group!

Unlike earlier approaches, UI-Venus-1.5 is built for real-world use:
📱 All-in-One: One single model for Grounding, Mobile, and Web tasks.
🇨🇳 Real App Support: Full support for 40+ popular Chinese apps, making AI part of daily life.
⚡ Simple & Fast: A clean, end-to-end design for faster and more reliable work.

Check it out and see how AI can truly help you! 🐜✨
