ClawBench - Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces: everything you need to run, regrade, or compare on ClawBench.

NAIL-Group/ClawBench
ClawBench Leaderboard 🦀 Live leaderboard for the ClawBench web-agent benchmark
NAIL-Group/ClawBenchV1Trace
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published Apr 9