GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities Paper • 2507.12367 • Published Jul 16, 2025
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025
Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics Paper • 2603.01209 • Published Mar 1
FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration Paper • 2510.04852 • Published Oct 13, 2025
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024