Running Agents AVGen-Bench Leaderboard 🚀 Explore and submit AVGen-Bench model scores on the leaderboard
A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets Paper • 2606.13802 • Published 19 days ago • 1
SWE-FastContext Collection A family of code-search models powering the Explore subagent for coding agents. (It will be made public later.) • 3 items • Updated about 5 hours ago • 15
FastContext: Training Efficient Repository Explorer for Coding Agents Paper • 2606.14066 • Published 18 days ago • 93
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 22 days ago • 104
AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents Paper • 2606.05597 • Published 26 days ago • 4