Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published 6 days ago • 12
view article Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 7 days ago • 72
Sparse Auto-Encoders (SAEs) for Mechanistic Interpretability Collection A compilation of sparse auto-encoders trained on large language models. • 37 items • Updated 8 days ago • 14
Nemotron-Cascade Collection Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models • 17 items • Updated 1 day ago • 36
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 193
ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms Paper • 2502.06556 • Published Feb 10 • 3
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks Paper • 2507.12284 • Published Jul 16 • 7
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? Paper • 2309.08963 • Published Sep 16, 2023 • 12