ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code • Paper • 2506.02314 • Published Jun 2, 2025
Reliable and Efficient Amortized Model-based Evaluation • Paper • 2503.13335 • Published Mar 17, 2025