DAPO: An Open-Source LLM Reinforcement Learning System at Scale • Paper • 2503.14476 • Published Mar 18, 2025 • 144
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation • Paper • 2304.05977 • Published Apr 12, 2023 • 3
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving • Paper • 2407.13690 • Published Jun 18, 2024 • 2