Large Reasoning Models Learn Better Alignment from Flawed Thinking (arXiv:2510.00938, 58 upvotes)
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT (arXiv:2509.19284, 22 upvotes)
Learning to Reason as Action Abstractions with Scalable Mid-Training RL (arXiv:2509.25810, 5 upvotes)
Agent Learning via Early Experience (arXiv:2510.08558, 270 upvotes)
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning (arXiv:2506.01939, 187 upvotes)
OpenThoughts: Data Recipes for Reasoning Models (arXiv:2506.04178, 49 upvotes)
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy (arXiv:2507.01352, 56 upvotes)
First Return, Entropy-Eliciting Explore (arXiv:2507.07017, 23 upvotes)
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning (arXiv:2508.08221, 50 upvotes)
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model (arXiv:2510.18855, 71 upvotes)
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation (arXiv:2509.26497)
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation (arXiv:2510.22115, 83 upvotes)
Balanced Actor Initialization: Stable RLHF Training of Distillation-Based Reasoning Models (arXiv:2509.00309)
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation (arXiv:2509.18521)
Bridging Offline and Online Reinforcement Learning for LLMs (arXiv:2506.21495, 3 upvotes)
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (arXiv:2512.07783, 36 upvotes)
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices (arXiv:2512.01374, 95 upvotes)
arXiv:2505.14674 (37 upvotes)
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models (arXiv:2512.13607, 27 upvotes)