From Growing to Looping: A Unified View of Iterative Computation in LLMs Paper • 2602.16490 • Published Feb 18 • 1
EMO: Pretraining Mixture of Experts for Emergent Modularity Paper • 2605.06663 • Published 2 days ago • 2
Article EMO: Pretraining mixture of experts for emergent modularity about 8 hours ago • 14
PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models Paper • 2601.19917 • Published Jan 7 • 2
Scaling Laws of Synthetic Data for Language Models Paper • 2503.19551 • Published Mar 25, 2025 • 2
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment Paper • 2601.14249 • Published Jan 20 • 14
Improving Reasoning Capabilities in Small Models through Mixture-of-Layers Distillation with Stepwise Attention on Key Information Paper • 2604.15701 • Published 22 days ago • 1
Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection Paper • 2604.02819 • Published Apr 3 • 1
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster Paper • 2505.18642 • Published May 24, 2025 • 1
Pitfalls of Rule- and Model-based Verifiers -- A Case Study on Mathematical Reasoning Paper • 2505.22203 • Published May 28, 2025 • 7
Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning Paper • 2601.17223 • Published Jan 23 • 1
Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier Paper • 2505.11966 • Published May 17, 2025 • 6
Reinforcing General Reasoning without Verifiers Paper • 2505.21493 • Published May 27, 2025 • 27
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 192
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29, 2025 • 99