- OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
  Paper • 2506.20512 • Published • 48
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
  Paper • 2509.15194 • Published • 33
- Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness
  Paper • 2505.22960 • Published • 16
Zeyu Qin
qqqzzzyyy
AI & ML interests
Scalable Oversight, AI safety
Organizations
None yet