FlowRL: Matching Reward Distributions for LLM Reasoning • Paper • 2509.15207 • Published Sep 18, 2025 • 118
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO • Paper • 2511.13288 • Published Nov 17, 2025 • 19
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis • Paper • 2603.20278 • Published about 1 month ago • 94