Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 4 days ago • 120
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 3 days ago • 69
AgentCPT/Qwen3-4B_thinking_agent_sft_nemotron_tool_calling_v2_lr1e-5_epoch_1_ctx_16384_bs_256 4B • Updated 10 days ago • 12
AgentCPT/Qwen3-4B_thinking_agent_sft_nemotron_tool_calling_v2_lr1e-5_epoch_1_ctx_16384_bs_256 4B • Updated 10 days ago • 12
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 251
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 99