Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published 5 days ago • 38
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 250
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 60
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published Oct 27, 2025 • 26
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20, 2025 • 67
Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models Paper • 2510.11683 • Published Oct 13, 2025 • 14
LLaDA-8B-BGPO Collection Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models • 4 items • Updated Oct 11, 2025 • 4
DeepPrune Collection Parallel Scaling without Inter-trace Redundancy • 3 items • Updated Oct 10, 2025 • 2
DeepPrune: Parallel Scaling without Inter-trace Redundancy Paper • 2510.08483 • Published Oct 9, 2025 • 24
DeepPrune: Parallel Scaling without Inter-trace Redundancy Paper • 2510.08483 • Published Oct 9, 2025 • 24 • 2
DeepPrune Collection Parallel Scaling without Inter-trace Redundancy • 3 items • Updated Oct 10, 2025 • 2
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published Oct 2, 2025 • 53