LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 39
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents Paper • 2510.14967 • Published Oct 16, 2025 • 33
ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability Paper • 2508.07050 • Published Aug 9, 2025 • 117
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Paper • 2506.04308 • Published Jun 4, 2025 • 43
The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason Paper • 2505.22653 • Published May 28, 2025 • 43
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published May 22, 2025 • 34
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published May 1, 2025 • 53
Effective and Efficient Masked Image Generation Models Paper • 2503.07197 • Published Mar 10, 2025 • 11
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9, 2024 • 49