CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning Paper • 2509.20712 • Published Sep 25 • 19
RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning Paper • 2507.07451 • Published Jul 10 • 5
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval Paper • 2505.19650 • Published May 26 • 5
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment Paper • 2502.18965 • Published Feb 26 • 28