Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published 11 days ago • 17
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano v3. • 7 items • Updated 11 days ago • 54
view article Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 28
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 221
view article Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix Nov 3, 2025 • 53
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 119
KORMo pretraining datasets Collection The pretraining datasets for KORMo-10B were collected from diverse, publicly available source. • 14 items • Updated Oct 13, 2025 • 20
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought Paper • 2510.04230 • Published Oct 5, 2025 • 26
view article Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training +3 Aug 8, 2025 • 90
AI2 Safety Toolkit Collection Safety data, moderation tools and safe LLMs. • 6 items • Updated 11 days ago • 8
Essential-Web v1.0: 24T tokens of organized web data Paper • 2506.14111 • Published Jun 17, 2025 • 46
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions Paper • 2506.00421 • Published May 31, 2025 • 5
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18, 2025 • 139