Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing Paper • 2604.22782 • Published Apr 3 • 8
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 7 days ago • 206
MultiWorld: Scalable Multi-Agent Multi-View Video World Models Paper • 2604.18564 • Published 17 days ago • 45
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2604.12374 • Published 23 days ago • 36
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems Paper • 2604.14228 • Published 23 days ago • 25
Youssofal/MiniMax-M2.7-Abliterated-Heretic-GGUF Text Generation • 229B • Updated 23 days ago • 8.21k • 45
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data Paper • 2604.14164 • Published Mar 23 • 35
Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers Article • 21 days ago • 69