My AI - a malkesh2911 Collection

My AI

updated 23 days ago

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 118
Kwaipilot/KAT-Dev-72B-Exp

Text Generation • 73B • Updated Oct 13, 2025 • 33 • 157
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16, 2025 • 108
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published Nov 17, 2025 • 19
microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated Dec 17, 2025 • 16.4k • 1.43k
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

Paper • 2603.20278 • Published about 1 month ago • 94