- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 107
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 80
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 43
- Matryoshka Diffusion Models
  Paper • 2310.15111 • Published • 45
Collections including paper arxiv:2603.20278

- OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
  Paper • 2603.20278 • Published • 94
- Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
  Paper • 2603.22847 • Published • 26
- Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
  Paper • 2604.01007 • Published • 31

- FlowRL: Matching Reward Distributions for LLM Reasoning
  Paper • 2509.15207 • Published • 118
- Kwaipilot/KAT-Dev-72B-Exp
  Text Generation • 73B • Updated • 37 • 157
- Agentic Entropy-Balanced Policy Optimization
  Paper • 2510.14545 • Published • 108
- Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
  Paper • 2511.13288 • Published • 19

- ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
  Paper • 2506.09790 • Published • 53
- Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
  Paper • 2506.06444 • Published • 73
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
  Paper • 2506.11763 • Published • 74
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
  Paper • 2502.04644 • Published • 4

- Efficient Agents: Building Effective Agents While Reducing Cost
  Paper • 2508.02694 • Published • 86
- Agentic AI Frameworks: Architectures, Protocols, and Design Challenges
  Paper • 2508.10146 • Published
- Kimi K2.5: Visual Agentic Intelligence
  Paper • 2602.02276 • Published • 264
- ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas
  Paper • 2601.21558 • Published • 60

- openai/gpt-oss-120b
  Text Generation • 120B • Updated • 3.49M • 4.71k
- Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
  Paper • 2512.20605 • Published • 62
- Nested Browser-Use Learning for Agentic Information Seeking
  Paper • 2512.23647 • Published • 19
- TimeBill: Time-Budgeted Inference for Large Language Models
  Paper • 2512.21859 • Published • 25

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 293 • 99
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 39
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88