- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
  Paper • 2411.03562 • Published • 69
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
  Paper • 2502.06060 • Published • 38
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 195
- SurveyX: Academic Survey Automation via Large Language Models
  Paper • 2502.14776 • Published • 100

Collections
Collections including paper arxiv:2603.25158
- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 271 • 99
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 39
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- More Agents Is All You Need
  Paper • 2402.05120 • Published • 57
- UFO: A UI-Focused Agent for Windows OS Interaction
  Paper • 2402.07939 • Published • 17
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
  Paper • 2406.04151 • Published • 24
- AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
  Paper • 2407.04363 • Published • 34

- Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
  Paper • 2603.25158 • Published • 50
- SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
  Paper • 2602.12670 • Published • 59
- SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration
  Paper • 2603.21019 • Published

- Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
  Paper • 2603.25158 • Published • 50
- SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
  Paper • 2604.08377 • Published • 276
- How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
  Paper • 2604.04323 • Published • 40
- SkillX: Automatically Constructing Skill Knowledge Bases for Agents
  Paper • 2604.04804 • Published • 32

- XSkill: Continual Learning from Experience and Skills in Multimodal Agents
  Paper • 2603.12056 • Published • 33
- Memento-Skills: Let Agents Design Agents
  Paper • 2603.18743 • Published • 57
- SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?
  Paper • 2603.15401 • Published • 19
- Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
  Paper • 2603.25158 • Published • 50

- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
  Paper • 2410.08815 • Published • 47
- SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
  Paper • 2412.15443 • Published • 10
- RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
  Paper • 2412.12881 • Published • 2
- AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
  Paper • 2506.06962 • Published • 28