When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors Paper • 2606.32029 • Published 2 days ago • 3
Autonomous Scientific Discovery via Iterative Meta-Reflection Paper • 2607.01131 • Published 1 day ago • 3
ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving Paper • 2607.00466 • Published 1 day ago • 15
Are We Measuring Strategy or Phrasing? The Gap Between Surface- and Approach-Level Diversity in LLM Math Reasoning Paper • 2606.29985 • Published 3 days ago • 15
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published May 29 • 123
AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets Paper • 2512.10971 • Published Dec 1, 2025 • 12
MOPD: Multi-Teacher On-Policy Distillation for Capability Integration in LLM Post-Training Paper • 2606.30406 • Published 3 days ago • 6
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments Paper • 2601.07853 • Published Jan 9 • 11
Who judges the judges? Governance from metrics: a runtime framework for continuous LLM compliance monitoring Paper • 2605.24737 • Published May 23 • 2
Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents Paper • 2601.15322 • Published Mar 7 • 1
TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories Paper • 2604.07223 • Published Apr 8 • 1
The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents Paper • 2604.10577 • Published Apr 12 • 27
Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks Paper • 2606.29082 • Published 5 days ago • 26
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents Paper • 2606.32034 • Published 2 days ago • 9
DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation Paper • 2606.31537 • Published 2 days ago • 17
Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction Paper • 2606.28186 • Published 6 days ago • 6