Xie Zhihui · https://zhxie.site/

zhxieml

AI & ML interests

None yet

Recent Activity

liked a model 3 months ago

deepseek-ai/DeepSeek-V4-Pro

upvoted a paper 15 days ago

Qwen-AgentWorld: Language World Models for General Agents

Paper • 2606.24597 • Published 17 days ago • 146

upvoted a paper 16 days ago

Self-Compacting Language Model Agents

Paper • 2606.23525 • Published 18 days ago • 18

upvoted 4 papers 3 months ago

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Paper • 2604.15093 • Published Apr 16 • 30

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

Paper • 2604.10866 • Published Apr 13 • 69

Many-Tier Instruction Hierarchy in LLM Agents

Paper • 2604.09443 • Published Apr 10 • 16

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Paper • 2604.06132 • Published Apr 7 • 122

upvoted a paper 4 months ago

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Paper • 2603.17024 • Published Mar 17 • 110

upvoted a collection 5 months ago

SGI-Bench

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows • 12 items • Updated May 6 • 35

upvoted 2 papers 8 months ago

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Paper • 2510.24411 • Published Oct 28, 2025 • 73

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29, 2025 • 47

upvoted 3 papers 9 months ago

The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

Paper • 2510.08240 • Published Oct 9, 2025 • 41

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Paper • 2510.07242 • Published Oct 8, 2025 • 30

RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization

Paper • 2510.02172 • Published Oct 2, 2025 • 7

upvoted a paper 10 months ago

Jointly Reinforcing Diversity and Quality in Language Model Generations

Paper • 2509.02534 • Published Sep 2, 2025 • 25

upvoted 3 papers about 1 year ago

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Paper • 2505.19641 • Published May 26, 2025 • 70

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Paper • 2505.19897 • Published May 26, 2025 • 104

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19, 2025 • 46

upvoted 3 papers over 1 year ago

MegaMath: Pushing the Limits of Open Math Corpora

Paper • 2504.02807 • Published Apr 3, 2025 • 35

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Paper • 2503.12329 • Published Mar 16, 2025 • 28

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11, 2025 • 50