Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models Paper • 2606.25473 • Published 4 days ago • 22
Representation Forcing for Bottleneck-Free Unified Multimodal Models Paper • 2605.31604 • Published about 1 month ago • 63
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion Paper • 2605.23902 • Published May 22 • 46
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills Paper • 2605.23899 • Published May 22 • 29
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published May 22 • 247
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published May 20 • 111
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published May 18 • 116
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation Paper • 2605.14333 • Published May 14 • 35
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling Paper • 2605.13062 • Published May 13 • 33
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision Paper • 2605.05781 • Published May 7 • 5
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models Paper • 2604.25636 • Published Apr 28 • 24
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published Apr 10 • 51
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 110
Towards Pixel-Level VLM Perception via Simple Points Prediction Paper • 2601.19228 • Published Jan 27 • 19
One-step Latent-free Image Generation with Pixel Mean Flows Paper • 2601.22158 • Published Jan 29 • 18