- Wolf: Captioning Everything with a World Summarization Framework
  Paper • 2407.18908 • Published • 32
- Mixture of Nested Experts: Adaptive Processing of Visual Tokens
  Paper • 2407.19985 • Published • 37
- TPDiff: Temporal Pyramid Video Diffusion Model
  Paper • 2503.09566 • Published • 45
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
  Paper • 2506.07464 • Published • 14
Collections including paper arxiv:2604.08995
- WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
  Paper • 2603.23497 • Published • 91
- INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
  Paper • 2604.07209 • Published • 35
- Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
  Paper • 2604.08995 • Published • 46
- MultiWorld: Scalable Multi-Agent Multi-View Video World Models
  Paper • 2604.18564 • Published • 35
- Writing in the Margins: Better Inference Pattern for Long Context Retrieval
  Paper • 2408.14906 • Published • 144
- DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
  Paper • 2410.10819 • Published • 7
- LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models
  Paper • 2410.09342 • Published • 39
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 55
- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 176
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 52
- Motion Attribution for Video Generation
  Paper • 2601.08828 • Published • 72
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 27