PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published 7 days ago • 70
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published 10 days ago • 70
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards Paper • 2512.00473 • Published 26 days ago • 25
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models Paper • 2512.08829 • Published 16 days ago • 18
OmniPSD: Layered PSD Generation with Diffusion Transformer Paper • 2512.09247 • Published 16 days ago • 46
Composing Concepts from Images and Videos via Concept-prompt Binding Paper • 2512.09824 • Published 15 days ago • 27
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published 15 days ago • 70
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published 17 days ago • 47
Light-X: Generative 4D Video Rendering with Camera and Illumination Control Paper • 2512.05115 • Published 21 days ago • 10
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17 • 133
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Paper • 2511.12609 • Published Nov 16 • 103
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 210
NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning Paper • 2510.18940 • Published Oct 21 • 8
Surfer 2: The Next Generation of Cross-Platform Computer Use Agents Paper • 2510.19949 • Published Oct 22 • 38
Emu3.5 Collection Native Multimodal Models are World Learners 🌍 • 4 items • Updated about 17 hours ago • 72
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion Paper • 2510.20766 • Published Oct 23 • 34
UltraGen: High-Resolution Video Generation with Hierarchical Attention Paper • 2510.18775 • Published Oct 21 • 17
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning Paper • 2510.13809 • Published Oct 15 • 37