Spatia: Video Generation with Updatable Spatial Memory • Paper • 2512.15716 • Published 10 days ago • 17
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models • Paper • 2512.20557 • Published 4 days ago • 45
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models • Paper • 2512.16561 • Published 9 days ago • 19
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling • Paper • 2512.14614 • Published 11 days ago • 64
Exploring MLLM-Diffusion Information Transfer with MetaCanvas • Paper • 2512.11464 • Published 15 days ago • 12
V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties • Paper • 2512.11799 • Published 15 days ago • 29
EgoX: Egocentric Video Generation from a Single Exocentric Video • Paper • 2512.08269 • Published 18 days ago • 111
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing • Paper • 2512.06065 • Published 22 days ago • 28
SIMA 2: A Generalist Embodied Agent for Virtual Worlds • Paper • 2512.04797 • Published 23 days ago • 23
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation • Paper • 2512.05106 • Published 23 days ago • 15
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length • Paper • 2512.04677 • Published 23 days ago • 168
Video Generation Models Are Good Latent Reward Models • Paper • 2511.21541 • Published about 1 month ago • 45
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization • Paper • 2511.15705 • Published Nov 19 • 92
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks • Paper • 2511.15065 • Published Nov 19 • 74