SpatialTree: How Spatial Abilities Branch Out in MLLMs • 2512.20617 • Published 11 days ago
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition • 2512.15603 • Published 17 days ago
EgoX: Egocentric Video Generation from a Single Exocentric Video • 2512.08269 • Published 25 days ago
EditThinker: Unlocking Iterative Reasoning for Any Image Editor • 2512.05965 • Published 29 days ago
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing • 2512.02589 • Published Dec 2, 2025
Thinking with Programming Vision: Towards a Unified View for Thinking with Images • 2512.03746 • Published Dec 3, 2025
OneThinker: All-in-one Reasoning Model for Image and Video • 2512.03043 • Published Dec 2, 2025
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models • 2512.02556 • Published Dec 2, 2025
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe • 2511.16334 • Published Nov 20, 2025
Depth Anything 3: Recovering the Visual Space from Any Views • 2511.10647 • Published Nov 13, 2025
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds • 2511.08892 • Published Nov 12, 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm • 2511.04570 • Published Nov 6, 2025
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning • 2510.27492 • Published Oct 30, 2025