Haoyu Guo

ghy0324

AI & ML interests

None yet

Recent Activity

upvoted a paper 11 days ago

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Paper • 2512.20617 • Published 11 days ago • 42

upvoted 2 papers 17 days ago

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Paper • 2512.15603 • Published 17 days ago • 58

Step-GUI Technical Report

Paper • 2512.15431 • Published 17 days ago • 125

upvoted a paper 19 days ago

EgoX: Egocentric Video Generation from a Single Exocentric Video

Paper • 2512.08269 • Published 25 days ago • 115

upvoted a paper 26 days ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published 29 days ago • 38

upvoted a paper 30 days ago

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Paper • 2512.02589 • Published Dec 2, 2025 • 65

upvoted 7 papers about 1 month ago

Thinking with Programming Vision: Towards a Unified View for Thinking with Images

Paper • 2512.03746 • Published Dec 3, 2025 • 16

OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published Dec 2, 2025 • 32

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 148

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 244

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 92

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 125

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published Nov 20, 2025 • 110

upvoted 6 papers about 2 months ago

Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published Nov 13, 2025 • 96

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 201

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6, 2025 • 37

V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6, 2025 • 97

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

DeepEyesV2: Toward Agentic Multimodal Model

Paper • 2511.05271 • Published Nov 7, 2025 • 42

upvoted a paper 2 months ago

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published Oct 30, 2025 • 82