WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 5 days ago • 176
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published 6 days ago • 81
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 4 days ago • 31
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation Paper • 2604.09201 • Published 4 days ago • 1
ELT: Elastic Looped Transformers for Visual Generation Paper • 2604.09168 • Published 4 days ago • 12
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images Paper • 2604.09531 • Published 4 days ago • 6
Gemma 4 WebGPU 🚀 156 Run Gemma 4 locally in-browser on WebGPU w/ Transformers.js
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published 5 days ago • 267
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 6 days ago • 159
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning Paper • 2604.04746 • Published 6 days ago • 66
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 6 days ago • 12
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models Paper • 2604.08340 • Published 5 days ago • 5
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published 5 days ago • 37
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 5 days ago • 44
Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 12 days ago • 35