Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection • Paper • 2303.05499 • Published Mar 9, 2023 • 8
Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models • Paper • 2606.25041 • Published 4 days ago • 83
Qwen-AgentWorld: Language World Models for General Agents • Paper • 2606.24597 • Published 4 days ago • 128
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models • Paper • 2606.19534 • Published 10 days ago • 63
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance • Paper • 2606.19195 • Published 10 days ago • 137
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling • Paper • 2606.18023 • Published 11 days ago • 207
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence • Paper • 2606.14777 • Published 17 days ago • 204
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models • Paper • 2606.16140 • Published 12 days ago • 119
OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data • Paper • 2606.13432 • Published 16 days ago • 111
Redesign Mixture-of-Experts Routers with Manifold Power Iteration • Paper • 2606.12397 • Published 17 days ago • 89
Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution • Paper • 2606.06492 • Published 23 days ago • 94