docling-project/SmolDocling-256M-preview
Image-Text-to-Text • 0.3B • Updated • 30.3k • 1.61k
Conversational speech generation
SOTA real-time object detection model
Detect and segment objects in images using text prompts, visual prompts, or no prompt at all
Convert document images to structured text and data
Generate 3D video from input images
Multi-image-to-3D generation