Melvin Vivas (melvindave) • PRO
AI & ML interests
Small language models, vision, text-to-speech (TTS), speech-to-text (STT), image generation
Recent Activity
Upvoted a collection 6 days ago: Resources for Sound Processing
Updated a collection 6 days ago: Text to Speech
Liked a model 6 days ago: ResembleAI/chatterbox-turbo
Organizations
Datasets
Evaluation
Text to Speech
VibeVoice-Realtime-0.5B 🐨
Running on Zero • Featured • 154 likes
Generate natural-sounding speech from text

microsoft/VibeVoice-1.5B
Text-to-Speech • 3B • Updated • 672k • 2.12k

Qwen3 TTS Demo 🚀
Running • Featured • 302 likes
Generate Speech from Text

mradermacher/Qwen3-1.7B-Multilingual-TTS-GGUF
2B • Updated • 539

Image Generation
Coding
Customer Conversations Datasets
Vision
Open VLM Leaderboard 🌎
Running on CPU Upgrade • 953 likes
VLMEvalKit Evaluation Results Collection

DeepSeek OCR Demo 🚀
Running on Zero • Featured • 307 likes
Try out DeepSeek-OCR on your PDFs or images

Multimodal OCR3 🌖
Running on Zero • MCP • 53 likes
nanonets2-ocr / chandra-ocr / dots.ocr / olm-ocr2

Qwen/Qwen3-VL-30B-A3B-Instruct
Image-Text-to-Text • 31B • Updated • 1.67M • 467

Papers
Language Models (Reasoning)
Audio Transcription
Fine-tuning Models
OCR Datasets
Notable Spaces
Qwen-3-VL-8B OCR Receipts 🚀
Running • 1 like
Structured data parser from receipt images

Qwen3 Omni Demo ⚡
Running • Featured • 222 likes
Generate audio responses from text and media inputs

VLM Object Understanding 🦀
Running on Zero • Featured • 111 likes
Explore object detection, visual grounding, keypoint detection…

Dataset Card Drafter 😻
Sleeping • 1 like
Create dataset descriptions and open PRs automatically