Muhammad Ramzan
iamramzan
AI & ML interests
GenAI, Vision & Co
Vision Foundation Models 🧩
Foundation models for computer vision.
- Grounding DINO Demo 💻
Running • 110 • Cutting-edge open-vocabulary object detection app
- Owlv2 👀
Running • Featured • 94 • State-of-the-art Zero-shot Object Detection
- BLIP2 with transformers 🌖
Runtime error • Featured • 41 • BLIP2 (cutting-edge image captioning) in 🤗 transformers
- IDEFICS Playground 🐨
Build error • Featured • 377
Comprehensive Computer Vision Backbones 🧩
This collection offers a variety of pre-trained computer vision backbones ideal for fine-tuning.
- microsoft/resnet-50
Image Classification • 25.6M • Updated • 243k • 471
- google/vit-base-patch16-224-in21k
Image Feature Extraction • 86.4M • Updated • 1.2M • 392
- google/vit-base-patch32-224-in21k
Image Feature Extraction • 88M • Updated • 5.62k • 19
- facebook/dinov2-large
Image Feature Extraction • 0.3B • Updated • 3.05M • 99
Top Vision-Language Papers 🖼️💬📝
A curated list of papers on vision-language models, with the most influential ones at the top.
- Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 39
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525 • Published • 48
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 11
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27
Cutting-Edge Object Detection Models 🥥
- facebook/detr-resnet-50
Object Detection • 41.6M • Updated • 1.14M • 917
- facebook/detr-resnet-101-dc5
Object Detection • 60.7M • Updated • 1.41k • 19
- facebook/detr-resnet-50-dc5
Object Detection • 41.6M • Updated • 1.47k • 6
- google/owlvit-base-patch32
Zero-Shot Object Detection • 0.2B • Updated • 88.8k • 143
Shaheen Collection 🦅