Instructions to use Qwen/Qwen2-VL-7B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Qwen/Qwen2-VL-7B-Instruct with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen2-VL-7B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Qwen/Qwen2-VL-7B-Instruct with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Qwen/Qwen2-VL-7B-Instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
Use Docker
docker model run hf.co/Qwen/Qwen2-VL-7B-Instruct
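The curl examples send a remote image URL, but the question in the thread below is about local image files. A minimal standard-library sketch for calling the same OpenAI-compatible endpoint from Python with a local file, assuming the server accepts base64 `data:` URLs inside `image_url` parts (vLLM documents this; check your server's docs otherwise). `image_to_data_url`, `build_payload` and `chat` are hypothetical helper names; port 8000 comes from the vLLM example, and the SGLang server in the next section listens on 30000.

```python
import base64
import json
import mimetypes
import urllib.request


def image_to_data_url(path):
    """Encode a local image file as a data: URL for an image_url content part."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "image/jpeg"  # assumption: fall back to JPEG if the type is unknown
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(image_path, prompt, model="Qwen/Qwen2-VL-7B-Instruct"):
    """Build the same chat-completions body as the curl example, with a local image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
                ],
            }
        ],
    }


def chat(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat(build_payload("photo.jpg", "Describe this image in one sentence.")))` sends a local file (`photo.jpg` is a placeholder path). Using the server this way also leaves the chat templating, including the image placeholder tokens, to the server.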
- SGLang
How to use Qwen/Qwen2-VL-7B-Instruct with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen2-VL-7B-Instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/Qwen2-VL-7B-Instruct" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
- Docker Model Runner
How to use Qwen/Qwen2-VL-7B-Instruct with Docker Model Runner:
docker model run hf.co/Qwen/Qwen2-VL-7B-Instruct
An error occurred: shape mismatch
Hello,
Could someone help me with this error?
My code prompts for a local image on my system and then runs it through the model. All the files are stored locally.
The file seems to open and is about to be processed, and then the error below occurs.
I expected the model to analyze the image and return a text description.
The only other issue I notice is that although I have a GPU, the code always uses the CPU. Could that be the cause?
2024-09-11 07:26:36,483 - INFO - Generating description for the image...
2024-09-11 07:26:36,508 - INFO - Image opened successfully. Original size: (512, 512)
2024-09-11 07:26:36,518 - INFO - Image resized to: (448, 448)
2024-09-11 07:26:36,521 - INFO - Model moved to cpu
2024-09-11 07:26:36,521 - INFO - Processing image with the processor...
2024-09-11 07:26:36,542 - INFO - Input tensor 'input_ids' shape: torch.Size([1, 82])
2024-09-11 07:26:36,542 - INFO - Input tensor 'attention_mask' shape: torch.Size([1, 82])
2024-09-11 07:26:36,542 - INFO - Input tensor 'pixel_values' shape: torch.Size([1024, 1176])
2024-09-11 07:26:36,542 - INFO - Input tensor 'image_grid_thw' shape: torch.Size([1, 3])
2024-09-11 07:26:36,543 - INFO - Generating output from the model...
Setting pad_token_id to eos_token_id:151645 for open-end generation.
2024-09-11 07:26:38,410 - ERROR - An error occurred: shape mismatch: value tensor of shape [256, 3584] cannot be broadcast to indexing result of shape [0, 3584]
None
Sorry if this is a duplicate message.
Thanks,
V
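On the GPU question: the log line "Model moved to cpu" shows the run was on CPU, but device placement does not change how many image placeholder tokens the prompt contains, so it should not cause this shape mismatch. In the posted code the device falls back to CPU exactly when `torch.cuda.is_available()` returns `False`, which commonly happens with a CPU-only PyTorch build or a driver problem. A quick check, assuming it runs in the same Python environment as the script; `torch_device_report` is a hypothetical helper name.

```python
import torch


def torch_device_report():
    """Summarize why PyTorch will (or will not) place the model on a GPU."""
    available = torch.cuda.is_available()
    return {
        "torch_version": torch.__version__,
        # None means this PyTorch build has no CUDA support (CPU-only install).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": available,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


for key, value in torch_device_report().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` prints `None`, reinstalling PyTorch with a CUDA-enabled build is the usual fix; if it prints a version but `cuda_available` is `False`, the driver setup is the more likely culprit.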
I think your text has no image tokens for the image.
Check whether your text contains the tokens <|vision_start|>, <|image_pad|>, and <|vision_end|>.
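The logged shapes are consistent with this diagnosis. A sketch of the arithmetic, assuming Qwen2-VL's default preprocessing constants (14-pixel patches, temporal patch size 2, 2x2 patch merge), which come from the model's preprocessor config rather than from this thread: a 448x448 image becomes a 32x32 patch grid, i.e. 1024 rows of 3 x 2 x 14 x 14 = 1176 values (the logged `pixel_values`), which are merged into 256 image embeddings of width 3584 (the value tensor in the error). The `[0, 3584]` side of the error means the prompt had zero `<|image_pad|>` positions to receive them. `image_shapes` and `count_image_pad` are hypothetical helpers.

```python
# Assumed Qwen2-VL defaults (from its preprocessor config, not from the post).
PATCH_SIZE = 14           # ViT patch edge in pixels
TEMPORAL_PATCH_SIZE = 2   # a still image is repeated to fill 2 frames
MERGE_SIZE = 2            # 2x2 neighbouring patches become one LLM token
CHANNELS = 3


def image_shapes(height, width):
    """Return (grid_thw, pixel_values shape, number of image tokens).

    Assumes height and width are already multiples of PATCH_SIZE * MERGE_SIZE
    (28), as 448 is, so the processor does not resize the image further.
    """
    grid_thw = (1, height // PATCH_SIZE, width // PATCH_SIZE)
    num_patches = grid_thw[0] * grid_thw[1] * grid_thw[2]
    patch_dim = CHANNELS * TEMPORAL_PATCH_SIZE * PATCH_SIZE * PATCH_SIZE
    num_image_tokens = num_patches // (MERGE_SIZE ** 2)
    return grid_thw, (num_patches, patch_dim), num_image_tokens


def count_image_pad(input_ids, image_pad_id):
    """Count the <|image_pad|> placeholders the prompt actually contains."""
    return sum(1 for tok in input_ids if tok == image_pad_id)


grid_thw, pixel_values_shape, expected = image_shapes(448, 448)
print(grid_thw)            # (1, 32, 32)
print(pixel_values_shape)  # (1024, 1176), as in the logged pixel_values
print(expected)            # 256, as in the [256, 3584] value tensor
```

With a real processor, `count_image_pad(inputs["input_ids"][0].tolist(), processor.tokenizer.convert_tokens_to_ids("<|image_pad|>"))` should equal `expected`; for the failing prompt the error message implies it is 0.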
Oh, I thought that was handled by qwen-vl-utils. I'm not sure where to check this, but here is what I have:
import logging

import torch
from PIL import Image
from qwen_vl_utils import process_vision_info

# image_path, image_id, IMAGE_SIZE and model are defined elsewhere in the script.
try:
    image = Image.open(image_path).convert('RGB')
    logging.info(f"Image opened successfully. Original size: {image.size}, Mode: {image.mode}")

    # Resize the image
    image = image.resize((IMAGE_SIZE, IMAGE_SIZE))
    logging.info(f"Image resized to: {image.size}")

    # Device handling with explicit CUDA check
    if torch.cuda.is_available():
        device = torch.device("cuda")
        logging.info("CUDA-enabled GPU is available. Moving model and inputs to GPU.")
    else:
        device = torch.device("cpu")
        logging.info("No CUDA-enabled GPU found. Using CPU for processing.")
    model = model.to(device)

    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image, "image_id": image_id} if image_id else {"type": "image", "image": image},
                {"type": "text", "text": "Describe this image."}
            ]
        }
    ]
    # ... (processing and generation code not included in the post)
except Exception as e:
    logging.error(f"An error occurred: {e}")
The tokens should be added by processor.apply_chat_template. qwen-vl-utils does not insert them: process_vision_info only loads and resizes the images.
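A sketch of the full flow, following the usage pattern shown on the Qwen2-VL model card: render the prompt with `apply_chat_template(tokenize=False)` so the vision tokens are present, collect images with `process_vision_info`, then let the processor expand the single `<|image_pad|>` into one placeholder per merged patch. It assumes `model` and `processor` are already loaded and `conversation` is built as in the code above; `describe_image` and `strip_prompt` are hypothetical helper names.

```python
def strip_prompt(input_ids_batch, generated_batch):
    """Drop the echoed prompt tokens so only newly generated tokens are decoded."""
    return [out[len(inp):] for inp, out in zip(input_ids_batch, generated_batch)]


def describe_image(model, processor, conversation, max_new_tokens=128):
    """Run one Qwen2-VL chat turn and return the decoded reply text."""
    # Imported here so the helper above stays usable without qwen-vl-utils.
    from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

    # 1. Render the prompt text; this is the step that inserts
    #    <|vision_start|><|image_pad|><|vision_end|> for each image entry.
    text = processor.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )
    # 2. Collect (and resize) the images referenced in the conversation.
    image_inputs, video_inputs = process_vision_info(conversation)
    # 3. The processor repeats <|image_pad|> once per merged patch, so the
    #    number of placeholders matches the number of image embeddings.
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)
    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    trimmed = strip_prompt(inputs["input_ids"], generated)
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

Calling `print(describe_image(model, processor, conversation))` replaces the processing and generation step. The key difference from building the prompt by hand is that the text passed to the processor already contains the vision tokens, which is what the `[0, 3584]` in the error shows was missing.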
Thank you!