Instructions for using Qwen/Qwen2-VL-7B-Instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Qwen/Qwen2-VL-7B-Instruct with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen2-VL-7B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Qwen/Qwen2-VL-7B-Instruct with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Qwen/Qwen2-VL-7B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
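The same request can be sent from Python. The sketch below uses only the standard library (`json`, `urllib.request`) and assumes the vLLM server started above is listening on `http://localhost:8000`; `build_chat_request` and `ask` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def ask(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to /v1/chat/completions and return the first reply's text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the generated text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


payload = build_chat_request(
    "Qwen/Qwen2-VL-7B-Instruct",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# print(ask("http://localhost:8000", payload))  # requires a running server
```

The SGLang server described below speaks the same OpenAI-compatible protocol, so the same payload should work there with `http://localhost:30000` as the base URL.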

Use Docker

```bash
docker model run hf.co/Qwen/Qwen2-VL-7B-Instruct
```

SGLang

How to use Qwen/Qwen2-VL-7B-Instruct with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen2-VL-7B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
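The server fetches `image_url` values itself, so an image that exists only on your machine has to be embedded in the request. A common approach with OpenAI-compatible servers is a base64 `data:` URL; this sketch assumes your vLLM or SGLang version accepts data URLs (recent releases do, to our knowledge). `image_to_data_url` and `local_image_message` are hypothetical helpers built on the standard library.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL for an OpenAI-style image_url part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def local_image_message(path: str, prompt: str) -> dict:
    """Build one user message carrying a text prompt plus the embedded local image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The returned dictionary can be used as the entry in the `messages` list of a chat request like the curl examples above; the MIME type is guessed from the file extension, so a file without a recognizable image extension is rejected rather than sent with a wrong type.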

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/Qwen2-VL-7B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
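A freshly started container first downloads and loads the weights, so the curl call may be refused for a while after `docker run`. A minimal readiness check, assuming the server exposes the OpenAI-compatible `GET /v1/models` route (vLLM and SGLang both do, to our knowledge), can poll until it answers; `wait_until_ready` is a hypothetical helper using only the standard library.

```python
import time
import urllib.request


def wait_until_ready(base_url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or `timeout` seconds pass.

    `interval` is used both as the per-request timeout and as the pause between attempts.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, request timeout, or an HTTP error status: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: block until the SGLang container is serving, then send requests.
# wait_until_ready("http://localhost:30000")
```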

Docker Model Runner
How to use Qwen/Qwen2-VL-7B-Instruct with Docker Model Runner:
```
docker model run hf.co/Qwen/Qwen2-VL-7B-Instruct
```

An error occurred: shape mismatch

#33

by VeeP - opened Sep 11, 2024

Discussion

VeeP

Sep 11, 2024

Hello,

Would someone be able to help me with this error?

My code prompts for a local image on my system and then runs it through the model. All the files are stored locally.

The image seems to open and start processing, and then the error below occurs.

I expect the model to analyze the image and return a text description.

The only other issue I notice is that, although I have a GPU, the model always runs on the CPU. Could that be the cause?

```
2024-09-11 07:26:36,483 - INFO - Generating description for the image...
2024-09-11 07:26:36,508 - INFO - Image opened successfully. Original size: (512, 512)
2024-09-11 07:26:36,518 - INFO - Image resized to: (448, 448)
2024-09-11 07:26:36,521 - INFO - Model moved to cpu
2024-09-11 07:26:36,521 - INFO - Processing image with the processor...
2024-09-11 07:26:36,542 - INFO - Input tensor 'input_ids' shape: torch.Size([1, 82])
2024-09-11 07:26:36,542 - INFO - Input tensor 'attention_mask' shape: torch.Size([1, 82])
2024-09-11 07:26:36,542 - INFO - Input tensor 'pixel_values' shape: torch.Size([1024, 1176])
2024-09-11 07:26:36,542 - INFO - Input tensor 'image_grid_thw' shape: torch.Size([1, 3])
2024-09-11 07:26:36,543 - INFO - Generating output from the model...
Setting pad_token_id to eos_token_id:151645 for open-end generation.
2024-09-11 07:26:38,410 - ERROR - An error occurred: shape mismatch: value tensor of shape [256, 3584] cannot be broadcast to indexing result of shape [0, 3584]
None
```

Sorry if this is a duplicate message.

Thanks,

2U1

Sep 11, 2024

I think your text prompt has no image tokens for the image.
Check whether your text contains the `<|vision_start|>`, `<|image_pad|>`, and `<|vision_end|>` tokens.
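The numbers in the error follow from how Qwen2-VL turns an image into tokens. The sketch below assumes the default image-processor settings (14-pixel patches, a temporal patch of 2 frames, a 2×2 spatial merge), which to our knowledge come from the model's `preprocessor_config.json` and should be checked there; it also assumes the image sides are already multiples of 28, whereas the real processor resizes images itself. Under those assumptions a 448×448 image gives 1024 patches of 1176 values each, matching the logged `pixel_values` shape `[1024, 1176]`, and 256 merged tokens, matching the `[256, 3584]` value tensor (3584 is the 7B model's hidden size). The prompt must therefore contain 256 `<|image_pad|>` tokens; the logged `input_ids` are only 82 tokens long, so the model finds 0 positions to fill. `image_token_layout` is a hypothetical helper.

```python
# Assumed Qwen2-VL image-processor defaults (verify against preprocessor_config.json).
PATCH_SIZE = 14          # each patch covers 14x14 pixels
TEMPORAL_PATCH_SIZE = 2  # a still image is duplicated to fill 2 frames
MERGE_SIZE = 2           # 2x2 neighbouring patches become one language-model token
CHANNELS = 3             # RGB


def image_token_layout(height: int, width: int):
    """Return (pixel_values shape, image_grid_thw, number of <|image_pad|> tokens)."""
    unit = PATCH_SIZE * MERGE_SIZE
    if height % unit or width % unit:
        raise ValueError(f"this sketch assumes both sides are multiples of {unit}")
    grid_t, grid_h, grid_w = 1, height // PATCH_SIZE, width // PATCH_SIZE
    num_patches = grid_t * grid_h * grid_w
    patch_dim = CHANNELS * TEMPORAL_PATCH_SIZE * PATCH_SIZE * PATCH_SIZE
    num_tokens = num_patches // (MERGE_SIZE * MERGE_SIZE)
    return (num_patches, patch_dim), (grid_t, grid_h, grid_w), num_tokens


print(image_token_layout(448, 448))
# -> ((1024, 1176), (1, 32, 32), 256)
```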

VeeP

Sep 11, 2024


edited Sep 11, 2024

Oh, I thought that was handled by qwen-vl-utils. I'm not sure where to check this, but here is what I have:

```python
import logging

import torch
from PIL import Image

try:
    from qwen_vl_utils import process_vision_info

    image = Image.open(image_path).convert('RGB')
    logging.info(f"Image opened successfully. Original size: {image.size}, Mode: {image.mode}")

    # Resize the image
    image = image.resize((IMAGE_SIZE, IMAGE_SIZE))
    logging.info(f"Image resized to: {image.size}")

    # Device handling with explicit CUDA check
    if torch.cuda.is_available():
        device = torch.device("cuda")
        logging.info("CUDA-enabled GPU is available. Moving model and inputs to GPU.")
    else:
        device = torch.device("cpu")
        logging.info("No CUDA-enabled GPU found. Using CPU for processing.")

    model = model.to(device)

    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image, "image_id": image_id} if image_id else {"type": "image", "image": image},
                {"type": "text", "text": "Describe this image."}
            ]
        }
    ]
    # ... (the rest of the function, including the except clause, is not included in the post)
```

2U1

Sep 12, 2024

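The tokens should be added by `processor.apply_chat_template`, so the conversation has to go through it to build the text prompt before the processor is called.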
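To illustrate what the chat template and the processor contribute, here is a simplified, pure-Python imitation. It assumes the Qwen2-VL chat format (`<|im_start|>`/`<|im_end|>` turns, each image rendered as `<|vision_start|><|image_pad|><|vision_end|>`) and that the processor then repeats each `<|image_pad|>` once per merged vision token. `render_prompt` and `expand_image_pads` are hypothetical names; the real template is a Jinja template shipped with the tokenizer and, to our knowledge, also inserts a default system message, which this sketch omits.

```python
IMAGE_PAD = "<|image_pad|>"


def render_prompt(messages, add_generation_prompt=True):
    """Simplified imitation of the Qwen2-VL chat template (omits the default system turn)."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n")
        for item in message["content"]:
            if item["type"] == "image":
                # One placeholder per image; the processor expands it later.
                parts.append(f"<|vision_start|>{IMAGE_PAD}<|vision_end|>")
            elif item["type"] == "text":
                parts.append(item["text"])
        parts.append("<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def expand_image_pads(prompt, image_grid_thw, merge_size=2):
    """Repeat the i-th <|image_pad|> once per merged vision token of image i."""
    pieces = prompt.split(IMAGE_PAD)
    if len(pieces) - 1 != len(image_grid_thw):
        raise ValueError(
            f"prompt has {len(pieces) - 1} image placeholders "
            f"but {len(image_grid_thw)} images were supplied"
        )
    expanded = [pieces[0]]
    for (t, h, w), rest in zip(image_grid_thw, pieces[1:]):
        expanded.append(IMAGE_PAD * (t * h * w // merge_size**2))
        expanded.append(rest)
    return "".join(expanded)


conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "photo.jpg"},  # hypothetical file name
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = render_prompt(conversation)
expanded = expand_image_pads(prompt, [(1, 32, 32)])  # grid for a 448x448 image with 14-pixel patches
print(expanded.count(IMAGE_PAD))  # 256
```

Calling `expand_image_pads("Describe this image.", [(1, 32, 32)])` raises a `ValueError`: the image supplies 256 features but the prompt has no slot for them, which is the sketch's analogue of the shape-mismatch error. With the real library, Qwen's published examples build the prompt with `text = processor.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)`, collect images with `image_inputs, video_inputs = process_vision_info(conversation)`, and then call `processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")`.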

VeeP

Sep 21, 2024

Thank you!
