Instructions to use ibm-granite/granite-vision-3.1-2b-preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ibm-granite/granite-vision-3.1-2b-preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ibm-granite/granite-vision-3.1-2b-preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Run the pipeline and print the generated output
outputs = pipe(text=messages)
print(outputs)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ibm-granite/granite-vision-3.1-2b-preview")
model = AutoModelForImageTextToText.from_pretrained("ibm-granite/granite-vision-3.1-2b-preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ibm-granite/granite-vision-3.1-2b-preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ibm-granite/granite-vision-3.1-2b-preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ibm-granite/granite-vision-3.1-2b-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/ibm-granite/granite-vision-3.1-2b-preview

SGLang

How to use ibm-granite/granite-vision-3.1-2b-preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ibm-granite/granite-vision-3.1-2b-preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ibm-granite/granite-vision-3.1-2b-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ibm-granite/granite-vision-3.1-2b-preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ibm-granite/granite-vision-3.1-2b-preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use ibm-granite/granite-vision-3.1-2b-preview with Docker Model Runner:
```
docker model run hf.co/ibm-granite/granite-vision-3.1-2b-preview
```

Use Negative Feature Layer Indices

by abrooks9944 - opened Feb 15, 2025

base: refs/heads/main ← from: refs/pr/9


abrooks9944

IBM Granite org Feb 15, 2025

There is a misalignment between transformers and vLLM in the feature layers currently being used: the current values are correct for vLLM and off by one for transformers. In transformers, the first entry of the hidden states is the input embedding (here). In vLLM, however, the embedding is not included in the way the hidden states pool is formed (here).

In other words, the hidden states array in transformers contains 28 entries:
[emb, h0, h1, ..., h26]

while the hidden states pool in vLLM contains 27 entries:
[h0, h1, ..., h26]

The config reflects the correct values for vLLM, but is off by one in transformers. Both projects support negative indexing into the hidden states (with offset handling in vLLM, since it only loads the encoder layers up to the deepest feature layer needed). This PR changes the vision feature layers to use negative indices, which fixes the misalignment in transformers without changing the output in vLLM (no code changes needed).
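To make the off-by-one concrete, here is a toy sketch that uses strings in place of hidden-state tensors. It assumes a 27-layer vision encoder, which is what the 28 and 27 entry counts above imply; the names `hf_hidden_states`, `vllm_hidden_pool`, and `select` are made up for illustration and are not part of either library. It also ignores the fact that vLLM only loads layers up to the deepest one needed.

```python
# Toy illustration: strings stand in for hidden-state tensors.
NUM_LAYERS = 27  # assumption: encoder depth implied by the 28 / 27 entry counts

# transformers-style list: embedding output first, then one entry per layer
hf_hidden_states = ["emb"] + [f"h{i}" for i in range(NUM_LAYERS)]
# vLLM-style pool: layer outputs only, no embedding entry
vllm_hidden_pool = [f"h{i}" for i in range(NUM_LAYERS)]


def select(states, indices):
    """Pick the entries at the given (possibly negative) indices."""
    return [states[i] for i in indices]


negative_layers = [-24, -20, -12, -1]
hf_negative = select(hf_hidden_states, negative_layers)
vllm_negative = select(vllm_hidden_pool, negative_layers)

# The same positive index lands one layer apart in the two layouts.
hf_at_4 = hf_hidden_states[4]
vllm_at_4 = vllm_hidden_pool[4]

print(hf_negative, vllm_negative)
print(hf_at_4, vllm_at_4)
```

Because negative indices count from the end, the extra embedding entry at the front of the transformers list does not shift them, so both layouts pick the same layers.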

I will also submit a PR to vLLM to add the embeddings to the hidden state pool if all hidden states are requested from the visual encoder.

Use Negative Feature Layer Indices (b04f90bf)

aarbelle

Feb 16, 2025

Thanks @abrooks9944

aarbelle changed pull request status to merged Feb 16, 2025

abrooks9944

IBM Granite org Feb 19, 2025

The model uses negative feature layers now that this PR was merged, but here is the link to the relevant PR for vLLM for positive feature layers - https://github.com/vllm-project/vllm/pull/13514

aarbelle

Feb 19, 2025

@abrooks9944
So future models can also use positive features?
Do we need to update anything in this model?

abrooks9944

IBM Granite org Feb 19, 2025

Hey @aarbelle - yes, in future releases of vLLM, it will align with transformers, and the correct positive feature layers ([4, 8, 16, 27]) can be used to get the same output as the corresponding negative layers ([-24, -20, -12, -1]).
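As a quick sanity check on that correspondence, the arithmetic can be sketched as below. It assumes the 28-entry transformers-style layout (embedding plus 27 layer outputs); `to_negative` is a hypothetical helper written for this example, not a function from transformers or vLLM.

```python
NUM_ENTRIES = 28  # assumption: embedding output + 27 encoder layer outputs


def to_negative(indices, num_entries):
    """Convert non-negative list indices to their negative equivalents."""
    for i in indices:
        if not 0 <= i < num_entries:
            raise IndexError(f"index {i} is out of range for {num_entries} entries")
    return [i - num_entries for i in indices]


positive_layers = [4, 8, 16, 27]
negative_layers = to_negative(positive_layers, NUM_ENTRIES)
print(negative_layers)

# Both forms address the same positions in a 28-entry list.
entries = list(range(NUM_ENTRIES))
same_positions = [entries[p] == entries[n] for p, n in zip(positive_layers, negative_layers)]
```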

However, I would still suggest using negative feature layers so that the outputs are consistent if the model is used with earlier versions of vLLM. The PR above also fixes a bug in vLLM that causes positive feature layers to load one more layer than needed to get the deepest feature, which will throw if the last layer is requested (e.g., for this model). Keeping the indices negative will ensure the model works correctly with older vLLM versions!

Also no, nothing needs to be changed in the model :)
