Instructions for using fancyfeast/llama-joycaption-beta-one-hf-llava with libraries, inference providers, notebooks, and local apps.

Libraries

How to use fancyfeast/llama-joycaption-beta-one-hf-llava with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="fancyfeast/llama-joycaption-beta-one-hf-llava")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "fancyfeast/llama-joycaption-beta-one-hf-llava"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Apply the chat template, tokenize the text, and preprocess the image in one call
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# generate() returns the prompt tokens followed by the new tokens; keep only the new ones
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Notebooks: Google Colab, Kaggle
Local Apps

vLLM

How to use fancyfeast/llama-joycaption-beta-one-hf-llava with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (it keeps running; send the curl request from a second terminal):
vllm serve "fancyfeast/llama-joycaption-beta-one-hf-llava"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "fancyfeast/llama-joycaption-beta-one-hf-llava",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
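The same request can be sent from Python without extra dependencies. The sketch below uses only the standard library (`json`, `urllib.request`) and assumes the server started by `vllm serve` is listening on vLLM's default `http://localhost:8000` and returns the usual OpenAI chat-completions shape (`choices[0].message.content`). The helper names `build_caption_request`, `extract_reply`, and `caption` are hypothetical, not part of vLLM.

```python
import json
import urllib.request

# Assumption: a vLLM server for this model is running on the default port 8000.
MODEL_ID = "fancyfeast/llama-joycaption-beta-one-hf-llava"


def build_caption_request(image_url: str, prompt: str, model: str = MODEL_ID) -> dict:
    """Build the same chat-completions payload as the curl example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat-completions response."""
    return response["choices"][0]["message"]["content"]


def caption(image_url: str, prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to a running server and return the generated text."""
    body = json.dumps(build_caption_request(image_url, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))
```

With the server running, `caption("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg", "Describe this image in one sentence.")` should return the generated sentence as a plain string.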

Use Docker

```
docker model run hf.co/fancyfeast/llama-joycaption-beta-one-hf-llava
```

SGLang

How to use fancyfeast/llama-joycaption-beta-one-hf-llava with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (it keeps running; send the curl request from a second terminal):
python3 -m sglang.launch_server \
    --model-path "fancyfeast/llama-joycaption-beta-one-hf-llava" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "fancyfeast/llama-joycaption-beta-one-hf-llava",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
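The curl examples point at a public image URL. To caption a local file instead, OpenAI-compatible servers generally accept the image inline as a base64 `data:` URL in the `image_url` field; that this works for this model on the SGLang and vLLM servers shown here is an assumption worth verifying. A minimal standard-library sketch, with hypothetical helper names:

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL for an "image_url" content part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def local_image_message(path: str, prompt: str) -> dict:
    """Build a user message that embeds a local image instead of linking to a URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The returned message has the same structure as the entry in the `"messages"` array of the curl payload, so it can be dropped into that list in place of the URL-based one.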

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "fancyfeast/llama-joycaption-beta-one-hf-llava" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "fancyfeast/llama-joycaption-beta-one-hf-llava",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
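When the container starts, SGLang still has to download and load the model before it can answer, so a request sent immediately may be refused. The sketch below polls the server with the standard library until it responds; it assumes the OpenAI-compatible `GET /v1/models` route is served on port 30000, and `wait_for_server` is a hypothetical helper name.

```python
import http.client
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:30000",
                    timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or `timeout` seconds pass.

    Returns True once the server answers, False if the deadline expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt has its own socket timeout, so a hung connection cannot stall the loop.
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Refused or reset connections, timeouts, and HTTP error statuses
            # (URLError and HTTPError subclass OSError) all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, calling `wait_for_server(timeout=900)` before the first curl or Python request; it returns `False` rather than raising if the server never comes up, so the caller decides how to report the failure.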

Docker Model Runner
How to use fancyfeast/llama-joycaption-beta-one-hf-llava with Docker Model Runner:
```
docker model run hf.co/fancyfeast/llama-joycaption-beta-one-hf-llava
```

Compatibility Issue with transformers Version > 4.44.2 Breaking JoyCaption Beta 1 in ComfyUI

by dhawgood - opened Jun 30, 2025

Discussion

dhawgood

Jun 30, 2025

I encountered a critical compatibility problem while using the JoyCaption Beta 1 node within ComfyUI (v0.3.15). After upgrading the transformers Python library beyond version 4.44.2 (e.g., to 4.51.0), the JoyCaption node fails to load properly and throws errors related to the tokenizer initialization:

Exception: data did not match any variant of untagged enum ModelWrapper at line XXXXX column X
Exception: fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)

Despite attempts to downgrade transformers to version 4.44.2 (the version recommended by some sources), the issue remains unresolved.

Details:

- The error happens while loading the AutoProcessor from the model checkpoint.
- Model files and directory structure have been verified correct.
- The problem seems related to incompatible tokenizer or processor files within the checkpoint versus the transformers version.
- Higher versions of transformers (e.g., 4.51.0) cause similar or worse failures.
- The root cause might be a mismatch between checkpoint file formats and the transformers API expectations.
- Attempts to fix via dependency downgrades or upgrades have been unsuccessful so far.

Request:
Has anyone successfully resolved this? Are there known fixes, updates to JoyCaption checkpoints, or specific version combinations of transformers that work reliably? Advice on how to handle this dependency challenge without destabilizing other parts of ComfyUI would be appreciated.
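One frequently reported cause of the `data did not match any variant of untagged enum ModelWrapper` message, offered here as an assumption rather than a confirmed diagnosis for this thread, is a `tokenizer.json` whose BPE merges are stored as two-element lists (the layout written by `tokenizers` 0.20 and later) being read by an older `tokenizers` release that only understands space-separated strings. The sketch below reports which layout a local `tokenizer.json` uses; it also fails with `json.JSONDecodeError` if the file is truncated, which would point to a corrupted download instead. The function name `bpe_merges_format` is hypothetical.

```python
import json
from pathlib import Path


def bpe_merges_format(tokenizer_json: str) -> str:
    """Report how BPE merges are serialized in a tokenizer.json file.

    Returns "pairs" for entries like ["a", "b"], "strings" for entries like "a b",
    or "none" if the model section has no merges. Raises json.JSONDecodeError
    if the file is not valid JSON (e.g. a truncated download).
    """
    data = json.loads(Path(tokenizer_json).read_text(encoding="utf-8"))
    merges = (data.get("model") or {}).get("merges") or []
    if not merges:
        return "none"
    first = merges[0]
    if isinstance(first, list):
        return "pairs"
    if isinstance(first, str):
        return "strings"
    raise ValueError(f"unexpected merge entry type: {type(first).__name__}")
```

Pointed at the `tokenizer.json` inside the downloaded JoyCaption folder (its location depends on the ComfyUI setup), a result of `"pairs"` would suggest checking `tokenizers.__version__` in the same Python environment next.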

fancyfeast

Owner Jul 25, 2025

I was not able to reproduce this error on transformers version 4.54.0 in my Windows install of ComfyUI Desktop. If you are able to, and willing, I'd love more details on the error and possibly a dump of the environment (pip freeze). That might help me narrow down what's going on. I might also recommend deleting and redownloading the JoyCaption model, just in case the download got corrupted or interrupted.
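Running `pip freeze` with the same Python interpreter that ComfyUI uses is the most direct way to produce the environment dump requested here. If only the packages most relevant to this error are wanted, a standard-library sketch using `importlib.metadata` could look like the following; the package list and the helper names are assumptions and can be changed.

```python
import platform
from importlib import metadata

# Assumed list of packages most relevant to a tokenizer-loading failure; extend as needed.
RELEVANT_PACKAGES = ("transformers", "tokenizers", "torch", "huggingface_hub", "accelerate")


def environment_report(packages=RELEVANT_PACKAGES) -> dict:
    """Return {"python": version, package: version-or-None} for the running interpreter."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this interpreter
    return report


def format_report(report: dict) -> str:
    """Render the report as pip-freeze-style lines for pasting into a discussion."""
    return "\n".join(
        f"{name}=={version}" if version else f"# {name}: not installed"
        for name, version in report.items()
    )
```

Printing `format_report(environment_report())` from inside the ComfyUI environment gives a short, pasteable summary; a full `pip freeze` is still more useful if the cause turns out to be an unexpected package.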
