Compatibility Issue with transformers Version > 4.44.2 Breaking JoyCaption Beta 1 in ComfyUI
I encountered a critical compatibility problem while using the JoyCaption Beta 1 node within ComfyUI (v0.3.15). After upgrading the transformers Python library beyond version 4.44.2 (e.g., to 4.51.0), the JoyCaption node fails to load properly and throws errors related to the tokenizer initialization:
Exception: data did not match any variant of untagged enum ModelWrapper at line XXXXX column X
Exception: fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Despite attempts to downgrade transformers to version 4.44.2 (the version recommended by some sources), the issue remains unresolved.
Details:
The error happens while loading the AutoProcessor from the model checkpoint.
Model files and directory structure have been verified correct.
The problem appears to be a mismatch between the tokenizer or processor files in the checkpoint and what the installed transformers version expects.
Higher versions of transformers (e.g., 4.51.0) cause similar or worse failures.
The root cause might be a mismatch between checkpoint file formats and the transformers API expectations.
Attempts to fix via dependency downgrades or upgrades have been unsuccessful so far.
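One way to narrow this down is to check whether the checkpoint's tokenizer.json is intact before involving transformers at all. The sketch below uses only the standard library. The function name `inspect_tokenizer_json` and the example path are hypothetical, and it assumes the checkpoint folder contains a `tokenizer.json` with a top-level `model` entry, as fast-tokenizer files normally do.

```python
import json
from pathlib import Path


def inspect_tokenizer_json(path):
    """Report basic facts about a tokenizer.json file without importing transformers."""
    path = Path(path)
    report = {
        "exists": path.is_file(),
        "size_bytes": None,
        "valid_json": False,
        "model_type": None,
    }
    if not report["exists"]:
        return report

    report["size_bytes"] = path.stat().st_size
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        # A truncated or interrupted download typically fails here.
        return report

    report["valid_json"] = True
    model = data.get("model") if isinstance(data, dict) else None
    if isinstance(model, dict):
        # Usually a string such as "BPE"; left as None if the key is absent.
        report["model_type"] = model.get("type")
    return report


if __name__ == "__main__":
    # Hypothetical location: point this at the checkpoint folder the node loads.
    print(inspect_tokenizer_json("path/to/llama-joycaption-beta-one-hf-llava/tokenizer.json"))
```

If `valid_json` comes back `False`, the file itself is damaged and redownloading it is the likely fix. If it comes back `True`, the file is at least well-formed, which points toward the installed library versions rather than the download.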
Request:
Has anyone successfully resolved this? Are there known fixes, updates to JoyCaption checkpoints, or specific version combinations of transformers that work reliably? Advice on how to handle this dependency challenge without destabilizing other parts of ComfyUI would be appreciated.
I was not able to reproduce this error on transformers version 4.54.0 in my Windows install of ComfyUI Desktop. If you are able to, and willing, I'd love more details on the error and possibly a dump of the environment (pip freeze). That might help me narrow down what's going on. I might also recommend deleting and redownloading the JoyCaption model, just in case the download got corrupted or interrupted.
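A full `pip freeze` is the most complete dump. As a lighter alternative, here is a minimal sketch that prints only the versions most likely to matter for a tokenizer loading error. The package list is an assumption about what is relevant, not a confirmed list of JoyCaption's dependencies, and `collect_versions` is an illustrative name. It needs to be run with the same Python environment that ComfyUI uses, otherwise the numbers will not reflect what the node sees.

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages=("transformers", "tokenizers", "huggingface_hub", "torch")):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0], "on", platform.platform())
    for name, version in collect_versions().items():
        print(name, version or "not installed")
```

Pasting this output alongside the full traceback makes it easier to compare a failing setup against a working one.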