Instructions to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with PaddleOCR:

# Please refer to the document for information on how to use the model.
# https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/module_overview.html

llama-cpp-python

How to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="adambarbato/PaddleOCR-VL-For-Manga-GGUF",
	filename="PaddleOCR-VL-For-Manga-BF16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16
# Run inference directly in the terminal:
./llama-cli -hf adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16

Use Docker

docker model run hf.co/adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16

vLLM

How to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "adambarbato/PaddleOCR-VL-For-Manga-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "adambarbato/PaddleOCR-VL-For-Manga-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with Ollama:
```
ollama run hf.co/adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16
```

Unsloth Studio

How to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for adambarbato/PaddleOCR-VL-For-Manga-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for adambarbato/PaddleOCR-VL-For-Manga-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for adambarbato/PaddleOCR-VL-For-Manga-GGUF to start chatting

Docker Model Runner
How to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with Docker Model Runner:
```
docker model run hf.co/adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16
```

Lemonade

How to use adambarbato/PaddleOCR-VL-For-Manga-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull adambarbato/PaddleOCR-VL-For-Manga-GGUF:BF16

Run and chat with the model

lemonade run user.PaddleOCR-VL-For-Manga-GGUF-BF16

List all available models

lemonade list

PaddleOCR-VL-For-Manga GGUF

This repo contains BF16 GGUF files of jzhang533's PaddleOCR-VL-For-Manga model.

It can be run via llama.cpp with the following parameters:

llama-server \
  -m ./PaddleOCR-VL-For-Manga-GGUF/PaddleOCR-VL-For-Manga-BF16.gguf \
  --mmproj ./PaddleOCR-VL-For-Manga-GGUF/PaddleOCR-VL-For-Manga-mmproj-BF16.gguf \
  --host 0.0.0.0 \
  --n-gpu-layers 999 \
  --port 9999 \
  -c 32768 \
  --temp 0

Send an image along with the prompt "OCR:" to receive an OCR response. If you encounter the error "tokenize: error: number of bitmaps (1) does not match number of markers (0)", send "<__media__>OCR:" as your prompt instead and see if that resolves the issue.
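As a rough illustration of the request described above, the following Python sketch builds an OpenAI-style chat payload that pairs one image with the "OCR:" prompt and posts it to the server. It assumes the server was started with the command above (port 9999) and that it exposes the usual /v1/chat/completions endpoint; the function names and the `use_media_marker` flag are hypothetical helpers, not part of llama.cpp.

```python
import base64
import json
import urllib.request

# Assumption: llama-server is running locally on port 9999 (as in the command
# above) and serves an OpenAI-compatible chat endpoint at this path.
SERVER_URL = "http://localhost:9999/v1/chat/completions"


def build_ocr_request(image_bytes, mime_type="image/png", use_media_marker=False):
    """Build a chat-completion payload that pairs one image with the OCR prompt."""
    # Fallback prompt from the text above, for the bitmap/marker mismatch error.
    prompt = "<__media__>OCR:" if use_media_marker else "OCR:"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "temperature": 0,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def run_ocr(image_path, use_media_marker=False):
    """Send one image file to the server and return the recognized text."""
    with open(image_path, "rb") as f:
        payload = build_ocr_request(f.read(), use_media_marker=use_media_marker)
    request = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(run_ocr("bubble.png"))` would print the text recognized in a local crop (the file name is a placeholder). Whether the marker variant is needed may depend on the client and the llama.cpp version, so treat `use_media_marker=True` as something to try only after seeing the error.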

Original discussion for PaddleOCR-VL support in llama.cpp can be found here.

Original model card for PaddleOCR-VL-For-Manga is copied below:

PaddleOCR-VL-For-Manga

Model Description

PaddleOCR-VL-For-Manga is an OCR model enhanced for Japanese manga text recognition. It is fine-tuned from PaddleOCR-VL and achieves much higher accuracy on manga speech bubbles and stylized fonts.

This model was fine-tuned on a combination of the Manga109-s dataset and 1.5 million synthetic data samples. It showcases the potential of Supervised Fine-Tuning (SFT) to create highly accurate, domain-specific VLMs for OCR tasks from a powerful, general-purpose base like PaddleOCR-VL, which supports 109 languages.

This project serves as a practical guide for developers looking to build their own custom OCR solutions. The training code is available in the GitHub repository, and a step-by-step tutorial is available here.

Performance

The model achieves 70% full-sentence accuracy on a test set of Manga109-s crops (a 10% split of the dataset). For comparison, the original PaddleOCR-VL achieves 27% full-sentence accuracy on the same test set.
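The card does not define how full-sentence accuracy is computed. A plausible reading, assumed here, is the fraction of crops whose predicted text matches the reference string exactly. The sketch below implements that reading; the function name and the toy strings are illustrative only.

```python
def full_sentence_accuracy(predictions, references):
    """Fraction of crops whose predicted text equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


# Toy data: the second pair differs only in full-width vs. half-width letters,
# yet it still counts as a miss under exact matching.
toy_predictions = ["生存確認っ...!", "ＯＫ"]
toy_references = ["生存確認っ...!", "OK"]
toy_accuracy = full_sentence_accuracy(toy_predictions, toy_references)
```

Under this definition a single differing character makes the whole sentence count as wrong, which is why the metric is stricter than character-level scores.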

Common errors involve discrepancies between visually similar characters that are often used interchangeably, such as:

！？ vs. !? (Full-width vs. half-width punctuation)
ＯＫ vs. ok (Full-width vs. half-width letters)
１２０５ vs. 1205 (Full-width vs. half-width numbers)
“人” (U+4EBA) vs. “⼈” (U+2F08) (Standard CJK Unified Ideograph vs. CJK Radical)

The prevalence of these character types highlights a limitation of standard metrics like Character Error Rate (CER). These metrics may not fully capture the model's practical accuracy, as they penalize semantically equivalent variations that are common in stylized text.
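One way to see the effect of these variants on CER is to compare scores before and after Unicode normalization. The sketch below uses NFKC from Python's standard library, which folds full-width forms and Kangxi radicals such as U+2F08 into their ordinary counterparts. This is an illustration, not the evaluation procedure used for the numbers above, and the helper names are hypothetical.

```python
import unicodedata


def normalize(text):
    """Fold compatibility variants (full-width forms, CJK radicals) with NFKC."""
    return unicodedata.normalize("NFKC", text)


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def cer(prediction, reference, normalized=False):
    """Character error rate: edit distance divided by reference length."""
    if normalized:
        prediction, reference = normalize(prediction), normalize(reference)
    if not reference:
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, reference) / len(reference)


raw_cer = cer("１２０５", "1205")                          # every digit differs
normalized_cer = cer("１２０５", "1205", normalized=True)  # identical after NFKC
```

NFKC does not fold case, so "ＯＫ" becomes "OK" rather than "ok", and it also rewrites other characters (for example, the ellipsis "…" becomes three periods), so whether to apply it depends on what the downstream application treats as equivalent.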

Examples

#	Image	Prediction
1		心拍呼吸正常値お人よし度過剰値... 間違いなくパパッ...! 生存確認っ...!
2		あとは『メルニィ宇宙鉄道』とか『TipTap』とか全部その人が考えたらしい
3		★コミックス20巻1月4日(土)発売〟TVアニメ1月11日(土)放送開始!!
4		我々魔女協会が長年追い続ける最大の敵ウロロが「王の魔法」ならあれは世界を削り変える「神の魔法」
5		天弓の動きについてくだけじゃ勝てねぇ…！

How to Use

You can use this model with transformers, PaddleOCR, or any library that supports PaddleOCR-VL to perform OCR on manga images. The model architecture and weight layout are identical to those of the base model.

If your application involves documents with structured layouts, you can use your fine-tuned OCR model in conjunction with PP-DocLayoutV2 for layout analysis. Manga pages, however, differ considerably from such documents in reading order and layout.

Training Details

Base Model: PaddleOCR-VL
Dataset:
- Manga109-s: 0.1 million randomly sampled text-region crops (not full pages) were used for training (90% split); the remaining 10% of crops were used for testing.
- Synthetic Data: 1.5 million generated samples.
Training Frameworks:
- transformers and trl
Alternatives for SFT:
- ERNIEKit
- ms-swift
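
The 90/10 split of Manga109-s crops mentioned under Dataset can be reproduced in spirit with a seeded shuffle. The sketch below is an assumption about how such a split might be done, not the author's actual procedure; `split_crops`, the seed, and the integer crop identifiers are all illustrative.

```python
import random


def split_crops(crop_ids, test_fraction=0.1, seed=0):
    """Shuffle crop identifiers reproducibly and split them into train/test."""
    shuffled = list(crop_ids)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    # The first test_size items form the test set; the rest are for training.
    return shuffled[test_size:], shuffled[:test_size]


# Toy run with 1,000 placeholder identifiers.
train_ids, test_ids = split_crops(range(1000))
```

Fixing the seed keeps the test set stable across runs, so accuracy figures for the base and fine-tuned models are measured on the same crops.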

Acknowledgements

Manga109-s dataset, which provided the manga text-region crops used for training and evaluation.
PaddleOCR-VL, the base Visual Language Model on which this model is fine-tuned.
manga-ocr, used in this project for data processing and synthetic data generation; it also inspired practical workflows and evaluation considerations for manga OCR.