---
license: mit
base_model:
- rednote-hilab/dots.mocr
pipeline_tag: image-text-to-text
tags:
- ocr
- image-to-text
- document-parse
- transformers
- quantized
- 4bit
- bitsandbytes
- multilingual
---

# Dots MOCR – 4-bit Quantized (NF4)

## 🔍 Introduction

This repository provides a **4-bit quantized version of `dots.mocr`**, optimized using **BitsAndBytes (NF4 precision)** for efficient, low-memory inference.

The original model is a powerful multimodal OCR system capable of:
- Document parsing  
- Layout understanding  
- Multilingual OCR  
- Structured outputs (JSON / Markdown / SVG)

This version enables deployment on **low-VRAM GPUs** while maintaining strong performance.

---

## ⚙️ Key Features

- 4-bit quantization (NF4)  
- Reduced VRAM usage (~70–80%)  
- Faster inference  
- Compatible with Hugging Face Transformers  
- Supports OCR and document parsing  
- Suitable for edge and local deployments  

---

## 🛠️ Installation (Base Setup)

⚠️ This model depends on the original `dots.mocr` repository.

```bash
conda create -n dots_mocr python=3.12
conda activate dots_mocr

git clone https://github.com/rednote-hilab/dots.mocr.git
cd dots.mocr

pip install -e .
pip install flash-attn==2.8.0.post2
```

---

## 🚀 Usage (Quantized Inference)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "rednote-hilab/dots.mocr"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Example usage
inputs = tokenizer("Extract text from image", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📊 Quantization Details

| Parameter        | Value     |
|-----------------|----------|
| Precision       | 4-bit     |
| Quant Type      | NF4       |
| Compute Dtype   | float16   |
| Double Quant    | Enabled   |
| Library         | BitsAndBytes |

---

## 📌 Use Cases

- Document OCR  
- PDF parsing  
- Layout detection  
- Structured data extraction  
- AI-powered document understanding  
- Edge deployment of large OCR models  

---

## ⚠️ Limitations

- Slight accuracy drop compared to full precision  
- GPU recommended for optimal performance  
- Some layers remain in higher precision  
- Not fully optimized for CPU inference  

---

## 🔮 Future Work

- GGUF conversion for CPU inference  
- FlashAttention optimization improvements  
- Integration with full OCR pipelines  
- Web UI (Gradio / Streamlit demo)  
- Benchmark comparisons (VRAM vs accuracy)  

---

## 🙌 Acknowledgement

- Base Model: `rednote-hilab/dots.mocr`  
- Quantization: BitsAndBytes  
- Framework: Hugging Face Transformers  

---

## 📄 License

MIT License