---
license: mit
base_model:
- rednote-hilab/dots.mocr
pipeline_tag: image-text-to-text
tags:
- ocr
- image-to-text
- document-parse
- transformers
- quantized
- 4bit
- bitsandbytes
- multilingual
---

# Dots MOCR – 4-bit Quantized (NF4)

## 🔍 Introduction

This repository provides a **4-bit quantized version of `dots.mocr`**, quantized with **BitsAndBytes (NF4)** for efficient, low-memory inference.

The original model is a powerful multimodal OCR system capable of:

- Document parsing
- Layout understanding
- Multilingual OCR
- Structured outputs (JSON / Markdown / SVG)

This version enables deployment on **low-VRAM GPUs** while maintaining strong performance.

---

## ⚙️ Key Features

- 4-bit quantization (NF4)
- Reduced VRAM usage (~70–80% lower)
- Faster inference
- Compatible with Hugging Face Transformers
- Supports OCR and document parsing
- Suitable for edge and local deployments

---

## 🛠️ Installation (Base Setup)

⚠️ This model depends on the original `dots.mocr` repository.

```bash
conda create -n dots_mocr python=3.12
conda activate dots_mocr

git clone https://github.com/rednote-hilab/dots.mocr.git
cd dots.mocr

pip install -e .
pip install flash-attn==2.8.0.post2
```

---

## 🚀 Usage (Quantized Inference)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base checkpoint; the config below quantizes it to 4-bit at load time
model_id = "rednote-hilab/dots.mocr"

# NF4 weights, float16 compute, double quantization of the scales
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Example usage (text-only prompt; no image is passed here)
inputs = tokenizer("Extract text from image", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📊 Quantization Details

| Parameter     | Value        |
|---------------|--------------|
| Precision     | 4-bit        |
| Quant Type    | NF4          |
| Compute Dtype | float16      |
| Double Quant  | Enabled      |
| Library       | BitsAndBytes |

---

## 📌 Use Cases

- Document OCR
- PDF parsing
- Layout detection
- Structured data extraction
- AI-powered document understanding
- Edge deployment of large OCR models

---

## ⚠️ Limitations

- Slight accuracy drop compared to full precision
- GPU recommended for optimal performance
- Some layers remain in higher precision
- Not fully optimized for CPU inference

---

## 🔮 Future Work

- GGUF conversion for CPU inference
- FlashAttention optimization improvements
- Integration with full OCR pipelines
- Web UI (Gradio / Streamlit demo)
- Benchmark comparisons (VRAM vs accuracy)

---

## 🙌 Acknowledgement

- Base Model: `rednote-hilab/dots.mocr`
- Quantization: BitsAndBytes
- Framework: Hugging Face Transformers

---

## 📄 License

MIT License
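
## 🧮 Estimating Weight Memory

The VRAM reduction quoted under Key Features can be sanity-checked with a little arithmetic. The sketch below is not a measurement of this model. It assumes the block sizes described for NF4 with double quantization (64 weights per block, 8-bit scales grouped 256 at a time with one 32-bit constant per group), and the parameter count is a hypothetical placeholder. It covers quantized weights only; activations, the KV cache, and any layers kept in higher precision are ignored, so real savings may be smaller.

```python
def bits_per_param(block_size=64, double_quant=True, dq_block_size=256):
    """Approximate storage bits per 4-bit quantized weight, including scale overhead.

    block_size and dq_block_size are assumed defaults, not values read from this repo.
    """
    if double_quant:
        # One 8-bit scale per block, plus one 32-bit constant per group of scales
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # One 32-bit scale per block
        overhead = 32 / block_size
    return 4 + overhead


def weight_memory_gib(num_params, bits):
    """Memory in GiB needed to store num_params weights at the given bit width."""
    return num_params * bits / 8 / 2**30


def reduction_vs_fp16(bits):
    """Fractional saving relative to 16-bit weights."""
    return 1 - bits / 16


# Hypothetical parameter count, used only for illustration
NUM_PARAMS = 3_000_000_000

nf4_bits = bits_per_param()
print(f"bits per weight (NF4 + double quant): {nf4_bits:.3f}")
print(f"fp16 weights: {weight_memory_gib(NUM_PARAMS, 16):.2f} GiB")
print(f"NF4 weights:  {weight_memory_gib(NUM_PARAMS, nf4_bits):.2f} GiB")
print(f"weight-memory reduction: {reduction_vs_fp16(nf4_bits):.1%}")
```

Under these assumptions the per-weight cost is a little over 4 bits, which puts the weight-only saving relative to float16 inside the 70–80% range listed above. Replace `NUM_PARAMS` with the real parameter count before relying on the absolute figures.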