---
title: Text Extractor
emoji: 👀
colorFrom: gray
colorTo: indigo
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference



🧠 OCR Text Extractor + Summarizer

An AI-powered tool that extracts text from images with Tesseract OCR and then summarizes it with a transformer model. Upload any image (screenshots, photos, scanned documents, notes) → get the cleaned extracted text plus an AI-generated summary.

🚀 Features

📤 Upload an image with text

🔎 Extracts text using Tesseract OCR

✨ Summarizes the extracted text using Hugging Face Transformers

⚡ Fast, simple Gradio UI

🛠️ Works on CPU, no GPU required

🧩 How it Works

1. The uploaded image is processed with Tesseract OCR.
2. The extracted text is cleaned.
3. The cleaned text is fed into a pretrained summarization model.
4. The summary is displayed alongside the extracted text.
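The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of `app.py`: the function names `clean_text`, `extract_text`, and `summarize` are hypothetical, the cleaning rules are an assumption about what "cleaned" means, and the sketch assumes an installed `transformers` version that provides the `summarization` pipeline task with its default model.

```python
import re


def clean_text(raw: str) -> str:
    """Normalize OCR output (assumed cleaning rules, not necessarily app.py's)."""
    text = re.sub(r"-\n(?=\w)", "", raw)  # rejoin words hyphenated across a line break
    text = re.sub(r"\s+", " ", text)      # collapse newlines, tabs, and repeated spaces
    return text.strip()


def extract_text(image) -> str:
    """Run Tesseract on a PIL image and return the cleaned text."""
    import pytesseract  # imported lazily; requires the tesseract binary on the system

    return clean_text(pytesseract.image_to_string(image))


def summarize(text: str, max_length: int = 130, min_length: int = 30) -> str:
    """Summarize text with a Hugging Face summarization pipeline."""
    from transformers import pipeline  # imported lazily; downloads a model on first use

    # Uses the library's default summarization model; the real app may pin another one.
    summarizer = pipeline("summarization")
    result = summarizer(
        text, max_length=max_length, min_length=min_length, truncation=True
    )
    return result[0]["summary_text"]
```

In a real app the pipeline object would normally be created once at startup rather than on every call, since loading the model is the slow part on CPU.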

πŸ—‚οΈ Project Structure β”œβ”€β”€ app.py β”œβ”€β”€ requirements.txt β”œβ”€β”€ packages.txt └── README.md

📦 Dependencies

Python packages (requirements.txt):

    gradio
    pillow
    pytesseract
    transformers
    torch
    tesseract

System packages (packages.txt):

    tesseract-ocr
    tesseract-ocr-eng

The system packages install the Tesseract binary and its English language data, which pytesseract needs at runtime on Hugging Face Spaces.
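Because `pytesseract` only wraps the Tesseract executable, a quick check that the binary is on `PATH` can save some debugging when running outside Spaces. The helper below is a small sketch for that purpose; `find_tesseract` is a hypothetical name and is not part of this project.

```python
import shutil


def find_tesseract(binary: str = "tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found; install tesseract-ocr and tesseract-ocr-eng first")
    else:
        print(f"tesseract found at {path}")
```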

▶️ Running Locally

    pip install -r requirements.txt
    python app.py

📸 Demo

Just upload an image → click Submit → done!
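For readers curious how an upload-and-submit flow like this is typically wired in Gradio, here is a hedged sketch. It is not taken from `app.py`: `process` and `build_demo` are hypothetical names, the component labels and the "No text detected." message are assumptions, and the OCR and summarizer are passed in as plain callables so the logic can be exercised without Tesseract or a model installed.

```python
def process(image, ocr, summarizer):
    """Return (extracted_text, summary) for one uploaded image."""
    if image is None:  # Submit clicked without an upload
        return "", ""
    text = ocr(image).strip()
    if not text:  # nothing recognizable in the image
        return "", "No text detected."
    return text, summarizer(text)


def build_demo(ocr, summarizer):
    """Build a Gradio Interface with one image input and two text outputs."""
    import gradio as gr  # imported lazily so process() can be used without Gradio

    return gr.Interface(
        fn=lambda image: process(image, ocr, summarizer),
        inputs=gr.Image(type="pil", label="Image"),
        outputs=[
            gr.Textbox(label="Extracted text"),
            gr.Textbox(label="Summary"),
        ],
        title="OCR Text Extractor + Summarizer",
    )
```

Calling `build_demo(ocr_fn, summarize_fn).launch()` would start the UI, where `ocr_fn` and `summarize_fn` stand for whatever OCR and summarization callables the app uses.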

🙌 Acknowledgements

Tesseract OCR

Hugging Face Transformers

Gradio for UI

🔗 Try the live Space

👉 https://huggingface.co/spaces/prans-cs55/text_extractor