BBox coordinates in PaddleOCR-VL JSON don’t match PDF crop — how to correctly map/convert coordinates?

by sogm1 - opened Feb 3


Hi, thanks for releasing PaddleOCR-VL — the parsing quality is great.

When I parse a PDF with PaddleOCR-VL (Model A), the output JSON includes bounding boxes (bbox). However, when I crop the PDF with those bbox coordinates (via pdfplumber), the cropped regions do not match the actual positions of the objects (tables, figures, etc.).
What coordinate system does PaddleOCR-VL use for bbox in the JSON output?

This is how I call the pipeline:

    from paddleocr import PaddleOCRVL

    pipeline = PaddleOCRVL(
        pipeline_version="v1",
        device="gpu:0",
        use_layout_detection=True,
        use_doc_orientation_classify=True,
        use_doc_unwarping=True,
    )

Maybe use_doc_unwarping=True is what causes this.

Is there an official or recommended way to convert PaddleOCR-VL bbox values to PDF page coordinates for accurate cropping when use_doc_unwarping is enabled?

I hope to hear back from you.
Thank you.

ChengCui

PaddlePaddle org Feb 9

Hi @sogm1, when use_doc_unwarping is set to True, the image pixels are shifted by the unwarping step, so the output coordinates no longer correspond to the original image. You need to set use_doc_unwarping to False.
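
For anyone cropping with pdfplumber after turning unwarping off, here is a minimal sketch of the remaining conversion step. It is not an official PaddleOCR-VL utility. It assumes the bbox is [x1, y1, x2, y2] in pixels of the rendered page image with a top-left origin, and that you know the size of that image; the helper names pixel_bbox_to_pdf and crop_region are hypothetical. pdfplumber also measures from the top-left corner, in PDF points, so under these assumptions only a scale factor is needed.

```python
def pixel_bbox_to_pdf(bbox, image_size, page_size):
    """Scale an [x1, y1, x2, y2] pixel box to PDF points.

    Assumption: the box uses a top-left origin in the rendered page image,
    which is also the convention pdfplumber uses for page coordinates.
    """
    x1, y1, x2, y2 = bbox
    img_w, img_h = image_size      # rendered image size in pixels
    page_w, page_h = page_size     # page size in PDF points
    sx = page_w / img_w
    sy = page_h / img_h

    # Normalise corner order in case the box is given right-to-left.
    left, right = sorted((x1 * sx, x2 * sx))
    top, bottom = sorted((y1 * sy, y2 * sy))

    # Clamp to the page, since pdfplumber rejects boxes outside it.
    left = min(max(left, 0.0), page_w)
    right = min(max(right, 0.0), page_w)
    top = min(max(top, 0.0), page_h)
    bottom = min(max(bottom, 0.0), page_h)
    return (left, top, right, bottom)


def crop_region(pdf_path, page_index, bbox, image_size):
    """Hypothetical helper: crop one detected region from a PDF page."""
    import pdfplumber  # imported here so the function above has no dependencies

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        pdf_bbox = pixel_bbox_to_pdf(bbox, image_size, (page.width, page.height))
        # Extract whatever is needed while the file is still open.
        return page.crop(pdf_bbox).extract_text()
```

For example, a US Letter page (612 x 792 points) rendered at 1700 x 2200 pixels gives a scale of 0.36 on both axes, so a pixel box of [170, 220, 850, 1100] maps to roughly (61.2, 79.2, 306.0, 396.0) in points. If the crops are still offset, check the actual size of the image the pipeline processed rather than assuming a DPI.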
