Can this model produce layout-aware JSON (blocks, bbox, polygons, hierarchy) ?
#6 · by biswajitggiiygg · opened
Hi everyone 👋
I'm evaluating Chandra OCR for document OCR on scanned PDFs and images, and I have a question about the structure of the output it can produce.
What I'm trying to achieve

I'm looking for an output format similar to a layout-aware document DOM, for example:
- Page → blocks → children hierarchy
- Explicit `block_type` (Page, SectionHeader, Text, Table, TableCell, etc.)
- Bounding boxes / polygons for each block
- HTML serialization per block (paragraphs, tables, headers)
- Stable IDs like `/page/0/Table/4`
- Section hierarchy tracking
Example:

```json
{
  "id": "/page/0/Table/4",
  "block_type": "Table",
  "html": "<table>...</table>",
  "bbox": [x1, y1, x2, y2],
  "polygon": [[...]],
  "children": [...]
}
```
Or is Chandra intended to provide semantic OCR only (Markdown / HTML / raw text) without explicit geometry, requiring a separate layout-detection step?
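For context, by "separate layout-detection step" I mean something like the sketch below: regions from a layout model (geometry + type) are merged with per-region OCR output (HTML) into the kind of block tree I described. All names here are hypothetical placeholders of mine, not Chandra's actual API:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Block:
    # Hypothetical layout-aware block; mirrors the JSON shape above.
    id: str
    block_type: str
    html: str
    bbox: list            # [x1, y1, x2, y2] in page pixel coordinates
    polygon: list         # list of [x, y] points outlining the block
    children: list = field(default_factory=list)

def assemble_page(page_index, regions):
    """Merge layout-detector regions with per-region OCR HTML into one
    hierarchical Page block.

    `regions` is a list of dicts like
      {"block_type": "Table", "bbox": [...], "polygon": [...], "html": "..."}
    produced by whatever layout model / OCR combination is actually run.
    """
    counters = {}  # per-type counters give stable IDs like /page/0/Table/4
    children = []
    for r in regions:
        n = counters.get(r["block_type"], 0)
        counters[r["block_type"]] = n + 1
        children.append(Block(
            id=f"/page/{page_index}/{r['block_type']}/{n}",
            block_type=r["block_type"],
            html=r["html"],
            bbox=r["bbox"],
            polygon=r["polygon"],
        ))
    # Page bbox is the union of the child bounding boxes.
    page_bbox = [
        min(c.bbox[0] for c in children),
        min(c.bbox[1] for c in children),
        max(c.bbox[2] for c in children),
        max(c.bbox[3] for c in children),
    ] if children else [0, 0, 0, 0]
    return Block(
        id=f"/page/{page_index}",
        block_type="Page",
        html="",
        bbox=page_bbox,
        polygon=[],
        children=[asdict(c) for c in children],
    )
```

If Chandra only emits Markdown/HTML, I assume something like this glue layer is what users are expected to build themselves.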
Just want to confirm the intended scope and best practice here.
Thanks!