Can this model produce layout-aware JSON (blocks, bbox, polygons, hierarchy)?

#6 opened by biswajitggiiygg

Hi everyone 👋

I’m evaluating Chandra OCR for document OCR on scanned PDFs and images, and I have a question about the structure of the output it can produce.

What I’m trying to achieve

I’m looking for an output format similar to a layout-aware document DOM, for example:

  • Page β†’ blocks β†’ children hierarchy
  • Explicit block_type (Page, SectionHeader, Text, Table, TableCell, etc.)
  • Bounding boxes / polygons for each block
  • HTML serialization per block (paragraphs, tables, headers)
  • Stable IDs like /page/0/Table/4
  • Section hierarchy tracking

Example:

{
  "id": "/page/0/Table/4",
  "block_type": "Table",
  "html": "<table>...</table>",
  "bbox": [x1, y1, x2, y2],
  "polygon": [[...]],
  "children": [...]
}
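
For context, here is roughly how I would model these blocks on my side once I have them. This is just a sketch of my intended schema in Python, not anything Chandra emits today:

# Hypothetical target schema on my side -- not Chandra's output format,
# just how I'd represent each block once I have geometry + HTML per block.
from dataclasses import dataclass, field

@dataclass
class Block:
    id: str                      # stable path-like ID, e.g. "/page/0/Table/4"
    block_type: str              # "Page", "SectionHeader", "Text", "Table", ...
    html: str                    # per-block HTML serialization
    bbox: list[float]            # [x1, y1, x2, y2] in page pixel coordinates
    polygon: list[list[float]]   # list of [x, y] points outlining the block
    children: list["Block"] = field(default_factory=list)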

Does Chandra expose this kind of layout-aware structure natively, or is it intended to provide semantic OCR only (Markdown / HTML / raw text) without explicit geometry, requiring a separate layout-detection step?
Just want to confirm the intended scope and best practice here.
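
If geometry isn’t part of the native output, the fallback I’m picturing looks roughly like the sketch below: a separate layout-detection model for boxes, per-region OCR for content, assembled into the Block schema above. detect_layout and ocr_region are placeholders I made up, not real Chandra or library APIs:

# Sketch of the fallback pipeline I'd otherwise build; reuses the Block
# dataclass from the example above. Placeholder functions, not real APIs.
from PIL import Image

def detect_layout(image: Image.Image) -> list[tuple[str, tuple[int, int, int, int]]]:
    # Placeholder for a layout-detection model returning (label, bbox) pairs.
    raise NotImplementedError("plug in a layout detector here")

def ocr_region(crop: Image.Image) -> str:
    # Placeholder for per-region OCR returning HTML; this is where I'd call
    # Chandra (or another OCR model) on the cropped block image.
    raise NotImplementedError("plug in an OCR model here")

def build_page_dom(image_path: str, page_index: int) -> Block:
    # Assemble a single page's layout-aware DOM from detected regions.
    image = Image.open(image_path)
    page = Block(
        id=f"/page/{page_index}",
        block_type="Page",
        html="",
        bbox=[0, 0, image.width, image.height],
        polygon=[[0, 0], [image.width, 0],
                 [image.width, image.height], [0, image.height]],
    )
    for i, (label, (x1, y1, x2, y2)) in enumerate(detect_layout(image)):
        crop = image.crop((x1, y1, x2, y2))
        page.children.append(Block(
            id=f"/page/{page_index}/{label}/{i}",
            block_type=label,
            html=ocr_region(crop),
            bbox=[x1, y1, x2, y2],
            polygon=[[x1, y1], [x2, y1], [x2, y2], [x1, y2]],
        ))
    return page

If Chandra can already produce something like this end to end, I’d much rather use the built-in path than maintain a separate layout stage.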
Thanks!
