---
license: apache-2.0
library_name: onnx
pipeline_tag: image-to-image
tags:
  - onnx
  - document-processing
  - document-unwarping
  - image-processing
  - ocr-preprocessing
  - computer-vision
---

# UVDoc Grid Output - Document Unwarping ONNX Model

This is an ONNX export of the [UVDoc](https://github.com/tanguymagne/UVDoc) document unwarping model,
modified to output a **coordinate grid** instead of an image. This enables high-resolution document
unwarping via `cv2.remap()`.

## Model Description

UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed
documents. Unlike the PaddlePaddle ONNX variant that outputs a fixed 288x288 image, this version
outputs a coordinate mapping grid that can be applied to images of any resolution.

### Key Difference: Grid Output vs Image Output

| Approach | Output | Unwarped result |
|----------|--------|-----------------|
| **Image-output models** | 288x288 RGB image | Fixed low resolution; must be upscaled, losing detail |
| **This grid-output model** | 45x31 coordinate grid | Sampled from the input image at its native resolution |

## Model Details

- **Architecture:** UVDoc (ResNet-based encoder-decoder)
- **Input:** `(1, 3, 720, 496)` - RGB image, normalized [0, 1]
- **Output:** `(1, 2, 45, 31)` - Coordinate grid in [-1, 1] range
- **ONNX Opset:** 16
- **Size:** ~30 MB

### Input Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 3, 720, 496)` |
| Format | RGB (not BGR) |
| Range | `[0, 1]` (normalized) |
| Layout | NCHW (batch, channels, height, width) |
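
A cheap way to catch preprocessing mistakes is to check the tensor against the table above before calling the model. The sketch below assumes a float32 graph input, as in the usage example; `check_model_input` is a hypothetical helper name. It catches a missing HWC-to-NCHW transpose, a wrong dtype, or forgetting to divide by 255, but it cannot tell RGB from BGR, so that conversion still has to be done explicitly (e.g. with `cv2.COLOR_BGR2RGB`).

```python
import numpy as np

def check_model_input(blob: np.ndarray) -> None:
    """Raise ValueError if `blob` does not match the input specification:
    NCHW layout, 3 channels, 720x496 spatial size, float32, values in [0, 1]."""
    if blob.ndim != 4 or blob.shape[1:] != (3, 720, 496):
        raise ValueError(f"expected (batch, 3, 720, 496), got {blob.shape}")
    if blob.dtype != np.float32:  # assumption: the exported graph takes float32
        raise ValueError(f"expected float32, got {blob.dtype}")
    if blob.min() < 0.0 or blob.max() > 1.0:
        raise ValueError("values must lie in [0, 1]; divide uint8 pixels by 255")
```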

### Output Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 2, 45, 31)` |
| Channels | 2 (channel 0 = x, channel 1 = y) |
| Range | `[-1, 1]` (normalized; -1 and 1 correspond to the first and last pixel along each axis) |
| Layout | NCHW (batch, channels, height, width) |
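
Because the grid is expressed in normalized coordinates, the same prediction can be mapped onto an image of any size. A minimal sketch of the conversion, assuming the `align_corners=True`-style convention used by the `map_x`/`map_y` lines in the usage example below (-1 and 1 are the centres of the first and last pixels); `grid_to_pixels` is a hypothetical helper name:

```python
import numpy as np

def grid_to_pixels(grid: np.ndarray, width: int, height: int):
    """Map a (2, H, W) grid of normalized (x, y) coordinates in [-1, 1]
    to float32 pixel coordinates for an image of size width x height."""
    x = (grid[0] + 1.0) / 2.0 * (width - 1)   # -1 -> 0, +1 -> width - 1
    y = (grid[1] + 1.0) / 2.0 * (height - 1)  # -1 -> 0, +1 -> height - 1
    return x.astype(np.float32), y.astype(np.float32)
```

Applied to the raw `(2, 45, 31)` output, this yields only a coarse 45x31 lookup table; the grid still has to be upsampled to the full image size before it is passed to `cv2.remap`. Since the conversion is affine and bilinear weights sum to one, upsampling before or after it gives the same result.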

## Usage

### With ONNX Runtime (Python)

```python
import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])

# Load and preprocess image
image = cv2.imread("warped_document.jpg")
h_orig, w_orig = image.shape[:2]

# Prepare model input (720x496 RGB normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # width, height
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

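# Run inference (read the input name from the model instead of hard-coding it)
input_name = session.get_inputs()[0].name
result = session.run(None, {input_name: blob})[0]  # (1, 2, 45, 31)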

# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0))  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)

map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to original high-res image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE
)

cv2.imwrite("unwarped_document.jpg", unwarped)
```
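
One detail worth checking: `cv2.resize` with `INTER_LINEAR` samples at pixel centres (the `align_corners=False` convention in PyTorch terms), so it places the 45x31 grid samples slightly differently across the image than an `align_corners=True` resize would, with the difference largest near the borders. As far as we can tell, the original UVDoc repository upsamples the grid with `F.interpolate(..., mode="bilinear", align_corners=True)` before `grid_sample`. If closer parity matters, the NumPy sketch below (hypothetical helper `upsample_align_corners`) implements `align_corners=True` bilinear upsampling; it exploits the fact that bilinear interpolation is separable, interpolating rows first and then columns.

```python
import numpy as np

def upsample_align_corners(grid: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly upsample a (gh, gw, C) array to (out_h, out_w, C) so that the
    corner samples of the input land exactly on the corner pixels of the output
    (the align_corners=True convention). Requires gh >= 2 and gw >= 2."""
    grid = np.asarray(grid, dtype=np.float32)
    gh, gw = grid.shape[:2]
    # Fractional source coordinates for every output row / column.
    ys = np.linspace(0.0, gh - 1, out_h)
    xs = np.linspace(0.0, gw - 1, out_w)
    y0 = np.clip(np.floor(ys).astype(int), 0, gh - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, gw - 2)
    wy = (ys - y0).astype(np.float32)[:, None, None]  # (out_h, 1, 1)
    wx = (xs - x0).astype(np.float32)[None, :, None]  # (1, out_w, 1)
    # Interpolate along rows first (cheap: out_h x gw x C) ...
    rows = grid[y0] * (1 - wy) + grid[y0 + 1] * wy
    # ... then along columns to reach the full output size.
    return rows[:, x0] * (1 - wx) + rows[:, x0 + 1] * wx
```

In the usage example, `grid_up = upsample_align_corners(grid, h_orig, w_orig)` would take the place of the `cv2.resize` line; the rest of the pipeline is unchanged.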

### With HuggingFace Hub

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",  # placeholder: replace with the actual repo id
    filename="UVDoc_grid.onnx"
)
```
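
Input and output names of an exported ONNX graph are easy to get wrong. ONNX Runtime exposes them through `InferenceSession.get_inputs()` and `get_outputs()`, each returning `NodeArg` objects with `name`, `shape`, and `type`. The sketch below (hypothetical helper `describe_io`) collects them so that the downloaded file can be compared against the specification tables above; a dynamic dimension such as the batch axis is reported as a string or `None` rather than an integer.

```python
from typing import List, Tuple

def describe_io(session) -> List[Tuple[str, str, list, str]]:
    """Return (kind, name, shape, type) for every graph input and output of an
    onnxruntime.InferenceSession (or any object with the same two methods)."""
    rows = []
    for kind, args in (("input", session.get_inputs()), ("output", session.get_outputs())):
        for arg in args:  # each arg is an onnxruntime NodeArg
            rows.append((kind, arg.name, list(arg.shape), arg.type))
    return rows
```

Typical use is `describe_io(ort.InferenceSession(model_path, providers=["CPUExecutionProvider"]))`, printing each returned tuple.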

## Training Details

This model was not retrained. It is a direct ONNX export of the original UVDoc weights from
[tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc), with a wrapper to output only the
2D coordinate grid (discarding the 3D shape output).

### Original Model

- **Paper:** [UVDoc: Neural Grid-based Document Unwarping](https://arxiv.org/abs/2302.02887)
- **Authors:** Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
- **Published:** SIGGRAPH Asia 2023
- **Original Repository:** [tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc)

## Limitations

- Input must be resized to 720x496 (height x width) for inference; the grid output is always 45x31
- Works best on documents with visible text/content (needs features for grid estimation)
- May not handle extreme perspective distortions well
- CPU inference takes roughly 100-200 ms per image, depending on hardware
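
Since latency depends on the CPU and thread settings, it is worth measuring locally. A minimal sketch: run a few untimed warm-up calls, then time repeated calls and report the median. `median_latency_ms` is a hypothetical helper; `run_once` would wrap the `session.run(...)` call from the usage example.

```python
import statistics
import time
from typing import Callable

def median_latency_ms(run_once: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Call run_once() `warmup` times untimed, then `repeats` times timed,
    and return the median wall-clock latency in milliseconds."""
    for _ in range(warmup):
        run_once()  # early calls may include one-off allocation / initialization cost
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

The median is less sensitive than the mean to occasional scheduling hiccups. This measures model inference only; the `cv2.resize` and `cv2.remap` steps on a full-resolution image add their own cost.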

## Citation

If you use this model, please cite the original UVDoc paper:

```bibtex
@inproceedings{UVDoc,
    title={{UVDoc}: Neural Grid-based Document Unwarping},
    author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
    booktitle = {SIGGRAPH ASIA, Technical Papers},
    year = {2023},
    url={https://doi.org/10.1145/3610548.3618174}
}
```

## License

This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also
Apache 2.0 licensed. See the original repository for full license details.