---
license: apache-2.0
library_name: onnx
pipeline_tag: image-to-image
tags:
- onnx
- document-processing
- document-unwarping
- image-processing
- ocr-preprocessing
- computer-vision
---
# UVDoc Grid Output - Document Unwarping ONNX Model
This is an ONNX export of the [UVDoc](https://github.com/tanguymagne/UVDoc) document unwarping model,
modified to output a **coordinate grid** instead of an image. This enables high-resolution document
unwarping via `cv2.remap()`.
## Model Description
UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed
documents. Unlike the PaddlePaddle ONNX variant that outputs a fixed 288x288 image, this version
outputs a coordinate mapping grid that can be applied to images of any resolution.
### Key Difference: Grid Output vs Image Output
| Approach | Model output | Final image quality |
|----------|--------------|---------------------|
| **Image-output models** | 288x288 RGB image | Poor (result must be upscaled) |
| **This grid-output model** | 45x31 coordinate grid | Preserved (grid is applied to the original image at native resolution) |
## Model Details
- **Architecture:** UVDoc (ResNet-based encoder-decoder)
- **Input:** `(1, 3, 720, 496)` - RGB image, normalized to [0, 1]
- **Output:** `(1, 2, 45, 31)` - Coordinate grid in [-1, 1] range
- **ONNX Opset:** 16
- **Size:** ~30 MB
### Input Specifications
| Property | Value |
|----------|-------|
| Shape | `(batch, 3, 720, 496)` |
| Format | RGB (not BGR) |
| Range | `[0, 1]` (normalized) |
| Layout | NCHW (batch, channels, height, width) |
### Output Specifications
| Property | Value |
|----------|-------|
| Shape | `(batch, 2, 45, 31)` |
| Channels | 2 (x, y coordinates) |
| Range | `[-1, 1]` (normalized coordinates) |
| Layout | NCHW (batch, channels, height, width) |
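
The normalized range maps linearly onto pixel positions of the image being unwarped. As a minimal sketch, assuming `-1` corresponds to the first pixel and `1` to the last pixel along an axis (the same convention the usage example in this card applies), the conversion looks like this. The helper name `grid_to_pixel` is illustrative and not part of the model or any library.

```python
def grid_to_pixel(value, size):
    """Map a normalized coordinate in [-1, 1] to a pixel position in [0, size - 1].

    Assumes -1 is the first pixel and 1 is the last pixel along the axis.
    """
    return (value + 1.0) / 2.0 * (size - 1)


# Left edge, center, and right edge of a 1000-pixel-wide image
left = grid_to_pixel(-1.0, 1000)
center = grid_to_pixel(0.0, 1000)
right = grid_to_pixel(1.0, 1000)
print(left, center, right)
```

The same expression works element-wise on NumPy arrays, which is how a whole upsampled grid channel can be converted in one step.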
## Usage
### With ONNX Runtime (Python)
```python
import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name  # "image" in this export

# Load image (OpenCV reads BGR) and remember its original size
image = cv2.imread("warped_document.jpg")
if image is None:
    raise FileNotFoundError("Could not read warped_document.jpg")
h_orig, w_orig = image.shape[:2]

# Prepare model input: RGB, 720x496 (height x width), float32 in [0, 1], NCHW
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # cv2.resize takes (width, height)
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

# Run inference
result = session.run(None, {input_name: blob})[0]  # (1, 2, 45, 31)

# Upsample the grid to the original size and convert [-1, 1] to pixel coordinates
grid = np.ascontiguousarray(np.transpose(result[0], (1, 2, 0)))  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)
map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to original high-res image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE,
)
cv2.imwrite("unwarped_document.jpg", unwarped)
```
### With HuggingFace Hub
```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx",
)
```
## Training Details
This model was not retrained. It is a direct ONNX export of the original UVDoc weights from
[tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc), with a wrapper to output only the
2D coordinate grid (discarding the 3D shape output).
### Original Model
- **Paper:** [UVDoc: Neural Grid-based Document Unwarping](https://arxiv.org/abs/2302.02887)
- **Authors:** Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
- **Published:** SIGGRAPH Asia 2023
- **Original Repository:** [tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc)
## Limitations
- Input must be resized to 720x496 (height x width) for inference; the grid output is always 45x31
- Works best on documents with visible text/content (needs features for grid estimation)
- May not handle extreme perspective distortions well
- CPU inference takes roughly 100-200 ms per image, depending on hardware
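
Since latency depends on the machine, it can be worth measuring it locally. The following is a minimal sketch of a timing helper, not part of this model or of ONNX Runtime; `time_inference` and the stand-in workload are hypothetical names used for illustration. It takes any zero-argument callable, so the real inference call can be passed in once a session and input blob exist.

```python
import statistics
import time


def time_inference(run, warmup=2, repeats=10):
    """Return the median wall-clock time of run() in milliseconds."""
    for _ in range(warmup):
        run()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload so the sketch runs without the model file.
# With the real model: run = lambda: session.run(None, {"image": blob})
calls = []
median_ms = time_inference(lambda: calls.append(1), warmup=2, repeats=5)
print(f"median latency: {median_ms:.3f} ms")
```

The median is used rather than the mean so that a single slow run (for example, the first call after loading) does not skew the result.
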
## Citation
If you use this model, please cite the original UVDoc paper:
```bibtex
@inproceedings{UVDoc,
  title = {{UVDoc}: Neural Grid-based Document Unwarping},
  author = {Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
  booktitle = {SIGGRAPH ASIA, Technical Papers},
  year = {2023},
  url = {https://doi.org/10.1145/3610548.3618174}
}
```
## License
This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also
Apache 2.0 licensed. See the original repository for full license details.