---
license: apache-2.0
library_name: onnx
pipeline_tag: image-to-image
tags:
  - onnx
  - document-processing
  - document-unwarping
  - image-processing
  - ocr-preprocessing
  - computer-vision
---

# UVDoc Grid Output - Document Unwarping ONNX Model

This is an ONNX export of the [UVDoc](https://github.com/tanguymagne/UVDoc) document unwarping model,
modified to output a **coordinate grid** instead of an image. This enables high-resolution document
unwarping via `cv2.remap()`.

## Model Description

UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed
documents. Unlike the PaddlePaddle ONNX variant that outputs a fixed 288x288 image, this version
outputs a coordinate mapping grid that can be applied to images of any resolution.

### Key Difference: Grid Output vs Image Output

| Approach | Output | Result resolution |
|----------|--------|-------------------|
| **Image-output models** | 288x288 RGB image | Fixed at 288x288 (must be upscaled, poor quality) |
| **This grid-output model** | 45x31 coordinate grid | Native resolution of the source image |

## Model Details

- **Architecture:** UVDoc (ResNet-based encoder-decoder)
- **Input:** `(1, 3, 720, 496)` - RGB image (height 720, width 496), normalized to [0, 1]
- **Output:** `(1, 2, 45, 31)` - Coordinate grid (45 rows, 31 columns) in the [-1, 1] range
- **ONNX Opset:** 16
- **Size:** ~30 MB

### Input Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 3, 720, 496)` |
| Format | RGB (not BGR) |
| Range | `[0, 1]` (normalized) |
| Layout | NCHW (batch, channels, height, width) |
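
The input requirements in the table can be sketched as a small helper. This is a minimal illustration rather than part of the model's tooling: the function name `to_model_input` is hypothetical, and it assumes the image has already been converted to RGB and resized to 720x496 (height x width).

```python
import numpy as np

MODEL_H, MODEL_W = 720, 496  # input height and width from the table above


def to_model_input(rgb_uint8: np.ndarray) -> np.ndarray:
    """Convert an already-resized HxWx3 uint8 RGB image to a (1, 3, H, W) float32 blob."""
    if rgb_uint8.shape != (MODEL_H, MODEL_W, 3):
        raise ValueError(
            f"expected shape {(MODEL_H, MODEL_W, 3)}, got {rgb_uint8.shape}"
        )
    blob = rgb_uint8.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    blob = np.transpose(blob, (2, 0, 1))         # HWC -> CHW
    return np.ascontiguousarray(blob[None])      # add batch axis -> NCHW
```

Raising on an unexpected shape makes a swapped width and height visible immediately, which is an easy mistake because `cv2.resize` takes its size as `(width, height)`.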

### Output Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 2, 45, 31)` |
| Channels | 2 (x, y coordinates) |
| Range | `[-1, 1]` (normalized coordinates) |
| Layout | NCHW (batch, channels, height, width) |
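
To make the normalized range concrete, the sketch below converts grid values in `[-1, 1]` to pixel coordinates. It assumes channel 0 holds x and channel 1 holds y, as listed in the table, and that -1 and 1 map to the first and last pixel along each axis; the helper name `grid_to_pixel_maps` is hypothetical.

```python
import numpy as np


def grid_to_pixel_maps(grid: np.ndarray, width: int, height: int):
    """Map a (..., 2) grid of (x, y) values in [-1, 1] to float32 pixel coordinates."""
    map_x = (grid[..., 0] + 1.0) / 2.0 * (width - 1)   # -1 -> 0, 1 -> width - 1
    map_y = (grid[..., 1] + 1.0) / 2.0 * (height - 1)  # -1 -> 0, 1 -> height - 1
    return map_x.astype(np.float32), map_y.astype(np.float32)


# A 2x2 grid holding the four corners of the normalized coordinate space.
corners = np.array(
    [[[-1.0, -1.0], [1.0, -1.0]],
     [[-1.0, 1.0], [1.0, 1.0]]],
    dtype=np.float32,
)
map_x, map_y = grid_to_pixel_maps(corners, width=1000, height=1500)
# map_x is [[0, 999], [0, 999]] and map_y is [[0, 0], [1499, 1499]]
```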

## Usage

### With ONNX Runtime (Python)

```python
import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])

# Load and preprocess image
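image = cv2.imread("warped_document.jpg")  # BGR, uint8
if image is None:  # cv2.imread returns None instead of raising on failure
    raise FileNotFoundError("Could not read warped_document.jpg")
h_orig, w_orig = image.shape[:2]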

# Prepare model input (720x496 RGB normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # width, height
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

# Run inference
result = session.run(None, {'image': blob})[0]  # (1, 2, 45, 31)

# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0))  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)

map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to original high-res image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE
)

cv2.imwrite("unwarped_document.jpg", unwarped)
```
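
The example above passes `'image'` as the input name. If a particular export uses different tensor names, they can be read from the session rather than hard-coded. The sketch below relies only on ONNX Runtime's `get_inputs()` and `get_outputs()`; the helper name `describe_io` is hypothetical.

```python
def describe_io(session):
    """Return the input and output tensor names and shapes of an ONNX Runtime session."""
    return {
        "inputs": {node.name: node.shape for node in session.get_inputs()},
        "outputs": {node.name: node.shape for node in session.get_outputs()},
    }


# Usage with the session created in the example above:
# info = describe_io(session)
# input_name = next(iter(info["inputs"]))
# result = session.run(None, {input_name: blob})[0]
```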

### With HuggingFace Hub

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx"
)
```

## Training Details

This model was not retrained. It is a direct ONNX export of the original UVDoc weights from
[tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc), with a wrapper to output only the
2D coordinate grid (discarding the 3D shape output).

### Original Model

- **Paper:** [UVDoc: Neural Grid-based Document Unwarping](https://arxiv.org/abs/2302.02887)
- **Authors:** Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
- **Published:** SIGGRAPH Asia 2023
- **Original Repository:** [tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc)

## Limitations

- Input must be resized to 720x496 (height x width) for inference; the output grid is always 45x31 (rows x columns)
- Works best on documents with visible text/content (needs features for grid estimation)
- May not handle extreme perspective distortions well
- CPU inference takes roughly 100-200 ms per image; the exact figure depends on the hardware
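
Since the latency figure depends on hardware, it may be worth measuring on the target machine. A minimal timing sketch follows; the helper name `median_latency_ms` and its default repeat counts are illustrative assumptions, not part of the model's tooling.

```python
import statistics
import time


def median_latency_ms(fn, warmup=2, repeats=10):
    """Call fn() repeatedly and return the median wall-clock time in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Usage with the session and blob from the ONNX Runtime example:
# latency = median_latency_ms(lambda: session.run(None, {"image": blob}))
```

The median is used instead of the mean so that a single slow run, for example one interrupted by another process, does not skew the result.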

## Citation

If you use this model, please cite the original UVDoc paper:

```bibtex
@inproceedings{UVDoc,
    title={{UVDoc}: Neural Grid-based Document Unwarping},
    author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
    booktitle = {SIGGRAPH ASIA, Technical Papers},
    year = {2023},
    url={https://doi.org/10.1145/3610548.3618174}
}
```

## License

This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also
Apache 2.0 licensed. See the original repository for full license details.