Paper: Sample and Computation Redistribution for Efficient Face Detection (arXiv:2105.04714)
Re-exported InsightFace models with proper dynamic batch support and no cross-frame contamination.
| Repository | Max Batch | Recommendation |
|---|---|---|
| This repo | 1-32 | ✅ Recommended - Optimal performance |
| alonsorobots/scrfd_320_batched_64 | 1-64 | For experimentation |
Batch=32 is the recommended maximum. In testing on an RTX 5090, batch=64 provided no additional throughput benefit.
The original InsightFace ONNX models have issues with batch inference:
- buffalo_l detection model: hardcoded batch=1
- buffalo_l_batch detection model: broken, with cross-frame contamination due to reshape operations that flatten the batch dimension

These re-exports fix the dynamic_axes in the ONNX graph for true batch inference.
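To make the reshape problem concrete, here is a minimal NumPy sketch. It is an illustration only, not the actual InsightFace graph: the two head functions, the tensor sizes, and the random data are all hypothetical. It shows how a reshape with a baked-in batch size of 1 merges every frame into one block, while a reshape that keeps the batch axis leaves each frame's rows separate.

```python
import numpy as np

def head_hardcoded_batch(x: np.ndarray) -> np.ndarray:
    """Hypothetical head exported with the batch size fixed to 1."""
    n, c, h, w = x.shape
    # NCHW -> NHWC, then a reshape with a fixed leading 1:
    # rows from all frames end up in the same block.
    return x.transpose(0, 2, 3, 1).reshape(1, -1, c)

def head_dynamic_batch(x: np.ndarray) -> np.ndarray:
    """Hypothetical head that keeps the batch axis intact."""
    n, c, h, w = x.shape
    return x.transpose(0, 2, 3, 1).reshape(n, -1, c)

rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 2, 5, 5)).astype(np.float32)  # toy sizes

merged = head_hardcoded_batch(frames)
separate = head_dynamic_batch(frames)
print(merged.shape)    # (1, 100, 2): four frames collapsed into one
print(separate.shape)  # (4, 25, 2): one block of rows per frame

# With the batch axis preserved, frame 2 gives the same rows alone or in a batch.
alone = head_dynamic_batch(frames[2:3])[0]
print(np.array_equal(separate[2], alone))  # True
```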
| Model | Task | Input Shape | Output | Batch | Speedup |
|---|---|---|---|---|---|
| scrfd_10g_320_batch.onnx | Face Detection | [N, 3, 320, 320] | boxes, landmarks | 1-32 | 6× |
| arcface_w600k_r50_batch.onnx | Face Embedding | [N, 3, 112, 112] | 512-dim vectors | 1-32 | 10× |
Face detection (scrfd_10g_320_batch.onnx):

| Batch Size | FPS | ms/frame |
|---|---|---|
| 1 | 867 | 1.15 |
| 16 | 5,498 | 0.18 |

Face embedding (arcface_w600k_r50_batch.onnx):

| Batch Size | FPS | ms/embedding |
|---|---|---|
| 1 | 292 | 3.4 |
| 16 | 3,029 | 0.33 |
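The latency and speedup figures follow from the FPS columns by simple arithmetic. The short sketch below reproduces them from the numbers in the tables above; the helper names and dictionary labels are illustrative, not part of this repository.

```python
def ms_per_item(fps: float) -> float:
    """Average time per frame or embedding, in milliseconds."""
    return 1000.0 / fps

def speedup(fps_batched: float, fps_single: float) -> float:
    """Throughput gain of batched inference over batch=1."""
    return fps_batched / fps_single

# (FPS at batch=1, FPS at batch=16), copied from the two tables
benchmarks = {
    "ms/frame table": (867, 5498),
    "ms/embedding table": (292, 3029),
}

for name, (fps_1, fps_16) in benchmarks.items():
    print(
        name,
        round(ms_per_item(fps_1), 2),
        round(ms_per_item(fps_16), 2),
        round(speedup(fps_16, fps_1), 1),
    )
# ms/frame table 1.15 0.18 6.3
# ms/embedding table 3.42 0.33 10.4
```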
import numpy as np
import onnxruntime as ort
# Load the model; providers are listed in priority order
sess = ort.InferenceSession(
    "scrfd_10g_320_batch.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
# Batch inference (random data stands in for 16 preprocessed frames)
batch = np.random.randn(16, 3, 320, 320).astype(np.float32)
outputs = sess.run(None, {"input.1": batch})
# outputs[0:3]: scores per FPN level (stride 8, 16, 32)
# outputs[3:6]: bboxes per FPN level
# outputs[6:9]: keypoints per FPN level
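# Consistency check: a frame processed alone should match the same frame inside a batch
frame = batch[0].copy()  # any preprocessed frame of shape (3, 320, 320)
single_output = sess.run(None, {"input.1": frame[np.newaxis, ...]})
batch[7] = frame  # place the same frame at index 7 of the batch
batch_output = sess.run(None, {"input.1": batch})
max_diff = np.max(np.abs(single_output[0][0] - batch_output[0][7]))
# max_diff < 1e-5 ✓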
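Because these models accept batches of 1-32, a longer frame sequence has to be split into chunks before inference. The following is a minimal sketch of one way to do that. `run_in_batches`, `MAX_BATCH`, and the stand-in `fake_model` are hypothetical names introduced for illustration, and the stand-in only mimics a model that returns one row per frame.

```python
import numpy as np

MAX_BATCH = 32  # upper batch limit stated for these models

def run_in_batches(frames: np.ndarray, run_fn, max_batch: int = MAX_BATCH) -> np.ndarray:
    """Apply run_fn to consecutive chunks of at most max_batch frames."""
    results = []
    for start in range(0, len(frames), max_batch):
        chunk = frames[start:start + max_batch]
        results.append(run_fn(chunk))
    # Stitch the per-chunk results back together along the batch axis
    return np.concatenate(results, axis=0)

chunk_sizes = []

def fake_model(chunk: np.ndarray) -> np.ndarray:
    """Stand-in for a real session call; returns one value per frame."""
    chunk_sizes.append(chunk.shape[0])
    return chunk.mean(axis=(1, 2, 3))[:, None]

# Toy spatial size to keep the demo light; real frames would be 320x320
frames = np.random.randn(70, 3, 8, 8).astype(np.float32)
out = run_in_batches(frames, fake_model)
print(chunk_sizes)  # [32, 32, 6]
print(out.shape)    # (70, 1)
```

With a real session, `run_fn` could be something like `lambda chunk: sess.run(None, {"input.1": chunk})[0]`, which returns only the first output; handling all outputs would need one concatenation per output.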
These models were re-exported from InsightFace's PyTorch source using MMDetection with proper dynamic_axes:
dynamic_axes = {
    "input.1": {0: "batch"},
    "score_8": {0: "batch"},
    "score_16": {0: "batch"},
    # ... all outputs
}
See SCRFD_320_EXPORT_INSTRUCTIONS.md for details.
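As a small illustration of the dynamic_axes mapping above, the dictionary can be built programmatically so that every tensor gets the same dynamic batch axis. `make_dynamic_axes` is a hypothetical helper, not part of the export instructions, and only the tensor names that appear above are used; the remaining output names would need to be added to the list.

```python
from typing import Dict, Iterable

def make_dynamic_axes(
    input_name: str,
    output_names: Iterable[str],
    axis_label: str = "batch",
) -> Dict[str, Dict[int, str]]:
    """Mark axis 0 of the input and of every listed output as dynamic."""
    names = [input_name, *output_names]
    return {name: {0: axis_label} for name in names}

# Only the names shown above; the full output list is not reproduced here
dynamic_axes = make_dynamic_axes("input.1", ["score_8", "score_16"])
print(dynamic_axes)
# {'input.1': {0: 'batch'}, 'score_8': {0: 'batch'}, 'score_16': {0: 'batch'}}
```

A dictionary of this shape is what the `dynamic_axes` argument of `torch.onnx.export` expects, assuming the export goes through that function.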
Non-commercial research purposes only - per InsightFace license.
For commercial licensing, contact: [email protected]