fix: NHWC data_format for UNet/VAE decoder under librknnrt 2.3.2

#6
by jaysom - opened

Summary

Running this model on an Orange Pi 5 Plus with librknnrt 2.3.2 (2025-04-09) segfaults at the first UNet inference step. The runtime prints:

W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.

...and then dies. The 2.3.2 runtime tightened tensor-layout validation and can no longer reliably auto-convert NCHW inputs to NHWC for this specific UNet.

Fix

Pass data_format per-model and transpose 4-D inputs in Python at the RKNN boundary before calling rknnlite.inference():

  • text_encoder stays NCHW
  • unet and vae_decoder run NHWC with Python-side transpose

The public API of RKNN2Model gains a single optional data_format kwarg that defaults to "nchw", so any existing caller that doesn't pass it behaves exactly as before. The pipeline constructor in main() is updated to pass "nhwc" for the two models that need it.

No new files, no new dependencies, no rewrite. ~25 lines changed.
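The shape of the change can be sketched as follows. This is illustrative, not the literal diff: the class internals, `to_nhwc` helper name, and the commented-out `RKNNLite` calls are assumptions; only `numpy.transpose` with axes `(0, 2, 3, 1)` and the default-`"nchw"` kwarg come from the description above.

```python
# Minimal sketch of the patch's approach: transpose 4-D NCHW arrays to NHWC
# at the RKNN boundary. Class internals are hypothetical; the real runner's
# RKNN2Model wraps rknn-toolkit-lite2's RKNNLite.
import numpy as np

def to_nhwc(x: np.ndarray) -> np.ndarray:
    """Transpose a 4-D NCHW tensor to NHWC; pass anything else through.

    ascontiguousarray matters because transpose returns a view, and the
    runtime expects a contiguous input buffer.
    """
    if x.ndim == 4:
        return np.ascontiguousarray(np.transpose(x, (0, 2, 3, 1)))  # N,C,H,W -> N,H,W,C
    return x

class RKNN2Model:
    def __init__(self, model_path: str, data_format: str = "nchw"):
        # Defaulting to "nchw" keeps every existing caller unchanged.
        self.data_format = data_format.lower()
        # Real runner would do:
        # self.rknn = RKNNLite(); self.rknn.load_rknn(model_path); self.rknn.init_runtime()

    def __call__(self, inputs):
        if self.data_format == "nhwc":
            inputs = [to_nhwc(i) for i in inputs]
        # Real runner would do: return self.rknn.inference(inputs=inputs)
        return inputs
```

In `main()`, only the two affected models opt in, e.g. `RKNN2Model("unet.rknn", data_format="nhwc")`, while the text encoder keeps the default.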

Environment where the bug reproduces

  • Orange Pi 5 Plus (RK3588), kernel 6.1.x
  • librknnrt 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
  • rknn-toolkit-lite2 2.3.2
  • rknpu driver v0.9.8

Verified working

After applying this patch, 512×512 LCM inference runs in ~34 s on a single RK3588 core, matching the README benchmark.

Credit

@darkbit1001 already solved this in a fork with a larger rewrite; this PR is the minimal delta against your original runner so the fix can land upstream for everyone.

This commit looks AI-generated. Have you personally tested this?

Yeah, fair check. Wrote the description with help, but the patch is mine and I run it daily on an Orange Pi 5 Plus with librknnrt 2.3.2. Hardware: RK3588, kernel 6.1.x, rknn-toolkit-lite2 2.3.2. Without this change my runner segfaults at the first UNet step on the NHWC warning; with it, 512x512 LCM finishes in ~34s on a single core, matching your README benchmark.

Okay, then thanks for your contribution! Although I may later migrate my legacy projects to my library https://github.com/happyme531/ztu_somemodelruntime_ez_rknn_async to fully get rid of such weird bugs.

happyme531 changed pull request status to merged

Thanks for merging. I had a quick look at ez_rknn_async; the async inference side is the main thing I've been wanting on RK3588. Sharing the NPU between SD and chat models has been my biggest friction point. I'll have a play with it on an Orange Pi this week.
