Update for transformers 5.x compatibility
#4
by nvidia-oliver-holworthy - opened
This PR updates the custom Nemotron Parse decoder to work with both the legacy Transformers 4.x cache API and the newer 4.57/5.x cache interfaces, including MBartDecoderLayer signature differences and generation input handling. It also adds a reproducible uv + Docker test environment and a golden regression suite with captured reference outputs, so future dependency upgrades can be checked against the current 4.51.3 baseline.
What changed
- Added compatibility handling in `hf_nemotron_parse_modeling.py` for legacy tuple-based KV caches and the newer `Cache` / `EncoderDecoderCache` flows.
- Added `prepare_inputs_for_generation` support so cached decoding works correctly during generation.
- Added `pyproject.toml`, `Dockerfile`, and `docker-compose.yaml` for a reproducible GPU-backed test environment.
- Added `test_golden.py` and `golden_outputs.json` to validate preprocessing, encoder outputs, decoder logits, generation, and processor behavior against known-good outputs.
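The dual-cache handling described above might look roughly like the following minimal sketch. The helper name and exact branching are assumptions for illustration, not the PR's actual code in `hf_nemotron_parse_modeling.py`:

```python
# Hedged sketch (not the PR's actual code): normalizing past_key_values so a
# decoder can accept both the legacy tuple API (transformers 4.x) and the
# newer Cache objects (transformers 4.57+/5.x). The helper name is invented.
def normalize_past_key_values(past_key_values):
    """Return (legacy_tuple, cache_object); either element may be None."""
    if past_key_values is None:
        return None, None
    try:
        # Newer transformers releases expose a Cache base class whose
        # subclasses provide to_legacy_cache(); older ones use plain tuples.
        from transformers.cache_utils import Cache
    except ImportError:
        Cache = None
    if Cache is not None and isinstance(past_key_values, Cache):
        return past_key_values.to_legacy_cache(), past_key_values
    # Legacy format: one (key, value) tuple of tensors per decoder layer.
    return past_key_values, None
```

Downstream layer code can then pick whichever view matches the installed transformers version, instead of branching on version strings at every call site.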
Testing
- `python test_golden.py --capture`
- `pytest test_golden.py -v`
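The capture-then-compare flow above follows a common golden-regression pattern, sketched here with an illustrative helper. The function name, `path` parameter, and tolerance are assumptions, not taken from `test_golden.py`:

```python
# Hedged sketch of a golden-output check: capture reference values once,
# then compare later runs against them within a tolerance. The function
# name, path handling, and tolerance are illustrative assumptions.
import json
from pathlib import Path

def compare_to_golden(name, values, path=Path("golden_outputs.json"),
                      capture=False, atol=1e-4):
    """Capture `values` under `name`, or compare them to the stored golden."""
    golden = json.loads(path.read_text()) if path.exists() else {}
    if capture:
        # Capture mode: record the current outputs as the new reference.
        golden[name] = list(values)
        path.write_text(json.dumps(golden, indent=2))
        return True
    # Compare mode: same length, and every element within tolerance.
    ref = golden[name]
    return len(ref) == len(values) and all(
        abs(a - b) <= atol for a, b in zip(ref, values)
    )
```

Running the capture step on the 4.51.3 baseline and the compare step after a dependency upgrade is what lets the suite flag regressions in encoder outputs, decoder logits, or generation.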
Double-checked this with your tests as well as an OmniDocBench run and got consistent results between the old and new branches on transformers 4.x and 5.x. Looks good
emelryan changed pull request status to open
emelryan changed pull request status to merged