Add compatibility with transformers 5.x

#4

This PR updates the custom Nemotron Parse decoder to work with both the legacy Transformers 4.x cache API and the newer 4.57/5.x cache interfaces, including MBartDecoderLayer signature differences and generation input handling. It also adds a reproducible uv + Docker test environment and a golden regression suite with captured reference outputs, so future dependency upgrades can be checked against the current 4.51.3 baseline.

What changed

  • Added compatibility handling in hf_nemotron_parse_modeling.py for legacy tuple-based KV caches and newer Cache / EncoderDecoderCache flows.
  • Added prepare_inputs_for_generation support so cached decoding works correctly during generation.
  • Added pyproject.toml, Dockerfile, and docker-compose.yaml for a reproducible GPU-backed test environment.
  • Added test_golden.py and golden_outputs.json to validate preprocessing, encoder outputs, decoder logits, generation, and processor behavior against known-good outputs.
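
As a rough illustration of the cache handling described in the first two bullets, here is a minimal sketch, not the code in hf_nemotron_parse_modeling.py. The helper names cached_length and uncached_slice are hypothetical. It assumes the legacy format is a tuple of per-layer tuples whose first entry is a self-attention key tensor shaped (batch, heads, seq_len, head_dim), and that newer cache objects expose get_seq_length(), as the Cache classes in Transformers do.

```python
from typing import Any, Optional


def cached_length(past_key_values: Optional[Any]) -> int:
    """Return how many decoder positions are already cached (hypothetical helper)."""
    if past_key_values is None:
        return 0
    # Newer Cache / EncoderDecoderCache-style objects: ask the object itself.
    get_seq_length = getattr(past_key_values, "get_seq_length", None)
    if callable(get_seq_length):
        return int(get_seq_length())
    # Legacy tuple format (assumed layout): past[layer][0] is the self-attention
    # key tensor with shape (batch, heads, seq_len, head_dim).
    if len(past_key_values) == 0:
        return 0
    return int(past_key_values[0][0].shape[2])


def uncached_slice(total_len: int, past_key_values: Optional[Any]) -> slice:
    """Slice over the decoder tokens that still have to be fed to the model."""
    past_len = cached_length(past_key_values)
    if past_len < total_len:
        start = past_len
    else:
        # The cache already covers every position: feed only the last token.
        start = max(total_len - 1, 0)
    return slice(start, total_len)
```

Inside a prepare_inputs_for_generation override, a slice like this could be applied along the sequence dimension, for example decoder_input_ids[:, uncached_slice(decoder_input_ids.shape[1], past_key_values)], so that only tokens missing from the cache are passed to the decoder.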

Testing

  • python test_golden.py --capture
  • pytest test_golden.py -v
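
The capture-then-compare workflow behind these two commands can be sketched roughly as follows. This is an assumption about the general pattern, not the contents of test_golden.py: the function names, the flat name-to-list-of-floats JSON layout, and the tolerances are all hypothetical.

```python
import json
import math
from pathlib import Path
from typing import Dict, List


def capture(outputs: Dict[str, List[float]], path: Path) -> None:
    """Write the current outputs to disk as the new golden reference."""
    path.write_text(json.dumps(outputs, indent=2, sort_keys=True))


def compare(
    outputs: Dict[str, List[float]],
    path: Path,
    rel_tol: float = 1e-4,
    abs_tol: float = 1e-6,
) -> List[str]:
    """Return a list of mismatch descriptions; an empty list means a match."""
    golden = json.loads(path.read_text())
    problems: List[str] = []
    for name, expected in golden.items():
        actual = outputs.get(name)
        if actual is None:
            problems.append(f"{name}: missing from current outputs")
            continue
        if len(actual) != len(expected):
            problems.append(f"{name}: length {len(actual)} != {len(expected)}")
            continue
        for i, (a, e) in enumerate(zip(actual, expected)):
            # Floating-point outputs are compared with a tolerance, since
            # different library versions rarely agree bit for bit.
            if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
                problems.append(f"{name}[{i}]: {a} != {e}")
                break
    return problems
```

Under this pattern, the reference file is captured once on the baseline environment and then compared against on each upgraded environment, with a test failing whenever compare returns a non-empty list.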

Double-checked this with your tests and an OmniDocBench run; results were consistent between the old and new branches on transformers 4.x and 5.x. Looks good.

emelryan changed pull request status to open
emelryan changed pull request status to merged
