Update with support for transformers 5.x compatibility

by nvidia-oliver-holworthy - opened 1 day ago

base: refs/heads/main ← from: refs/pr/4

+2637 -34

nvidia-oliver-holworthy

NVIDIA org 1 day ago • edited about 17 hours ago

This PR updates the custom Nemotron Parse decoder to work with both the legacy Transformers 4.x cache API and the newer 4.57/5.x cache interfaces, including MBartDecoderLayer signature differences and generation input handling. It also adds a reproducible uv + Docker test environment and a golden regression suite with captured reference outputs, so future dependency upgrades can be checked against the current 4.51.3 baseline.

What changed

Added compatibility handling in hf_nemotron_parse_modeling.py for legacy tuple-based KV caches and newer Cache / EncoderDecoderCache flows.
Added prepare_inputs_for_generation support so cached decoding works correctly during generation.
Added pyproject.toml, Dockerfile, and docker-compose.yaml for a reproducible GPU-backed test environment.
Added test_golden.py and golden_outputs.json to validate preprocessing, encoder outputs, decoder logits, generation, and processor behavior against known-good outputs.
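The cache handling described in the list above can be illustrated with a small duck-typed sketch. This is not the code from hf_nemotron_parse_modeling.py; the helper names `is_legacy_cache`, `past_length`, and `unseen_token_start` are hypothetical. It assumes that a legacy cache is a tuple of per-layer tuples whose first entry is a self-attention key tensor of shape (batch, heads, seq_len, head_dim), and that newer cache objects expose `get_seq_length()`, as the Cache classes in transformers do.

```python
from typing import Any, Optional


def is_legacy_cache(past_key_values: Any) -> bool:
    """Return True for the tuple-based KV cache layout (assumed legacy format)."""
    return isinstance(past_key_values, (tuple, list))


def past_length(past_key_values: Optional[Any]) -> int:
    """Number of decoder positions already cached, for either cache style."""
    if past_key_values is None:
        return 0
    if is_legacy_cache(past_key_values):
        if len(past_key_values) == 0:
            return 0
        # Assumed legacy layout: past[layer][0] is the self-attention key
        # tensor with shape (batch, heads, seq_len, head_dim).
        return past_key_values[0][0].shape[2]
    # Cache-style objects report their own length.
    return past_key_values.get_seq_length()


def unseen_token_start(seq_len: int, past_key_values: Optional[Any] = None) -> int:
    """Index of the first decoder token that the cache does not cover yet.

    A prepare_inputs_for_generation-style hook could slice with it:
    decoder_input_ids[:, start:].
    """
    cached = past_length(past_key_values)
    if cached == 0:
        return 0
    if cached < seq_len:
        return cached
    # The cache already covers every position: keep only the last token.
    return seq_len - 1
```

Routing every length query through one helper keeps the version-specific branching in a single place, so the rest of the decoder does not need to know which cache style it was given.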

Testing

python test_golden.py --capture
pytest test_golden.py -v
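For readers unfamiliar with the capture-then-compare pattern behind these two commands, here is a minimal standard-library sketch. It is not the contents of test_golden.py; the function names `capture` and `compare`, the JSON layout, and the tolerances are assumptions chosen for illustration.

```python
import json
import math
from pathlib import Path


def capture(outputs: dict, path: Path) -> None:
    """Write the current outputs as the golden reference (capture step)."""
    path.write_text(json.dumps(outputs, indent=2, sort_keys=True))


def _close(actual, expected, rel_tol: float, abs_tol: float) -> bool:
    """Compare nested lists of numbers with a tolerance; other values exactly."""
    if isinstance(actual, bool) or isinstance(expected, bool):
        return actual == expected
    if isinstance(actual, (int, float)) and isinstance(expected, (int, float)):
        return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)
    if isinstance(actual, list) and isinstance(expected, list):
        return len(actual) == len(expected) and all(
            _close(a, e, rel_tol, abs_tol) for a, e in zip(actual, expected)
        )
    return actual == expected


def compare(outputs: dict, path: Path, rel_tol: float = 1e-4, abs_tol: float = 1e-6) -> list:
    """Return the sorted names of entries that are missing or differ from the golden file."""
    golden = json.loads(path.read_text())
    mismatches = []
    for name in sorted(set(golden) | set(outputs)):
        if name not in golden or name not in outputs:
            mismatches.append(name)
        elif not _close(outputs[name], golden[name], rel_tol, abs_tol):
            mismatches.append(name)
    return mismatches
```

A test would then assert that `compare(...)` returns an empty list. Floating-point outputs such as logits are compared with a tolerance because small numerical differences between library versions are expected, while generated text is compared exactly.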

Fix MBartDecoderLayer forward pass for transformers 5.x compatibility (382fc3a2)

Add golden regression tests and Docker test environment (e9fb5317)

emelryan

NVIDIA org about 14 hours ago

Double-checked this with your tests as well as an OmniDocBench run and got consistent results between the old and new branches on transformers 4.x and 5.x. Looks good.

emelryan changed pull request status to open about 9 hours ago

emelryan changed pull request status to merged about 1 hour ago
