How did you manage to make a model trained in FP16 work on NVFP4, making it bigger?

#1
by yangus87 - opened

How did you manage to make a model trained in FP16 work in NVFP4 while making it bigger? It's incredible that an FP16 model ends up weighing more in NVFP4, about 4% heavier than the original.

Red Hat AI org

The original DeepSeek V4 Flash model is already quantized to FP8 and FP4. We keep FP8 block quantization for the same layers as the original; however, our checkpoint uses NVFP4 for the MoE expert layers, a format optimized for performant inference on NVIDIA Blackwell GPUs.
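
As a rough illustration of how one 4-bit format can take more space than another, the sketch below compares storage cost per weight. NVFP4 stores one 8-bit (FP8 E4M3) scale for every 16 elements. The comparison format here is an assumption: the thread does not say which FP4 layout the original checkpoint uses, so the sketch takes an MXFP4-style layout (one 8-bit scale per 32 elements) as a hypothetical baseline. The function name `bits_per_weight` is also illustrative only, and the small per-tensor scale NVFP4 adds is ignored.

```python
def bits_per_weight(element_bits: int, scale_bits: int, block_size: int) -> float:
    """Average storage per weight: the element itself plus its share of the block scale."""
    return element_bits + scale_bits / block_size


# NVFP4: 4-bit elements, one FP8 (E4M3) scale per 16-element block.
nvfp4 = bits_per_weight(element_bits=4, scale_bits=8, block_size=16)

# Assumed baseline (not confirmed by the thread): MXFP4-style layout,
# 4-bit elements with one 8-bit scale per 32-element block.
mxfp4 = bits_per_weight(element_bits=4, scale_bits=8, block_size=32)

# Relative size increase for layers converted from the baseline to NVFP4.
overhead = nvfp4 / mxfp4 - 1.0

print(f"NVFP4: {nvfp4} bits/weight")
print(f"Assumed baseline: {mxfp4} bits/weight")
print(f"Relative increase: {overhead:.1%}")
```

Under that assumption, the expert layers would grow by a few percent from the extra scale factors alone. The effect on the full checkpoint depends on what share of the weights sits in the expert layers, which the thread does not state.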

dsikka changed discussion status to closed
