Kimi-Audio BigVGAN (HF-format, pre-fused)

This is the BigVGAN vocoder from moonshotai/Kimi-Audio-7B-Instruct, repackaged to load directly into Hugging Face's transformers.models.qwen2_5_omni.Qwen2_5OmniToken2WavBigVGANModel. Changes from the upstream checkpoint:

  • weight_g / weight_v pairs (the weight_norm parametrization used by NVIDIA's BigVGAN) have been fused into plain .weight tensors.
  • Alias-free upsample/downsample .filter buffers have been dropped (HF marks them persistent=False and regenerates them from config at module construction).
  • Weights are saved as model.safetensors; the original config is preserved verbatim.
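
The fusion and buffer-dropping steps listed above can be sketched roughly as follows. This is an illustration, not the script that produced this repository: the function name fuse_checkpoint and the toy key names are hypothetical, it assumes weight_norm was applied with its default dim=0, and it does not cover any key renaming that may be needed between the upstream and HF layouts.

```python
import torch


def fuse_checkpoint(state_dict):
    """Fuse weight_g / weight_v pairs into .weight and drop .filter buffers."""
    fused = {}
    for name, tensor in state_dict.items():
        if name.endswith(".filter") or name.endswith(".weight_v"):
            # .filter is rebuilt by the module; weight_v is consumed with its weight_g.
            continue
        if name.endswith(".weight_g"):
            prefix = name[: -len("weight_g")]
            v = state_dict[prefix + "weight_v"]
            # weight_norm (dim=0): one L2 norm per output channel over the remaining dims.
            norm = v.flatten(1).norm(dim=1).reshape(tensor.shape)
            fused[prefix + "weight"] = tensor * v / norm
        else:
            fused[name] = tensor
    return fused


# Toy checkpoint with made-up key names, just to exercise the function.
v = torch.randn(6, 4, 3)
g = torch.rand(6, 1, 1) + 0.5
toy = {
    "conv.weight_g": g,
    "conv.weight_v": v,
    "conv.bias": torch.zeros(6),
    "act.upsample.filter": torch.ones(1, 1, 12),
}
fused = fuse_checkpoint(toy)
print(sorted(fused))  # ['conv.bias', 'conv.weight']
```

After fusion, the per-channel L2 norm of each .weight equals the corresponding weight_g entry, which is a convenient sanity check on a real checkpoint.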

vLLM-Omni's Kimi-Audio integration uses this repository so that its loader no longer needs a custom state-dict adapter.

Files

  • config.json — copied verbatim from upstream vocoder/config.json
  • model.safetensors — pre-fused BigVGAN weights
  • LICENSE — MIT (from upstream)

License and attribution

Released under the MIT License, matching the upstream model. Original weights © Moonshot AI, from moonshotai/Kimi-Audio-7B-Instruct. See LICENSE.

How to use

import json

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers.models.qwen2_5_omni.configuration_qwen2_5_omni import (
    Qwen2_5OmniBigVGANConfig,
)
from transformers.models.qwen2_5_omni.modeling_qwen2_5_omni import (
    Qwen2_5OmniToken2WavBigVGANModel,
)

repo_id = "zhangj1an/kimi-audio-bigvgan-hf"
config_path = hf_hub_download(repo_id, "config.json")
weights_path = hf_hub_download(repo_id, "model.safetensors")

# config.json keeps the upstream BigVGAN field names, so map them onto the HF config.
with open(config_path) as f:
    h = json.load(f)
hf_config = Qwen2_5OmniBigVGANConfig(
    mel_dim=h["num_mels"],
    upsample_initial_channel=h["upsample_initial_channel"],
    resblock_kernel_sizes=h["resblock_kernel_sizes"],
    resblock_dilation_sizes=h["resblock_dilation_sizes"],
    upsample_rates=h["upsample_rates"],
    upsample_kernel_sizes=h["upsample_kernel_sizes"],
)

# Build the HF module, then load the pre-fused weights (strict key matching by default).
model = Qwen2_5OmniToken2WavBigVGANModel(hf_config)
model.load_state_dict(load_file(weights_path))
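
load_state_dict is strict by default, so it only accepts a checkpoint without .filter entries because those buffers are non-persistent. The toy module below illustrates that PyTorch behaviour. ToyLowPass is a hypothetical stand-in and is not the HF alias-free layer, and the Hann window is just a placeholder filter.

```python
import torch
from torch import nn


class ToyLowPass(nn.Module):
    """Hypothetical stand-in for a layer that owns a regenerated filter buffer."""

    def __init__(self, kernel_size=12):
        super().__init__()
        # Rebuilt from the constructor arguments on every construction, so it is not saved.
        window = torch.hann_window(kernel_size, periodic=False)
        self.register_buffer(
            "filter", (window / window.sum()).view(1, 1, -1), persistent=False
        )
        self.gain = nn.Parameter(torch.ones(1))


layer = ToyLowPass()
saved = layer.state_dict()      # holds "gain" only; "filter" is non-persistent

fresh = ToyLowPass()
fresh.load_state_dict(saved)    # strict=True by default, and it still succeeds
print(list(saved))              # ['gain']
```

Conversely, a checkpoint that still carried a "filter" entry would be reported as an unexpected key under strict loading.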