Kimi-Audio BigVGAN (HF-format, pre-fused)
This is the BigVGAN vocoder from moonshotai/Kimi-Audio-7B-Instruct, repackaged to load directly into Hugging Face's transformers.models.qwen2_5_omni.Qwen2_5OmniToken2WavBigVGANModel:
- weight_g / weight_v shards (NVIDIA-BigVGAN weight_norm parametrization) have been fused into plain .weight tensors.
- Alias-free upsample/downsample .filter buffers have been dropped (HF marks them persistent=False and regenerates them from config at module construction).
- Weights are saved as model.safetensors; the original config is preserved verbatim.
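The conversion described above can be sketched roughly as follows. This is an illustration only, not the script that produced this repository: the function name fuse_weight_norm is hypothetical, and the sketch assumes the upstream checkpoint uses the classic weight_norm layout with its default dim=0, so that each fused weight is g * v / ||v|| with the norm taken over every dimension except the first.

```python
import torch


def fuse_weight_norm(state_dict, drop_suffix=".filter"):
    """Fuse (weight_g, weight_v) pairs into plain weights and drop filter buffers.

    Hypothetical helper: assumes weight_norm was applied with dim=0.
    """
    fused = {}
    for key, value in state_dict.items():
        if key.endswith(".weight_g"):
            continue  # handled together with the matching weight_v entry
        if key.endswith(drop_suffix):
            continue  # non-persistent buffers are rebuilt at module construction
        if key.endswith(".weight_v"):
            prefix = key[: -len(".weight_v")]
            g = state_dict[prefix + ".weight_g"]
            # L2 norm over all dims except dim 0, kept broadcastable against g.
            dims = tuple(range(1, value.dim()))
            norm = torch.linalg.vector_norm(value, dim=dims, keepdim=True)
            fused[prefix + ".weight"] = value * (g / norm)
        else:
            fused[key] = value
    return fused
```

Biases and any other tensors pass through unchanged, so the result can be saved with safetensors and loaded without a custom state-dict adapter.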
Used by vLLM-Omni's Kimi-Audio integration so the loader no longer needs a custom state-dict adapter.
Files
- config.json: copied verbatim from upstream vocoder/config.json
- model.safetensors: pre-fused BigVGAN weights
- LICENSE: MIT (from upstream)
License and attribution
Released under the MIT License, matching the upstream model.
Original weights © Moonshot AI, from
moonshotai/Kimi-Audio-7B-Instruct. See LICENSE.
How to use
import json

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers.models.qwen2_5_omni.configuration_qwen2_5_omni import (
    Qwen2_5OmniBigVGANConfig,
)
from transformers.models.qwen2_5_omni.modeling_qwen2_5_omni import (
    Qwen2_5OmniToken2WavBigVGANModel,
)

# Download the upstream config and the pre-fused weights from the Hub.
config_path = hf_hub_download("zhangj1an/kimi-audio-bigvgan-hf", "config.json")
weights_path = hf_hub_download("zhangj1an/kimi-audio-bigvgan-hf", "model.safetensors")

# Map the upstream BigVGAN config keys onto the HF config fields.
with open(config_path) as f:
    h = json.load(f)
hf_config = Qwen2_5OmniBigVGANConfig(
    mel_dim=h["num_mels"],
    upsample_initial_channel=h["upsample_initial_channel"],
    resblock_kernel_sizes=h["resblock_kernel_sizes"],
    resblock_dilation_sizes=h["resblock_dilation_sizes"],
    upsample_rates=h["upsample_rates"],
    upsample_kernel_sizes=h["upsample_kernel_sizes"],
)

# Build the HF module and load the weights; no state-dict adapter is needed.
model = Qwen2_5OmniToken2WavBigVGANModel(hf_config)
model.load_state_dict(load_file(weights_path))
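As an optional sanity check, one could confirm that a checkpoint file really is pre-fused and free of filter buffers by inspecting its key names. The helper below is a hypothetical sketch (leftover_keys is not part of this repository or of any library); the suffixes it looks for are the ones named in the description above.

```python
def leftover_keys(keys):
    """Return keys that a pre-fused, filter-free checkpoint should not contain."""
    suffixes = (".weight_g", ".weight_v", ".filter")
    return sorted(k for k in keys if k.endswith(suffixes))


# Usage sketch, assuming weights_path from the loading snippet:
#   from safetensors import safe_open
#   with safe_open(weights_path, framework="pt") as f:
#       assert leftover_keys(f.keys()) == []
```

An empty result means every tensor in the file is a plain parameter that load_state_dict can consume directly.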