# Higgs Audio v2 (MLX, 6-bit)

An MLX port of bosonai/higgs-audio-v2-generation-3B-base quantized to 6-bit: a 3B-parameter, Llama-3.2-backed text-to-speech model with multi-codebook acoustic tokens and delay-pattern streaming. Runs real-time voice cloning on Apple Silicon.
## Install

Requires the Higgs Audio v2 code from Blaizzy/mlx-audio (pending upstream merge; currently on the `higgs-audio-v2-port` branch at kaioct-labs/mlx-audio).
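Installing straight from that branch might look like the following (the repo and branch names come from above; the exact command is a sketch, not a verified install recipe):

```shell
# Install mlx-audio from the Higgs Audio v2 port branch (pending upstream merge)
pip install "git+https://github.com/kaioct-labs/mlx-audio.git@higgs-audio-v2-port"
```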
## Usage
```python
import soundfile as sf

from mlx_audio.tts.models.higgs_audio import HiggsAudioServer

# Load the 6-bit model and the acoustic tokenizer
server = HiggsAudioServer.from_pretrained(
    model_path="mlx-community/higgs-audio-v2-3B-mlx-q6",
    codec_path="mlx-community/higgs-audio-v2-tokenizer",
)

# Clone the voice in reference.wav and speak the target text with it
result = server.generate(
    target_text="Hello from Higgs Audio on MLX.",
    reference_audio_path="reference.wav",
    reference_text="Transcript of the reference audio.",
)

sf.write("output.wav", result.pcm, result.sampling_rate)
```
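The model streams multi-codebook acoustic tokens in a delay pattern: codebook *k* is shifted right by *k* steps so all codebooks can be generated in one interleaved autoregressive pass. A minimal, framework-free sketch of the idea (the function names and the `PAD` sentinel are illustrative, not the mlx-audio API):

```python
PAD = -1  # placeholder id; the real model uses its own special token ids


def apply_delay_pattern(codes):
    """codes: list of K rows (one per codebook), each of length T.
    Returns K rows of length T + K - 1 with row k delayed by k steps."""
    K = len(codes)
    return [[PAD] * k + list(row) + [PAD] * (K - 1 - k)
            for k, row in enumerate(codes)]


def revert_delay_pattern(delayed):
    """Inverse: strip the per-codebook offsets back to aligned rows."""
    K = len(delayed)
    T = len(delayed[0]) - (K - 1)
    return [row[k:k + T] for k, row in enumerate(delayed)]
```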
## Benchmark (M5 Max, warm, long prompt)

| Variant | RTF (lower is faster) | Size |
|---|---|---|
| bf16 | 0.60× | 6.8 GB |
| q8 | 0.36× | 6.18 GB |
| q6 | 0.33× | 4.75 GB |
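RTF here is the real-time factor: wall-clock generation time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. A one-line sketch (the helper name is illustrative):

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock generation time / audio duration; < 1.0 is real-time."""
    return generation_seconds / audio_seconds
```

For example, generating 10 s of audio in 3.3 s gives an RTF of 0.33, matching the q6 row above.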
## Quantization

Selective: the Llama backbone and text head are quantized to 6-bit, while `audio_codebook_embeddings` and `audio_decoder_proj.audio_lm_head` are kept at bf16 to preserve voice character. The `quantization` block in `config.json` records this layout and is honored automatically by `HiggsAudioServer.from_pretrained`.
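The selection rule above can be expressed as a simple predicate over module paths. This is a hedged sketch of the logic, not the actual mlx-audio implementation; the two excluded paths are the ones named in this card:

```python
# Modules kept at bf16 to preserve voice character
KEEP_BF16 = (
    "audio_codebook_embeddings",
    "audio_decoder_proj.audio_lm_head",
)


def should_quantize(module_path: str) -> bool:
    """Return True if this module gets 6-bit weights, False if it stays bf16."""
    return not any(module_path.startswith(keep) for keep in KEEP_BF16)
```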
## License
Apache-2.0 (inherits from base model).