Higgs Audio v2 β€” MLX 6-bit

An MLX port of bosonai/higgs-audio-v2-generation-3B-base, quantized to 6-bit: a 3B-parameter text-to-speech model on a Llama-3.2 backbone that predicts multi-codebook acoustic tokens with delay-pattern streaming. Enables real-time voice cloning on Apple Silicon.
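Delay-pattern streaming lets a single autoregressive stream emit all codebooks by shifting codebook k right by k steps (a MusicGen-style scheme; the exact pattern this model uses is an assumption). A minimal sketch:

```python
# Delay-pattern sketch (MusicGen-style; an assumption about this model's scheme).
# Codebook k is shifted right by k steps so that, at each timestep, the model
# predicts one token per codebook while respecting inter-codebook dependencies.
PAD = -1  # placeholder token for padded positions

def apply_delay(codes):
    """codes: list of codebook token rows, all of length t.
    Returns rows of the same length t, row k shifted right by k pads."""
    t = len(codes[0])
    return [[PAD] * k + row[: t - k] for k, row in enumerate(codes)]

codes = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(apply_delay(codes))
# [[1, 2, 3, 4], [-1, 5, 6, 7], [-1, -1, 9, 10]]
```

Decoding reverses the shift (and drops the pads) before the tokens are fed to the audio codec.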

Install

Requires the Higgs Audio v2 code from Blaizzy/mlx-audio (pending upstream merge; currently on the higgs-audio-v2-port branch at kaioct-labs/mlx-audio).
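Until the merge lands, one way to install directly from that branch (assuming the fork is on GitHub under the names above):

```shell
# Install mlx-audio from the port branch; repo location is an assumption.
pip install git+https://github.com/kaioct-labs/mlx-audio.git@higgs-audio-v2-port
```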

Usage

from mlx_audio.tts.models.higgs_audio import HiggsAudioServer
import soundfile as sf

# Load the 6-bit model and its matching audio tokenizer from the Hub.
server = HiggsAudioServer.from_pretrained(
    model_path="mlx-community/higgs-audio-v2-3B-mlx-q6",
    codec_path="mlx-community/higgs-audio-v2-tokenizer",
)

# Clone the voice in reference.wav and speak the target text with it.
# reference_text should be an accurate transcript of the reference clip.
result = server.generate(
    target_text="Hello from Higgs Audio on MLX.",
    reference_audio_path="reference.wav",
    reference_text="Transcript of the reference audio.",
)
sf.write("output.wav", result.pcm, result.sampling_rate)

Benchmark (M5 Max, warm cache, long prompt)

variant  RTF    size
bf16     0.60×  6.8 GB
q8       0.36×  6.18 GB
q6       0.33×  4.75 GB

RTF = real-time factor (generation time ÷ audio duration; lower is faster, and values below 1× are faster than real time).
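The RTF figures above follow from the definition directly; a one-line helper makes the arithmetic explicit:

```python
# RTF (real-time factor) = wall-clock generation time / duration of audio produced.
def rtf(generation_seconds: float, audio_seconds: float) -> float:
    return generation_seconds / audio_seconds

# e.g. at q6, producing 10 s of audio in 3.3 s of wall-clock time:
print(round(rtf(3.3, 10.0), 2))  # 0.33
```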

Quantization

Selective quantization: the Llama backbone and text head are quantized to 6-bit, while audio_codebook_embeddings and audio_decoder_proj.audio_lm_head are kept at bf16 to preserve voice character. The configuration lives in the quantization block of config.json and is honored automatically by HiggsAudioServer.from_pretrained.
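A sketch of what such a quantization block might look like, following the common MLX convention of per-module overrides (the group_size value and exact key names here are assumptions, not the shipped config):

```json
{
  "quantization": {
    "group_size": 64,
    "bits": 6,
    "audio_codebook_embeddings": false,
    "audio_decoder_proj.audio_lm_head": false
  }
}
```

Modules mapped to false are skipped at quantization time and loaded at bf16.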

License

Apache-2.0 (inherits from base model).
