Higgs Audio v2 β€” MLX 6-bit

An MLX port of bosonai/higgs-audio-v2-generation-3B-base, quantized to 6-bit: a 3B-parameter text-to-speech model on a Llama-3.2 backbone that predicts multi-codebook acoustic tokens with delay-pattern streaming. Enables real-time voice cloning on Apple Silicon.
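Delay-pattern streaming lets a single autoregressive stream emit all codebooks by shifting codebook k right by k steps (a MusicGen-style scheme; the exact pattern this model uses is an assumption). A minimal sketch:

```python
# Delay-pattern sketch (MusicGen-style; an assumption about this model's scheme).
# Codebook k is shifted right by k steps so that, at each timestep, the model
# predicts one token per codebook while respecting inter-codebook dependencies.
PAD = -1  # placeholder token for padded positions

def apply_delay(codes):
    """codes: list of codebook token rows, all of length t.
    Returns rows of the same length t, row k shifted right by k pads."""
    t = len(codes[0])
    return [[PAD] * k + row[: t - k] for k, row in enumerate(codes)]

codes = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(apply_delay(codes))
# [[1, 2, 3, 4], [-1, 5, 6, 7], [-1, -1, 9, 10]]
```

Decoding reverses the shift (and drops the pads) before the tokens are fed to the audio codec.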

Install

Requires the Higgs Audio v2 code from Blaizzy/mlx-audio (pending upstream merge; currently on the higgs-audio-v2-port branch at kaioct-labs/mlx-audio).
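Until the merge lands, one way to install directly from that branch (assuming the fork is on GitHub under the names above):

```shell
# Install mlx-audio from the port branch; repo location is an assumption.
pip install git+https://github.com/kaioct-labs/mlx-audio.git@higgs-audio-v2-port
```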

Usage

from mlx_audio.tts.models.higgs_audio import HiggsAudioServer
import soundfile as sf

# Load the 6-bit model and its matching audio tokenizer from the Hub.
server = HiggsAudioServer.from_pretrained(
    model_path="mlx-community/higgs-audio-v2-3B-mlx-q6",
    codec_path="mlx-community/higgs-audio-v2-tokenizer",
)

# Clone the voice in reference.wav and speak the target text with it.
# reference_text should be an accurate transcript of the reference clip.
result = server.generate(
    target_text="Hello from Higgs Audio on MLX.",
    reference_audio_path="reference.wav",
    reference_text="Transcript of the reference audio.",
)
sf.write("output.wav", result.pcm, result.sampling_rate)

Benchmark (M5 Max, warm cache, long prompt)

variant  RTF    size
bf16     0.60×  6.8 GB
q8       0.36×  6.18 GB
q6       0.33×  4.75 GB

RTF = real-time factor (generation time ÷ audio duration; lower is faster, and values below 1× are faster than real time).
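The RTF figures above follow from the definition directly; a one-line helper makes the arithmetic explicit:

```python
# RTF (real-time factor) = wall-clock generation time / duration of audio produced.
def rtf(generation_seconds: float, audio_seconds: float) -> float:
    return generation_seconds / audio_seconds

# e.g. at q6, producing 10 s of audio in 3.3 s of wall-clock time:
print(round(rtf(3.3, 10.0), 2))  # 0.33
```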

Quantization

Selective quantization: the Llama backbone and text head are quantized to 6-bit, while audio_codebook_embeddings and audio_decoder_proj.audio_lm_head are kept at bf16 to preserve voice character. The configuration lives in the quantization block of config.json and is honored automatically by HiggsAudioServer.from_pretrained.
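A sketch of what such a quantization block might look like, following the common MLX convention of per-module overrides (the group_size value and exact key names here are assumptions, not the shipped config):

```json
{
  "quantization": {
    "group_size": 64,
    "bits": 6,
    "audio_codebook_embeddings": false,
    "audio_decoder_proj.audio_lm_head": false
  }
}
```

Modules mapped to false are skipped at quantization time and loaded at bf16.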

License

Apache-2.0 (inherits from base model).
