Indic Parler TTS — Bhojpuri LoRA (MLX)

First MLX implementation of Parler TTS, fine-tuned for the Bhojpuri language on Apple Silicon.

This is a LoRA adapter on top of ai4bharat/indic-parler-tts.
The base model supports 18 Indian languages — this adapter teaches it Bhojpuri phoneme patterns.

Highlights

  • Runs natively on Apple Silicon (M1/M2/M3/M4) via MLX
  • First Parler TTS port to MLX — no PyTorch required
  • Fine-tuned on the IISc SYSPIN Bhojpuri corpus (CC-BY-4.0)
  • LoRA rank-8 adapter — only 4.3M trainable params out of 920M total (0.47%)
  • All 69 base model speakers can now speak Bhojpuri
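
The 4.3M figure can be roughly reproduced from the adapter settings. The sketch below assumes decoder dimensions that this card does not state (24 layers, hidden size 1024, feed-forward size 4096, used here as illustrative values for a decoder of this size) and counts rank × (d_in + d_out) parameters for each adapted linear layer.

```python
# Assumed decoder shape (NOT stated in this card; illustrative values only).
NUM_LAYERS = 24
HIDDEN = 1024
FFN = 4096

RANK = 8                    # LoRA rank from this card
TOTAL_PARAMS = 920_000_000  # "920M total" from the highlights


def lora_params(d_in, d_out, rank=RANK):
    """A LoRA pair adds A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = (
    4 * lora_params(HIDDEN, HIDDEN)    # self-attn q, k, v, out
    + 2 * lora_params(HIDDEN, HIDDEN)  # cross-attn q, v
    + lora_params(HIDDEN, FFN)         # fc1
    + lora_params(FFN, HIDDEN)         # fc2
)
trainable = NUM_LAYERS * per_layer
percent = 100 * trainable / TOTAL_PARAMS
print(f"{trainable:,} trainable LoRA params ({percent:.2f}% of total)")
```

Under these assumptions the count comes to 4,325,376, about 0.47% of 920M, which is consistent with the figures quoted above.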

Model Details

Property         Value
Base model       ai4bharat/indic-parler-tts
Training data    IISc SYSPIN Bhojpuri Female (5,537 clips)
LoRA rank        8 (alpha 16)
LoRA targets     Decoder self-attn, cross-attn Q/V, FFN
Trained on       MacBook Pro (Apple Silicon)
Framework        MLX
Training steps   ~1,400 (2 epochs)
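
To make the rank and alpha settings concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It assumes the conventional formulation, where the low-rank update is scaled by alpha / rank (16 / 8 = 2.0 here); whether mlx-audio-train applies exactly this scaling is an assumption, and the toy dimensions and helper names are hypothetical.

```python
import random

RANK, ALPHA = 8, 16.0  # LoRA rank and alpha from this card


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_linear(x, W, A, B, rank=RANK, alpha=ALPHA):
    """Frozen base projection W x plus the scaled low-rank update B (A x)."""
    scale = alpha / rank  # conventional LoRA scaling (assumed)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out = 12, 10  # toy sizes, far smaller than the real decoder
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(RANK)]  # trainable
B = [[0.0] * RANK for _ in range(d_out)]                                 # trainable, zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(W, x)
y_lora = lora_linear(x, W, A, B)
print(y_lora == y_base)  # the adapter is a no-op until B is trained
```

With B initialised to zeros the adapted layer reproduces the base output exactly, so training starts from the base model's behaviour and only A and B are updated.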

Audio Samples

"ई बहुत नीमन बा। हम कल जाइब।" — Bhojpuri (short)

Audio comparison (base model vs. fine-tuned) for two voices: Divya and Generic.

"रउरा के राम राम। आज हम एही गाँव में रहीला। का हाल बा रउरा के?" — Bhojpuri (long)

Audio comparison (base model vs. fine-tuned) for two voices: Divya and Generic.

"नमस्ते, आप कैसे हैं? मैं बहुत अच्छा हूँ।" — Hindi sanity check (should sound similar on both)

Audio comparison (base model vs. fine-tuned) for two voices: Divya and Generic.

Usage

pip install mlx mlx-lm soundfile huggingface_hub

import sys
sys.path.insert(0, "path/to/mlx-audio-train")

from models.indic_parler_tts.generate import load_model, generate
from train.lora import apply_lora, load_adapters, LoRAConfig
import soundfile as sf
import mlx.core as mx
from huggingface_hub import snapshot_download

# 1. Load base model
model, tokenizers = load_model("ai4bharat/indic-parler-tts")

# 2. Apply LoRA and load adapter
adapter_dir = snapshot_download("akashicmarga/indic-parler-tts-bhojpuri-lora")
lora_config = LoRAConfig(
    rank=8, alpha=16.0, dropout=0.0,
    target_modules=[
        "decoder.layers.*.self_attn.q", "decoder.layers.*.self_attn.k",
        "decoder.layers.*.self_attn.v", "decoder.layers.*.self_attn.out",
        "decoder.layers.*.cross_attn.q", "decoder.layers.*.cross_attn.v",
        "decoder.layers.*.fc1", "decoder.layers.*.fc2",
    ],
    model_type="indic_parler_tts",
)
apply_lora(model, lora_config)
load_adapters(model, f"{adapter_dir}/adapters.safetensors")
mx.eval(model.parameters())

# 3. Generate Bhojpuri speech
audio = generate(
    model, tokenizers,
    description="A female speaker delivers speech at a moderate pace. The recording is of very high quality.",
    text="रउरा के राम राम। आज हम एही गाँव में रहीला।",
)
sf.write("bhojpuri.wav", audio, 44100)
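
The target_modules entries use a * wildcard in place of the layer index. The sketch below illustrates how such patterns could select modules, using glob-style matching from the standard library. That mlx-audio-train resolves patterns this way is an assumption, and the module names are hypothetical examples that merely follow the same naming scheme.

```python
from fnmatch import fnmatchcase

# Same patterns as in the LoRAConfig in the usage example.
TARGET_PATTERNS = [
    "decoder.layers.*.self_attn.q", "decoder.layers.*.self_attn.k",
    "decoder.layers.*.self_attn.v", "decoder.layers.*.self_attn.out",
    "decoder.layers.*.cross_attn.q", "decoder.layers.*.cross_attn.v",
    "decoder.layers.*.fc1", "decoder.layers.*.fc2",
]


def is_lora_target(module_name, patterns=TARGET_PATTERNS):
    """Return True if any glob pattern matches the dotted module name.

    Note: fnmatch's * also matches dots, so it is looser than a
    strict "one path segment" wildcard.
    """
    return any(fnmatchcase(module_name, p) for p in patterns)


# Hypothetical module names for two decoder layers.
SUBMODULES = [
    "self_attn.q", "self_attn.k",
    "cross_attn.q", "cross_attn.k", "cross_attn.v", "cross_attn.out",
    "fc1", "fc2", "final_layer_norm",
]
names = [f"decoder.layers.{i}.{sub}" for i in range(2) for sub in SUBMODULES]

selected = [n for n in names if is_lora_target(n)]
skipped = [n for n in names if not is_lora_target(n)]
print(len(selected), "adapted;", len(skipped), "left frozen")
```

In this toy listing the cross-attention k and out projections and the layer norm are left frozen, which mirrors the "cross-attn Q/V" entry under Model Details.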

Any speaker description supported by the base model works:

# Divya speaking Bhojpuri
audio = generate(
    model, tokenizers,
    description="Divya's voice is slightly expressive and very animated. She speaks at a moderate pace.",
    text="ई बहुत नीमन बा। हम कल जाइब।",
)
sf.write("divya_bhojpuri.wav", audio, 44100)
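
If soundfile is not available, the generated samples can also be written with the Python standard library alone. The sketch below is a minimal example, assuming the audio is a mono sequence of floats in the range [-1, 1] at 44.1 kHz (the rate used in the usage example). The helper name write_wav_pcm16 is hypothetical, and a synthetic tone stands in for the model output.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 44100  # same rate as passed to sf.write in the usage example


def write_wav_pcm16(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Stand-in for generated audio: 0.5 s of a 440 Hz tone.
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
out_path = os.path.join(tempfile.gettempdir(), "divya_bhojpuri.wav")
write_wav_pcm16(out_path, tone)
```

To use this with real output, pass the generated samples instead of the tone, converting them to a plain Python list or NumPy array first if generate returns an MLX array (the return type is not stated here).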

Training

Trained using mlx-audio-train on Apple Silicon.

python scripts/train.py --config configs/indic_parler_bhojpuri.yaml

Dataset

IISc SYSPIN Corpus — Bhojpuri Female Speaker
License: CC-BY-4.0

License

Adapter weights: Apache 2.0
Base model: Apache 2.0 (ai4bharat/indic-parler-tts)
Training data: CC-BY-4.0 (IISc SYSPIN)
