Parakeet-TDT-0.6B Vietnamese
Fine-tuned NVIDIA Parakeet-TDT-0.6B-v3 for Vietnamese and English automatic speech recognition.
Model Description
- Architecture: FastConformer encoder + Token-and-Duration Transducer (TDT) decoder
- Parameters: 0.6B
- Tokenizer: 8,192-token SentencePiece BPE
- Base model: nvidia/parakeet-tdt-0.6b-v3
- Languages: Vietnamese (primary), English
- Training data: 8,750h Vietnamese + 1,530h English (~10,280h total)
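A Token-and-Duration Transducer differs from a standard RNN-T because its joint network predicts, at each step, both an output token and how many encoder frames to skip. The toy greedy loop below is a minimal sketch of that idea, not NeMo's implementation. The `joint` callback, the blank id of 0, and the duration set 0 through 4 are assumptions for illustration. A real decoder operates on tensors produced by the FastConformer encoder and the prediction network.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0                    # assumption: blank token id in this toy vocabulary
DURATIONS = (0, 1, 2, 3, 4)  # assumption: frames the decoder may skip per step

# Hypothetical joint network: (frame index, last emitted token) -> (token scores, duration scores)
JointFn = Callable[[int, int], Tuple[Sequence[float], Sequence[float]]]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda i: scores[i])


def tdt_greedy_decode(joint: JointFn, num_frames: int, max_symbols_per_frame: int = 10) -> List[int]:
    """Greedy TDT decoding: take the best token and the best duration at every step."""
    tokens: List[int] = []
    last_token = BLANK  # the prediction network starts from a blank/start symbol
    t = 0
    emitted_on_frame = 0
    while t < num_frames:
        token_scores, duration_scores = joint(t, last_token)
        token = argmax(token_scores)
        duration = DURATIONS[argmax(duration_scores)]
        if token != BLANK:
            tokens.append(token)
            last_token = token  # only non-blank tokens update the prediction network
            emitted_on_frame += 1
        if token == BLANK and duration == 0:
            duration = 1  # a blank that does not advance would loop forever
        if duration == 0 and emitted_on_frame >= max_symbols_per_frame:
            duration = 1  # cap the number of tokens emitted on a single frame
        if duration > 0:
            emitted_on_frame = 0
        t += duration
    return tokens


def one_hot(size: int, index: int) -> List[float]:
    scores = [0.0] * size
    scores[index] = 1.0
    return scores


def toy_joint(t: int, last_token: int) -> Tuple[List[float], List[float]]:
    """Scripted joint output over a 10-token vocabulary, for demonstration only."""
    if t == 0:
        return one_hot(10, 5), one_hot(len(DURATIONS), 2)  # emit 5, skip 2 frames
    if t == 2 and last_token == 5:
        return one_hot(10, 7), one_hot(len(DURATIONS), 0)  # emit 7, stay on frame 2
    return one_hot(10, BLANK), one_hot(len(DURATIONS), 3)  # blank, skip 3 frames


print(tdt_greedy_decode(toy_joint, num_frames=5))  # → [5, 7]
```

The duration prediction is what lets TDT skip frames: in the toy run, frame 1 and frames 3 to 4 are never visited. Two guards keep the loop finite. A blank with duration 0 is bumped to one frame, and a per-frame token cap forces the decoder to advance.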
Evaluation Results
| Evaluation set | WER |
|---|---|
| Vietnamese + English validation set (self-reported) | 5.68% |
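For reference, WER is conventionally computed from a word-level alignment between the reference transcript and the model output. The card does not state how text was normalized (casing, punctuation, diacritics) before scoring, so the following assumes the standard definition:

$$
\mathrm{WER} = \frac{S + D + I}{N}
$$

Here $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted words, and $N$ is the number of words in the reference. Under this definition, 5.68% corresponds to about 5.68 word errors per 100 reference words.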
Usage
```python
import nemo.collections.asr as nemo_asr

# Load model
model = nemo_asr.models.ASRModel.from_pretrained("nmcuong/parakeet-tdt-0.6b-v3-vietnamese")

# Transcribe
transcriptions = model.transcribe(["audio.wav"])
print(transcriptions[0].text)
```
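Before transcribing, it can help to confirm that input files match the format Parakeet models are generally documented to expect (16 kHz, mono). This is an assumption carried over from the base model, not something this card states. The helper below is a minimal standard-library sketch. `wav_info` and `is_16k_mono` are hypothetical names, the helper only reads uncompressed PCM WAV files, and whether `transcribe()` resamples or downmixes other inputs automatically depends on your NeMo version.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    sample_rate: int
    channels: int
    duration_s: float


def wav_info(path: str) -> WavInfo:
    """Read the header of a PCM WAV file and report rate, channels and length."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(rate, wav.getnchannels(), wav.getnframes() / rate)


def is_16k_mono(path: str) -> bool:
    """True if the file is 16 kHz mono (assumed expected input format)."""
    info = wav_info(path)
    return info.sample_rate == 16_000 and info.channels == 1
```

For example, `is_16k_mono("audio.wav")` can be checked before passing the file to `model.transcribe`.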
Training Details
- Base model: nvidia/parakeet-tdt-0.6b-v3
- Learning rate: 5e-5
- Batch size: 48 (gradient accumulation: 2, effective batch size: 96)
- Training steps: 130,000
- GPUs: 2× H200
- Precision: bf16-mixed
- Audio duration filter: min 0.1s, max 30s
- Max WER filter: 12.5%
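The duration and WER filters listed above can be sketched as a pass over a NeMo-style JSON-lines manifest, where each line holds `audio_filepath`, `duration`, and `text`. This is an illustration under assumptions, not the author's pipeline. The card does not say which transcript the 12.5% WER is measured against, so the `pred_text` field (for example, output from some existing model) is hypothetical, and treating both bounds as inclusive is a guess.

```python
import json
from typing import Dict, Iterable, Iterator

MIN_DURATION_S = 0.1  # from the card's audio duration filter
MAX_DURATION_S = 30.0
MAX_WER = 0.125       # from the card's "Max WER filter: 12.5%"


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        # WER is undefined for an empty reference; treat any output as infinitely bad here
        return 0.0 if not hyp else float("inf")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


def keep(entry: Dict) -> bool:
    """Apply the duration bounds, then the WER bound if a hypothetical pred_text exists."""
    if not MIN_DURATION_S <= entry["duration"] <= MAX_DURATION_S:
        return False
    hyp = entry.get("pred_text")  # hypothetical field, not a NeMo convention
    return hyp is None or word_error_rate(entry["text"], hyp) <= MAX_WER


def filter_manifest(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield manifest entries that pass both filters, skipping blank lines."""
    for line in lines:
        if line.strip():
            entry = json.loads(line)
            if keep(entry):
                yield entry
```

Entries without `pred_text` pass the WER check in this sketch. Dropping them instead would be an equally plausible choice.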
Training Data
| Dataset | Duration |
|---|---|
| Vietnamese | 8,750h |
| English | 1,530h |
| Total | ~10,280h |
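As a quick check on the mix, the two rows add up to 10,280 h, and Vietnamese makes up the large majority of the training audio:

$$
\frac{8{,}750}{10{,}280} \approx 85.1\%\ \text{Vietnamese}, \qquad \frac{1{,}530}{10{,}280} \approx 14.9\%\ \text{English}
$$

That works out to roughly 5.7 hours of Vietnamese for every hour of English.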
Acknowledgments
- Base model: NVIDIA Parakeet-TDT-0.6B-v3