How to use manhp/asr-khmer-allsynth-lsall-exp with NeMo:

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("manhp/asr-khmer-allsynth-lsall-exp")
transcriptions = asr_model.transcribe(["file.wav"])
```
Khmer Telephony ASR - v9 allsynth-lsall (epoch 21) - EXPERIMENTAL
⚠️ Experimental checkpoint, not a release. This run intentionally trained on the full manually-annotated telephony set, including the evaluation split (oversampled ×15). The in-domain numbers below therefore overlap the training data and overstate accuracy. The held-out FLEURS read-speech CER is ~49%, substantially worse than the production telephony release, so do not use this model as a general Khmer ASR system. It is published here only for experiment tracking and inspection.
Streaming FastConformer-RNNT (BPE) acoustic model for Khmer call-center / telephony speech (8 kHz transcoded for G.711, 16 kHz input). Fine-tuned on a large synthetic Khmer corpus combined with the full set of manually-annotated telephony utterances, heavily oversampled so the telephony domain dominates.
In-domain numbers (training-overlapping, NOT held out)
Evaluated on the full ~7,610-utterance telephony set (eval split leaked into training, so these are an upper bound, not a generalization estimate).
| Metric | Value |
|---|---|
| CER (no-space) | 1.24% |
| CER (raw, spaces counted) | 1.39% |
| WER | 10.34% |
| Per-utterance CER < 10% | 7,311 (96.1%) |
| Per-utterance CER 10–30% | 182 (2.4%) |
| Per-utterance CER 30–50% | 46 (0.6%) |
| Per-utterance CER 50–80% | 33 (0.4%) |
| Per-utterance CER ≥ 80% | 38 (0.5%) |
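The metrics in the table can be illustrated with a minimal sketch. It assumes that CER and WER are edit distance divided by reference length, and that "no-space" CER removes all whitespace from both strings before scoring. The card does not state the exact text normalization used, and the helper names (`edit_distance`, `cer`, `wer`, `cer_bucket`) are illustrative, not part of any released scoring script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp, count_spaces=True):
    """Character error rate; count_spaces=False gives the 'no-space' variant."""
    if not count_spaces:
        # Assumption: 'no-space' means all whitespace is dropped before scoring.
        ref, hyp = "".join(ref.split()), "".join(hyp.split())
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer_bucket(value):
    """Map a per-utterance CER (as a fraction) to the bucket labels used above."""
    for upper, label in [(0.10, "< 10%"), (0.30, "10-30%"),
                         (0.50, "30-50%"), (0.80, "50-80%")]:
        if value < upper:
            return label
    return ">= 80%"
```

Because WER depends on whitespace tokenization, a single wrong character costs a whole word. That is one plausible reason WER in the table is much higher than CER.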
Held-out generalization (FLEURS km, read speech)
| Metric | Value |
|---|---|
| CER (no-space) | 48.82% |
| WER | ~100% |
This confirms the model has overfit to the in-domain (partly memorized) data and
generalizes poorly. The production release (manhp/asr-khmer-telephony) achieves
~20% FLEURS CER.
Model details
- Architecture: FastConformer encoder (17 layers, d_model 512, 8 heads, causal downsampling, chunked-limited attention [[70,13],[70,6],[70,1],[70,0]])
- RNN-T decoder + BPE/Unigram tokenizer (vocab 2,048).
- Fine-tuning data: large synthetic Khmer corpus (322k) + full LS telephony set including eval, oversampled ×15 (26% epoch share).
- Epoch: 21 (val_wer 0.0458).
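Two numbers in the list above can be sanity-checked with a short sketch. Several things here are assumptions rather than statements from the card: that each attention context pair is `[left, right]` in encoder frames, that one encoder frame covers 80 ms (typical for FastConformer with 10 ms features and 8x subsampling), that "322k" counts synthetic utterances, and that the telephony set size is the 7,610 utterances reported in the in-domain section. The function names are illustrative.

```python
# Attention context sizes as listed in the model details: [left, right] frames.
ATT_CONTEXT_SIZES = [[70, 13], [70, 6], [70, 1], [70, 0]]

# Assumption: 10 ms feature stride x 8x subsampling = 80 ms per encoder frame.
FRAME_MS = 80


def lookahead_ms(att_context, frame_ms=FRAME_MS):
    """Right-context (lookahead) latency implied by one [left, right] pair."""
    _left, right = att_context
    return right * frame_ms


def epoch_share(n_real, oversample, n_synth):
    """Fraction of one epoch made up of the oversampled real set."""
    real = n_real * oversample
    return real / (real + n_synth)


lookaheads = [lookahead_ms(ctx) for ctx in ATT_CONTEXT_SIZES]

# Assumptions: 7,610 telephony utterances, x15 oversampling, 322,000 synthetic.
share = epoch_share(7_610, 15, 322_000)
```

Under these assumptions the telephony share comes out at about 26%, which agrees with the epoch share quoted in the list.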
Files
| Path | What it is |
|---|---|
| `model.nemo` / `epoch21.nemo` | NeMo bundle. Load via `EncDecRNNTBPEModel.restore_from(...)`. |
| `checkpoints/*.ckpt` | Lightning checkpoint (~1.4 GB) for resume / surgery. |
| `conf/khmer_finetune_allsynth_lsall.yaml` | Training config. |
| `tokenizer/*` | SentencePiece tokenizer (also embedded in `.nemo`). |
License
Apache-2.0 (model weights). Training data is internal.