Khmer Telephony ASR: v9 allsynth-lsall (epoch 21), EXPERIMENTAL

⚠️ Experimental checkpoint, not a release. This run intentionally trained on the full manually-annotated telephony set including the evaluation split (oversampled ×15). The in-domain numbers below therefore overlap the training data and overstate accuracy. The held-out FLEURS read-speech CER is ~49%, substantially worse than the production telephony release, so do not use this model as a general Khmer ASR system. It is published here only for experiment tracking and inspection.

Streaming FastConformer–RNNT (BPE) acoustic model for Khmer call-center / telephony speech (8 kHz transcoded for G.711, 16 kHz input). Fine-tuned on a large synthetic Khmer corpus combined with the full set of manually-annotated telephony utterances, heavily oversampled so the telephony domain dominates.

In-domain numbers (training-overlapping, NOT held out)

Evaluated on the full ~7,610-utterance telephony set (eval split leaked into training, so these are an upper bound, not a generalization estimate).

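| Metric | Value |
|---|---|
| CER (no-space) | 1.24% |
| CER (raw, spaces counted) | 1.39% |
| WER | 10.34% |
| Per-utterance CER < 10% | 7,311 (96.1%) |
| Per-utterance CER 10–30% | 182 (2.4%) |
| Per-utterance CER 30–50% | 46 (0.6%) |
| Per-utterance CER 50–80% | 33 (0.4%) |
| Per-utterance CER ≥ 80% | 38 (0.5%) |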
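The difference between the two CER rows and the per-utterance buckets can be illustrated with a minimal sketch. It assumes CER is the Levenshtein distance over Unicode code points divided by the reference length, and that the "no-space" variant simply removes all whitespace from both strings first. The evaluation script behind the table is not published here, so the function names (`edit_distance`, `cer`, `cer_bucket`) and the exact normalization are illustrative assumptions.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str, strip_spaces: bool = True) -> float:
    """Character error rate; strip_spaces=True mimics the 'no-space' variant (assumed)."""
    if strip_spaces:
        ref = "".join(ref.split())
        hyp = "".join(hyp.split())
    if not ref:
        # Empty reference: count any hypothesis text as a full error.
        return float(len(hyp) > 0)
    return edit_distance(ref, hyp) / len(ref)


def cer_bucket(value: float) -> str:
    """Map a per-utterance CER (as a fraction) to the bucket labels used in the table."""
    if value < 0.10:
        return "< 10%"
    if value < 0.30:
        return "10-30%"
    if value < 0.50:
        return "30-50%"
    if value < 0.80:
        return "50-80%"
    return ">= 80%"
```

Under these assumptions, a hypothesis that differs from the reference only in where the spaces fall scores 0 on the no-space metric but non-zero on the raw one. That is one plausible reason the raw CER sits slightly above the no-space CER.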

Held-out generalization (FLEURS km, read speech)

| Metric | Value |
|---|---|
| CER (no-space) | 48.82% |
| WER | ~100% |

This confirms the model has overfit to the in-domain (partly memorized) data and generalizes poorly. The production release (manhp/asr-khmer-telephony) achieves ~20% FLEURS CER.

Model details

  • Architecture: FastConformer encoder (17 layers, d_model 512, 8 heads, causal downsampling, chunked-limited attention [[70,13],[70,6],[70,1],[70,0]]) + RNN-T decoder + BPE/Unigram tokenizer (vocab 2,048).
  • Fine-tuning data: large synthetic Khmer corpus (322k) + full LS telephony set including eval, oversampled ×15 (26% epoch share).
  • Epoch: 21 (val_wer 0.0458).
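The 26% epoch share quoted above can be reproduced with a short calculation. This sketch assumes that "322k" counts synthetic utterances (taken as exactly 322,000) and that the oversampled telephony set is the same ~7,610 utterances reported in the in-domain evaluation. Both are assumptions read off this card, not values taken from the training config.

```python
# Assumed counts; the card only gives "322k" and "~7,610".
synthetic_utts = 322_000
telephony_utts = 7_610
oversample_factor = 15

# Each telephony utterance is seen 15 times per epoch.
telephony_per_epoch = telephony_utts * oversample_factor

# Fraction of each epoch drawn from the telephony set.
epoch_share = telephony_per_epoch / (synthetic_utts + telephony_per_epoch)

print(f"telephony samples per epoch: {telephony_per_epoch}")
print(f"telephony epoch share: {epoch_share:.1%}")
```

With these inputs the share comes out at roughly 26%, consistent with the figure in the list above.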

Files

| Path | What it is |
|---|---|
| `model.nemo` / `epoch21.nemo` | NeMo bundle. Load via `EncDecRNNTBPEModel.restore_from(...)`. |
| `checkpoints/*.ckpt` | Lightning checkpoint (~1.4 GB) for resume / surgery. |
| `conf/khmer_finetune_allsynth_lsall.yaml` | Training config. |
| `tokenizer/*` | SentencePiece tokenizer (also embedded in .nemo). |
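A minimal loading sketch, assuming NeMo's ASR collection is installed and `model.nemo` has been downloaded to the working directory. The helper name `load_asr_model` and the audio file `call_sample_16k.wav` are hypothetical. The return type of `transcribe` differs between NeMo versions, so the result is printed as-is rather than unpacked.

```python
from pathlib import Path


def load_asr_model(nemo_path: str):
    """Restore the RNNT-BPE model from a .nemo bundle (hypothetical helper)."""
    if not Path(nemo_path).is_file():
        raise FileNotFoundError(nemo_path)
    # Imported lazily so the path check above works without NeMo installed.
    from nemo.collections.asr.models import EncDecRNNTBPEModel

    model = EncDecRNNTBPEModel.restore_from(restore_path=nemo_path)
    model.eval()
    return model


if __name__ == "__main__":
    model = load_asr_model("model.nemo")
    # Hypothetical 16 kHz mono WAV file; the output structure depends on the NeMo version.
    hypotheses = model.transcribe(["call_sample_16k.wav"])
    print(hypotheses)
```

This runs offline (full-context) transcription. Streaming inference with the chunked attention contexts listed under Model details needs NeMo's cache-aware streaming tooling, which is not shown here.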

License

Apache-2.0 (model weights). Training data is internal.
