Khmer Telephony ASR — v9 allsynth-lsall (epoch 21) — EXPERIMENTAL

⚠️ Experimental checkpoint, not a release. This run intentionally trained on the full manually-annotated telephony set, including its evaluation split (oversampled ×15). The in-domain numbers below therefore overlap the training data and overstate accuracy. On held-out FLEURS read speech the CER is ~49%, substantially worse than the production telephony release, so do not use this model as a general Khmer ASR system. It is published only for experiment tracking and inspection.

Streaming FastConformer–RNNT (BPE) acoustic model for Khmer call-center / telephony speech (audio transcoded through 8 kHz G.711; the model takes 16 kHz input). Fine-tuned on a large synthetic Khmer corpus combined with the full set of manually-annotated telephony utterances, heavily oversampled so that the telephony domain dominates.

In-domain numbers (training-overlapping — NOT held out)

Evaluated on the full 7,610-utterance telephony set. The eval split leaked into training, so these numbers are an upper bound, not a generalization estimate.

Metric	Value
CER (no-space)	1.24%
CER (raw, spaces counted)	1.39%
WER	10.34%

Per-utterance CER	Utterances (share)
< 10%	7,311 (96.1%)
10–30%	182 (2.4%)
30–50%	46 (0.6%)
50–80%	33 (0.4%)
≥ 80%	38 (0.5%)
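The two CER rows differ only in whether whitespace is removed before scoring. This matters for Khmer, which is written largely without spaces between words, so spacing in references and hypotheses is inconsistent. The sketch below shows both variants and the per-utterance bucketing. It assumes that corpus CER pools edits over all reference characters (as jiwer does) rather than averaging per-utterance CER, and that each bucket includes its lower edge. Neither assumption is confirmed by this card, and the helper names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (unit-cost sub/ins/del)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete ref char
                curr[j - 1] + 1,         # insert hyp char
                prev[j - 1] + (r != h),  # substitute (0 cost if equal)
            ))
        prev = curr
    return prev[-1]


def _norm(s: str, strip_spaces: bool) -> str:
    # "no-space" CER drops all whitespace; "raw" CER keeps it as characters.
    return "".join(s.split()) if strip_spaces else s


def cer(ref: str, hyp: str, strip_spaces: bool) -> float:
    ref, hyp = _norm(ref, strip_spaces), _norm(hyp, strip_spaces)
    return edit_distance(ref, hyp) / max(len(ref), 1)


def corpus_cer(pairs, strip_spaces: bool) -> float:
    """Pooled CER: total edits / total reference characters (assumed)."""
    edits = chars = 0
    for ref, hyp in pairs:
        ref, hyp = _norm(ref, strip_spaces), _norm(hyp, strip_spaces)
        edits += edit_distance(ref, hyp)
        chars += len(ref)
    return edits / max(chars, 1)


# Upper bucket edges from the table; lower edge inclusive (assumption).
BUCKETS = [(0.10, "< 10%"), (0.30, "10–30%"), (0.50, "30–50%"), (0.80, "50–80%")]


def bucket(c: float) -> str:
    for upper, label in BUCKETS:
        if c < upper:
            return label
    return "≥ 80%"


# A hypothesis that only drops a space is perfect under no-space CER.
print(cer("ab cd", "abcd", strip_spaces=False))  # 0.2
print(cer("ab cd", "abcd", strip_spaces=True))   # 0.0
```

Pooling weights long utterances more heavily than averaging per-utterance CER does. That is one reason a 1.24% corpus CER can coexist with a small tail of utterances at ≥ 80%.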

Held-out generalization (FLEURS km, read speech)

Metric	Value
CER (no-space)	48.82%
WER	~100%

This confirms the model has overfit to the in-domain (partly memorized) data and generalizes poorly. The production release (manhp/asr-khmer-telephony) achieves ~20% FLEURS CER.

Model details

- Architecture: FastConformer encoder (17 layers, d_model 512, 8 heads, causal downsampling, chunked-limited attention with att_context_size [[70,13],[70,6],[70,1],[70,0]]).
- Decoder: RNN-T, with a BPE/Unigram tokenizer (vocab 2,048).
- Fine-tuning data: large synthetic Khmer corpus (~322k) + full LS telephony set including eval, oversampled ×15 (~26% epoch share).
- Epoch: 21 (val_wer 0.0458).
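The ~26% epoch share follows from the ×15 oversampling, under two assumptions the card does not state: that "~322k" counts synthetic utterances, and that the oversampled LS set is the same 7,610 utterances evaluated above. The sketch below reproduces the figure. Under these assumptions, every annotated utterance (eval included) is seen about 15 times per epoch, which is consistent with the memorization noted above.

```python
# Assumed counts (not confirmed by the card): utterances, not hours.
SYNTH_UTTS = 322_000  # "~322k" synthetic corpus
LS_UTTS = 7_610       # full manually-annotated telephony set
OVERSAMPLE = 15       # x15 oversampling factor


def epoch_share(synth: int, real: int, factor: int) -> float:
    """Fraction of one training epoch drawn from the oversampled real set."""
    real_eff = real * factor
    return real_eff / (synth + real_eff)


share = epoch_share(SYNTH_UTTS, LS_UTTS, OVERSAMPLE)
print(f"{LS_UTTS * OVERSAMPLE:,} LS samples per epoch, share = {share:.1%}")
# prints: 114,150 LS samples per epoch, share = 26.2%
```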

Files

Path	What it is
`model.nemo` / `epoch21.nemo`	NeMo bundle. Load via `EncDecRNNTBPEModel.restore_from(...)`.
`checkpoints/*.ckpt`	Lightning checkpoint (~1.4 GB) for resume / surgery.
`conf/khmer_finetune_allsynth_lsall.yaml`	Training config.
`tokenizer/*`	SentencePiece tokenizer (also embedded in `.nemo`).
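The sketch below loads the `.nemo` bundle and picks one of the trained attention contexts by lookahead budget. It assumes NeMo is installed and that each encoder frame covers 80 ms (8× subsampling of a 10 ms feature hop, the usual FastConformer setting; this card does not state it). It also relies on `set_default_att_context_size` being available on NeMo's Conformer encoder, as it is in recent releases. `pick_att_context` and `load_model` are hypothetical helper names.

```python
# Trained (left, right) attention context sizes from the model details.
ATT_CONTEXTS = [[70, 13], [70, 6], [70, 1], [70, 0]]
FRAME_MS = 80  # assumed ms per encoder frame (8x subsampling of 10 ms hop)


def pick_att_context(max_lookahead_ms: int) -> list[int]:
    """Largest trained right context whose lookahead fits the budget."""
    fitting = [c for c in ATT_CONTEXTS if c[1] * FRAME_MS <= max_lookahead_ms]
    return max(fitting, key=lambda c: c[1])


def load_model(path: str = "model.nemo", max_lookahead_ms: int = 480):
    # Imported lazily so pick_att_context works without NeMo installed.
    from nemo.collections.asr.models import EncDecRNNTBPEModel

    model = EncDecRNNTBPEModel.restore_from(restore_path=path)
    # Restrict the encoder to one of the context sizes it was trained with.
    model.encoder.set_default_att_context_size(pick_att_context(max_lookahead_ms))
    model.eval()
    return model


# Usage (requires NeMo and the checkpoint; input audio should be 16 kHz):
#   model = load_model("model.nemo", max_lookahead_ms=480)  # -> [70, 6]
#   print(model.transcribe(["call_16k.wav"]))
```

Choosing the largest right context that fits the budget trades latency for accuracy. Under the 80 ms assumption, the four trained settings correspond to roughly 1040, 480, 80 and 0 ms of lookahead.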

License

Apache-2.0 (model weights). Training data is internal.
