🤖 distilbert-hindi-eou-detector

A fine-tuned DistilBERT model for end-of-utterance (EOU) detection in conversational Hindi. It classifies whether a Hindi dialogue phrase completes the speaker's turn, which makes it suitable for voice assistants, dialogue systems, and turn-taking logic in chatbots.

🧠 Model Description

  • Base model: distilbert-base-multilingual-cased
  • Language: Hindi
  • Task: Binary Classification — End of Utterance Detection
  • Labels:
    • 1: End of Utterance (EOU)
    • 0: Not End of Utterance (NOT_EOU)

🗂️ Training Dataset

This model was fine-tuned on the hindi-conversational-eou dataset, a balanced collection of 1,000 Hindi conversational phrases labeled for end-of-turn detection.

Each example in the dataset is a short Hindi phrase labeled with:

  • "text": The utterance string
  • "label": 0 or 1 (as defined above)
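
A minimal sketch of records in this format, with a quick check of label balance. The two phrases are the ones from the usage example in this card; the labels assigned to them here are an assumption for illustration, not actual dataset entries. The helper names `label_counts` and `is_balanced` are hypothetical.

```python
from collections import Counter

# Hypothetical records in the {"text", "label"} format described above.
# The labels are illustrative assumptions, not real dataset entries.
records = [
    {"text": "क्या तुम मेरे साथ चलोगे?", "label": 1},  # complete question
    {"text": "अगर हम वहाँ जाते तो", "label": 0},  # trailing conditional clause
]


def label_counts(rows):
    """Count how many examples carry each label."""
    return Counter(row["label"] for row in rows)


def is_balanced(rows, tolerance=0.05):
    """Return True if labels 0 and 1 are each within `tolerance` of a 50% share."""
    counts = label_counts(rows)
    total = sum(counts.values())
    if total == 0:
        return False
    return all(abs(counts.get(lbl, 0) / total - 0.5) <= tolerance for lbl in (0, 1))


print(label_counts(records))
print(is_balanced(records))
```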

📊 Evaluation Metrics

(Note: the figures below are illustrative placeholders, not measured results. Replace them with actual evaluation numbers when available.)

  • Accuracy: 92.4%
  • F1 Score: 91.7%
  • Precision: 90.5%
  • Recall: 93.0%
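
For reference, here is a small sketch of how these four metrics are conventionally computed from gold labels and predictions. It assumes label 1 (EOU) is the positive class; the card does not say which class or averaging scheme was used, and the toy labels below are made up purely to exercise the function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented only to demonstrate the function.
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1]
print(binary_metrics(toy_true, toy_pred))
```

As a consistency check, the harmonic mean of the listed precision (90.5%) and recall (93.0%) is about 91.7%, which agrees with the listed F1 score.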

🛠️ Intended Use

This model is intended for:

  • Voice assistants (to distinguish mid-utterance pauses from completed turns)
  • Dialogue systems and conversational AI
  • Research in Hindi language conversation modeling
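
As a rough illustration of the voice-assistant case, the sketch below combines an EOU probability with the length of the current silence to decide when to respond. The function name `should_take_turn` and all thresholds are hypothetical choices for this example; the card does not prescribe any particular turn-taking policy.

```python
def should_take_turn(
    eou_probability,
    silence_ms,
    eou_threshold=0.5,    # assumed cut-off on the classifier's EOU probability
    short_pause_ms=300,   # assumed minimum silence before trusting the classifier
    max_wait_ms=1500,     # assumed silence after which we respond regardless
):
    """Decide whether the assistant should start speaking."""
    # A long silence ends the turn even if the phrase looks unfinished.
    if silence_ms >= max_wait_ms:
        return True
    # Otherwise require both a confident EOU prediction and a short pause.
    return eou_probability >= eou_threshold and silence_ms >= short_pause_ms


print(should_take_turn(0.9, 400))   # confident EOU after a short pause
print(should_take_turn(0.1, 400))   # phrase looks unfinished, keep listening
```

Pairing the classifier with a silence timeout is one possible design: the text signal lets the system respond quickly after complete phrases, while the timeout keeps it from waiting indefinitely when the classifier is wrong.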

🧪 Example Usage

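from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
classifier = pipeline("text-classification", model="yashsoni78/distilbert-hindi-eou-detector")

# Example phrases: a complete question and an unfinished conditional clause
examples = [
    "क्या तुम मेरे साथ चलोगे?",
    "अगर हम वहाँ जाते तो",
]

for text in examples:
    # For a single string the pipeline returns a one-element list of the form
    # [{"label": ..., "score": ...}]; label names come from the model config.
    result = classifier(text)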
    print(f"{text} => {result}")
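
The pipeline output above is a list of `{"label", "score"}` dicts, so a small helper can turn one prediction into an EOU probability. This is a sketch under assumptions: the label strings this checkpoint actually emits are not stated in the card (they could be the default `LABEL_0`/`LABEL_1` or custom names), and the helper assumes the two class scores sum to 1, as they do under the pipeline's default softmax for a two-label model. The sample dicts are invented inputs, not real model output.

```python
def eou_probability(prediction, eou_labels=("LABEL_1", "EOU", "1")):
    """Convert one {'label', 'score'} dict into an estimated P(end of utterance).

    `eou_labels` lists label strings assumed to mean "end of utterance";
    adjust it to match the labels the model actually returns.
    """
    score = float(prediction["score"])
    # If the top label is the non-EOU class, the EOU probability is the remainder.
    return score if prediction["label"] in eou_labels else 1.0 - score


# Invented predictions, for illustration only.
sample_eou = {"label": "LABEL_1", "score": 0.9}
sample_not_eou = {"label": "LABEL_0", "score": 0.8}

print(eou_probability(sample_eou))
print(eou_probability(sample_not_eou))
```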

🔍 Limitations

  • Trained on a small dataset (1,000 examples); it may not generalize to complex or domain-specific Hindi.
  • Performs binary EOU detection only; it does not model deeper semantics.
  • Assumes the input is colloquial conversational Hindi.

🧾 Citation
If you use this model in your research or application, please cite:

@misc{distilbert_hindi_eou_2025,
  title = {distilbert-hindi-eou-detector},
  author = {Yash Soni},
  year = {2025},
  howpublished = {\url{https://huggingface.co/yashsoni78/distilbert-hindi-eou-detector}},
  note = {Fine-tuned model for Hindi end-of-utterance detection}
}

📄 License
This model is released under the MIT License. You are free to use, modify, and distribute it, provided the copyright and license notices are retained.

🙏 Acknowledgements

  • Base model: distilbert-base-multilingual-cased
  • Dataset: hindi-end-of-utterance-detection
  • Created with the help of 🤗 Transformers