LusakaLang MultiTask Model

This model is a unified multi-task transformer built on top of bert-base-multilingual-cased (mBERT), designed to perform three tasks simultaneously:

  1. Language Identification
  2. Sentiment Analysis
  3. Topic Classification

The system integrates three fine‑tuned LusakaLang checkpoints:

  • mbert_Lusaka_Language_Analysis
  • mbert_LusakaLang_Sentiment_Analysis
  • mbert_LusakaLang_Topic

All tasks share a single mBERT encoder with three independent classifier heads. Sharing one encoder improves computational efficiency, reduces memory overhead, and keeps predictions consistent across tasks; the sketch below illustrates the layout.
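
The layout can be pictured with a short PyTorch sketch. This is an illustrative reconstruction, not the published training code: the class name, head names, and label counts (taken from the output categories shown further down) are assumptions.

import torch.nn as nn
from transformers import AutoModel

class MultiTaskMBert(nn.Module):
    # Hypothetical layout: one shared mBERT encoder, three linear heads.
    def __init__(self, n_langs=4, n_sentiments=3, n_topics=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        hidden = self.encoder.config.hidden_size  # 768 for mBERT base
        self.heads = nn.ModuleDict({
            "language":  nn.Linear(hidden, n_langs),      # Bemba / Nyanja / English / Mixed
            "sentiment": nn.Linear(hidden, n_sentiments), # Negative / Neutral / Positive
            "topic":     nn.Linear(hidden, n_topics),     # Driver / Payment / Support / App / Availability
        })

    def forward(self, **batch):
        # A single encoder pass serves all three tasks: the [CLS] vector
        # is routed through each task head.
        cls = self.encoder(**batch).last_hidden_state[:, 0]
        return {name: head(cls) for name, head in self.heads.items()}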


Why This Model Matters

Zambian communication is inherently multilingual, fluid, and deeply shaped by context. A single message may blend English, Bemba, Nyanja, local slang, and frequent code‑switching, often expressed through culturally grounded idioms and subtle emotional cues. This model is designed specifically for that environment, where meaning depends not only on the words used but on how languages interact within a single utterance.

It excels at identifying the dominant language or detecting when multiple languages are being used together, interpreting sentiment even when it is conveyed indirectly or through culturally specific phrasing, and classifying text into practical topics such as driver behaviour, payment issues, app performance, customer support, and ride availability. By capturing these nuances, the model provides a more accurate and context‑aware understanding of real Zambian communication.


How to Use This Model
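
The class and file names below come from the original snippet; the method bodies are a minimal sketch under two assumptions: that the checkpoint model.pt is a fully pickled PyTorch module (hence weights_only=False), and that its forward returns a dict of logits keyed by task name, matching the hypothetical sketch above. Since torch.load cannot read directly from a Hub repo ID, the file is downloaded first with hf_hub_download.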

from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import torch
import torch.nn.functional as F

class LusakaLangMultiTask:
    def __init__(self, repo="Kelvinmbewe/LusakaLang-MultiTask"):
        self.tokenizer = AutoTokenizer.from_pretrained(repo)
        # torch.load expects a local file, so fetch the pickled module from the Hub first.
        weights = hf_hub_download(repo_id=repo, filename="model.pt")
        self.model = torch.load(weights, map_location="cpu", weights_only=False).eval()

    def _predict(self, texts, task):
        # Shared path for all three heads; assumes forward() returns a dict of
        # logits keyed by task name, as in the sketch above.
        batch = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            probs = F.softmax(self.model(**batch)[task], dim=-1)
        conf, idx = probs.max(dim=-1)
        return list(zip(idx.tolist(), conf.tolist()))

    def predict_language(self, texts):  return self._predict(texts, "language")
    def predict_sentiment(self, texts): return self._predict(texts, "sentiment")
    def predict_topic(self, texts):     return self._predict(texts, "topic")

llm = LusakaLangMultiTask()

print(llm.predict_language([...]))   # replace [...] with a list of strings
print(llm.predict_sentiment([...]))
print(llm.predict_topic([...]))
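
As written, each predict helper returns (class index, confidence) pairs; mapping indices to the human-readable labels shown below assumes label lists are distributed with the checkpoint.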

Sample Output

# Language Identification 🌍
[
  {"lang": "Bemba",  "conf": 0.96},
  {"lang": "Nyanja", "conf": 0.95},
  {"lang": "English","conf": 0.99}
]
# Sentiment ❤️
[
  {"sent": "Negative", "conf": 0.98},
  {"sent": "Positive", "conf": 0.95},
  {"sent": "Neutral",  "conf": 0.87}
]
# Topic 🗂️
[
  {"topic": "Payment Issue",     "conf": 0.97},
  {"topic": "Customer Support",  "conf": 0.95},
  {"topic": "Driver Behaviour",  "conf": 0.96}
]
=========================== Model Architecture ===========================

📥 Input                →  🧠 Core Engine              →            📈 Output
------------------------------------------------------------------------------------
Text (Any Language)     →   Tokenizer 🔤                       →     Language 🌍
                        →   Shared mBERT Encoder 🧠            →     Bemba / Nyanja /
                        →   CLS Vector 🎯                      →     English / Mixed
------------------------------------------------------------------------------------
User Feedback 💬        →   Tokenizer 🔤                       →     Sentiment ❤️
                        →   Shared Encoder 🧠                  →     Negative / Neutral /
                        →   CLS Vector 🎯                      →     Positive
------------------------------------------------------------------------------------
Ride Context 🚗         →   Tokenizer 🔤                       →     Topic 🗂️
                        →   Shared Encoder 🧠                  →     Driver / Payment /
                        →   CLS Vector 🎯                      →     Support / App / Availability
------------------------------------------------------------------------------------
Model Details

  • Model size: 0.2B parameters
  • Tensor type: F32
  • Weights format: Safetensors