---
library_name: transformers
license: apache-2.0
language:
- en
tags:
- text-classification
- content-moderation
- toxicity
- spam-detection
- onnx
- bert
- minilm
datasets:
- thesofakillers/jigsaw-toxic-comment-classification-challenge
- google/civil_comments
- AbdulHadi806/mail_spam_ham_dataset
- SetFit/enron_spam
pipeline_tag: text-classification
model-index:
- name: minilm-content-guard-3class
  results:
  - task:
      type: text-classification
      name: Content Moderation
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.96
    - name: F1 (macro)
      type: f1
      value: 0.96
    - name: F1 (weighted)
      type: f1
      value: 0.96
base_model:
- nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large
---

# MiniLM Content Guard - 3 Class

A lightweight content moderation model that classifies text as **safe**, **toxic**, or **spam**. Built on [MiniLMv2-L6-H384](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large) and fine-tuned with focal loss for robust handling of hard examples.

This model is in ONNX format and optimized for CPU inference.

#### Suggested confidence threshold: 0.9

Predictions with a confidence below 0.9 carry a higher risk of being wrong and should not be treated as reliable.

## Labels

| Label ID | Label   | Description                                            |
| -------- | ------- | ------------------------------------------------------ |
| 0        | `safe`  | Normal, non-harmful content                            |
| 1        | `toxic` | Hate speech, threats, personal attacks, severe insults |
| 2        | `spam`  | Unsolicited promotions, scams, phishing attempts       |

## Usage

### ONNX Runtime

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import torch

model_name = "navodPeiris/minilm-toxic-spam-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
model = ORTModelForSequenceClassification.from_pretrained(model_name)

text = "look like garbage!"

# Truncate to the model's maximum sequence length (512 tokens)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)

# Convert logits → probabilities
probs = torch.softmax(outputs.logits, dim=-1)

# Get predicted class and its confidence
pred_id = torch.argmax(probs, dim=-1).item()
label = config.id2label[pred_id]
confidence = probs[0][pred_id].item()

print(label, f"{confidence:.4f}")
```

### Transformers.js

```javascript
import { pipeline } from "@huggingface/transformers";

const pipe = await pipeline(
  "text-classification",
  "navodPeiris/minilm-toxic-spam-classifier",
);

const res = await pipe("this code is uglier than u ugghh");
console.log("res:", res);
```

## Performance

Evaluated on a held-out test set of 7,115 samples:

```
              precision    recall  f1-score   support

        safe       0.98      0.96      0.97      4332
       toxic       0.90      0.96      0.93      1626
        spam       0.99      0.97      0.98      1157

    accuracy                           0.96      7115
   macro avg       0.95      0.96      0.96      7115
weighted avg       0.96      0.96      0.96      7115
```

## Training Details

### Architecture

- **Base model**: [MiniLMv2-L6-H384-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large) (6 layers, 384 hidden dim, ~22M params)
- **Task head**: Linear classification head (3 classes)
- **Max sequence length**: 512 tokens

### Training Data

The model was trained on a combined dataset from multiple sources:

| Source | Type | Usage |
| ------ | ---- | ----- |
| [Jigsaw Toxic Comments](https://huggingface.co/datasets/thesofakillers/jigsaw-toxic-comment-classification-challenge) | Toxicity | safe / toxic labels |
| [Civil Comments](https://huggingface.co/datasets/google/civil_comments) | Toxicity | safe / toxic labels |
| [Mail Spam/Ham](https://huggingface.co/datasets/AbdulHadi806/mail_spam_ham_dataset) | Spam | spam labels |
| [Enron Spam](https://huggingface.co/datasets/SetFit/enron_spam) | Spam | spam labels |

### Hyperparameters

- **Epochs**: 5
- **Batch size**: 16 (train) / 32 (eval)
- **Learning rate**: 3e-5
- **Weight decay**: 0.01
- **Loss**: Focal loss (gamma=2) for better handling of hard/borderline examples
- **Early stopping**: Enabled on F1 macro

### ONNX Export

An ONNX version of the model is included for fast CPU inference:

- **Opset version**: 18
- **Dynamic axes**: batch size and sequence length
- **Constant folding**: Enabled

## Limitations

- English-only: not tested on other languages
- May struggle with subtle or implicit toxicity where the language closely resembles negative sentiment (e.g., strong product complaints vs. personal attacks)
- Not designed for nuanced content policy enforcement; best used as a first-pass filter

## License

Apache 2.0
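
## Example: Applying the Confidence Threshold

The suggested 0.9 confidence threshold can be applied as a small post-processing step on the model's logits. The sketch below is one possible way to do it, not part of the released model: it uses only the Python standard library, the label mapping is copied from the Labels table, and the `needs_review` fallback label and the sample logits are hypothetical values chosen for illustration.

```python
import math

# Label mapping copied from the Labels table in this card.
ID2LABEL = {0: "safe", 1: "toxic", 2: "spam"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(logits, threshold=0.9, fallback="needs_review"):
    """Return (label, confidence); use `fallback` when confidence < threshold.

    `fallback` is a hypothetical label for this sketch, not a model output.
    """
    probs = softmax(logits)
    pred_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[pred_id]
    label = ID2LABEL[pred_id] if confidence >= threshold else fallback
    return label, confidence


# Made-up logits for illustration; real values come from `outputs.logits`.
confident = classify_with_threshold([0.1, 5.0, 0.2])
borderline = classify_with_threshold([1.0, 1.5, 0.2])
print(confident, borderline)
```

Routing low-confidence predictions to a separate bucket, rather than forcing one of the three labels, fits the card's recommendation to use the model as a first-pass filter.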
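
## Example: Focal Loss Sketch

The hyperparameters list focal loss with gamma=2 but the card does not show the training code. As a rough illustration of why this loss emphasizes hard examples, the sketch below assumes the standard per-example formulation, `-(1 - p_t)**gamma * log(p_t)`, with no class weighting term. That is an assumption, since the card does not say whether one was used. The probabilities are made-up values, and the function name `focal_loss` is hypothetical.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)**gamma * log(p_t).

    probs  : class probabilities (e.g. softmax output) summing to 1
    target : index of the true class
    gamma  : focusing parameter; gamma=0 reduces to plain cross-entropy
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Made-up probabilities for illustration (true class is index 1, "toxic").
easy = [0.02, 0.95, 0.03]  # confidently correct
hard = [0.55, 0.40, 0.05]  # true class gets only 0.40

for name, probs in (("easy", easy), ("hard", hard)):
    ce = focal_loss(probs, target=1, gamma=0.0)  # cross-entropy baseline
    fl = focal_loss(probs, target=1, gamma=2.0)  # gamma value from this card
    print(f"{name}: cross-entropy={ce:.4f}  focal={fl:.4f}")
```

With gamma=2 the factor `(1 - p_t)**2` shrinks the loss of the confidently correct example far more than that of the borderline one, so gradient updates are dominated by the examples the model still gets wrong.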