MiniLM Content Guard - 3 Class

A lightweight content moderation model that classifies text into safe, toxic, or spam. Built on MiniLMv2-L6-H384 and fine-tuned with focal loss for robust handling of hard examples. This model is in ONNX format and optimized for CPU inference.

Suggested confidence threshold: 0.9. Predictions with confidence below 0.9 carry a higher risk of being wrong and should be treated as uncertain.

Labels

Label ID   Label   Description
0          safe    Normal, non-harmful content
1          toxic   Hate speech, threats, personal attacks, severe insults
2          spam    Unsolicited promotions, scams, phishing attempts
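The table above can be mirrored as a plain mapping for quick lookups; at inference time the same mapping is available via the model config's id2label attribute (a minimal sketch, not part of the library itself):

```python
# Label mapping as listed in the table above; AutoConfig exposes the
# equivalent dict as config.id2label when loading the model card's config.
ID2LABEL = {0: "safe", 1: "toxic", 2: "spam"}
LABEL2ID = {v: k for k, v in ID2LABEL.items()}

print(ID2LABEL[1])       # toxic
print(LABEL2ID["spam"])  # 2
```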

Usage

ONNX Runtime

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import torch

model_name = "navodPeiris/minilm-toxic-spam-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
model = ORTModelForSequenceClassification.from_pretrained(model_name)

text = "look like garbage!"

inputs = tokenizer(text, return_tensors="pt")

outputs = model(**inputs)

# Convert logits → probabilities
probs = torch.softmax(outputs.logits, dim=-1)

# Get predicted class
pred_id = torch.argmax(probs, dim=-1).item()

label = config.id2label[pred_id]
confidence = probs[0][pred_id].item()

print(label, f"{confidence:.4f}")
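Following the suggested 0.9 confidence threshold above, the prediction can be gated before it is acted on. This helper is illustrative (the function name and "uncertain" fallback are not part of the model's API):

```python
def gate_prediction(label, confidence, threshold=0.9):
    """Return the predicted label only when the model is confident enough;
    below the threshold, flag the result for review instead."""
    if confidence >= threshold:
        return label
    return "uncertain"  # route to a fallback or human review

print(gate_prediction("toxic", 0.97))  # confident -> "toxic"
print(gate_prediction("toxic", 0.62))  # low confidence -> "uncertain"
```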

Transformers.js

import { pipeline } from "@huggingface/transformers";

const pipe = await pipeline(
  "text-classification",
  "navodPeiris/minilm-toxic-spam-classifier",
);

const res = await pipe("this code is uglier than u ugghh");
console.log("res:", res);

Performance

Evaluated on a held-out test set of 7,115 samples:

            precision  recall   f1-score   support

safe          0.98      0.96      0.97      4332
toxic         0.90      0.96      0.93      1626
spam          0.99      0.97      0.98      1157

accuracy                          0.96      7115
macro avg     0.95      0.96      0.96      7115
weighted avg  0.96      0.96      0.96      7115
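The macro average in the report above is the unweighted mean of the per-class scores, which can be checked directly from the table:

```python
# Per-class F1 scores from the classification report above
f1 = {"safe": 0.97, "toxic": 0.93, "spam": 0.98}

# Macro average: unweighted mean across classes
macro_f1 = sum(f1.values()) / len(f1)
print(round(macro_f1, 2))  # 0.96
```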

Training Details

Architecture

Built on MiniLMv2-L6-H384 (6 transformer layers, hidden size 384) with a 3-class sequence classification head, as described above.
Training Data

The model was trained on a combined dataset from multiple sources:

Source                 Type      Usage
Jigsaw Toxic Comments  Toxicity  safe / toxic labels
Civil Comments         Toxicity  safe / toxic labels
Mail Spam/Ham          Spam      spam labels
Enron Spam             Spam      spam labels

Hyperparameters

  • Epochs: 5
  • Batch size: 16 (train) / 32 (eval)
  • Learning rate: 3e-5
  • Weight decay: 0.01
  • Loss: Focal loss (gamma=2) for better handling of hard/borderline examples
  • Early stopping: Enabled on F1 macro
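Focal loss down-weights examples the model already classifies confidently, so training gradient is dominated by hard or borderline cases. A minimal single-example sketch of the formula FL(p_t) = -(1 - p_t)^gamma * log(p_t) with the gamma=2 used here (the real training loss operates on logits over batches):

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)**gamma * log(p_t),
    where p_t is the predicted probability of the true class.
    The (1 - p_t)**gamma factor shrinks the loss of easy examples."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Confident correct prediction -> near-zero loss
easy = focal_loss([0.95, 0.03, 0.02], target=0)
# Uncertain prediction -> much larger loss
hard = focal_loss([0.40, 0.35, 0.25], target=0)
print(f"easy: {easy:.6f}, hard: {hard:.6f}")
```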

ONNX Export

An ONNX version of the model is included for fast CPU inference:

  • Opset version: 18
  • Dynamic axes: batch size and sequence length
  • Constant folding: Enabled

Limitations

  • English-only — not tested on other languages
  • May struggle with subtle or implicit toxicity where the language closely resembles negative sentiment (e.g., strong product complaints vs. personal attacks)
  • Not designed for nuanced content policy enforcement — best used as a first-pass filter

License

Apache 2.0
