---
library_name: transformers
license: apache-2.0
language:
  - en
tags:
  - text-classification
  - content-moderation
  - toxicity
  - spam-detection
  - onnx
  - bert
  - minilm
datasets:
  - thesofakillers/jigsaw-toxic-comment-classification-challenge
  - google/civil_comments
  - AbdulHadi806/mail_spam_ham_dataset
  - SetFit/enron_spam
pipeline_tag: text-classification
model-index:
  - name: minilm-content-guard-3class
    results:
      - task:
          type: text-classification
          name: Content Moderation
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.96
          - name: F1 (macro)
            type: f1
            value: 0.96
          - name: F1 (weighted)
            type: f1
            value: 0.96
base_model:
  - nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large
---

# MiniLM Content Guard - 3 Class

A lightweight content moderation model that classifies text into **safe**, **toxic**, or **spam**. Built on [MiniLMv2-L6-H384](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large) and fine-tuned with focal loss for robust handling of hard examples.
This model is in ONNX format and optimized for CPU inference.

#### Suggested confidence threshold: 0.9

Predictions whose top-class probability is below 0.9 are more likely to be wrong. Treat them as uncertain rather than as a reliable label.
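Here is a minimal sketch that applies this threshold to one row of softmax probabilities, for example `probs[0].tolist()` from the ONNX Runtime example. The function name `predict_with_threshold` and the choice to return `None` for low-confidence predictions are illustrative assumptions. The label order follows the Labels table in this card.

```python
from typing import Optional, Sequence

# Label order taken from the Labels table in this card (IDs 0, 1, 2).
ID2LABEL = {0: "safe", 1: "toxic", 2: "spam"}


def predict_with_threshold(
    probs: Sequence[float], threshold: float = 0.9
) -> Optional[str]:
    """Return the top label if its probability meets `threshold`, else None.

    `probs` is one row of softmax probabilities in label-ID order.
    Hypothetical helper: None signals "not confident enough".
    """
    pred_id = max(range(len(probs)), key=lambda i: probs[i])
    if probs[pred_id] >= threshold:
        return ID2LABEL[pred_id]
    return None


print(predict_with_threshold([0.02, 0.95, 0.03]))  # toxic
print(predict_with_threshold([0.55, 0.40, 0.05]))  # None
```

Returning `None` instead of a forced label leaves the caller free to decide how to handle uncertain text, for example by logging it or routing it to manual review.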

## Labels

| Label ID | Label   | Description                                            |
| -------- | ------- | ------------------------------------------------------ |
| 0        | `safe`  | Normal, non-harmful content                            |
| 1        | `toxic` | Hate speech, threats, personal attacks, severe insults |
| 2        | `spam`  | Unsolicited promotions, scams, phishing attempts       |

## Usage

### ONNX Runtime

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, AutoConfig
import torch

model_name = "navodPeiris/minilm-toxic-spam-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
model = ORTModelForSequenceClassification.from_pretrained(model_name)

text = "look like garbage!"

# Truncate to the model's 512-token maximum sequence length
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

outputs = model(**inputs)

# Convert logits → probabilities
probs = torch.softmax(outputs.logits, dim=-1)

# Get predicted class
pred_id = torch.argmax(probs, dim=-1).item()

label = config.id2label[pred_id]
confidence = probs[0][pred_id].item()

print(f"{label} ({confidence:.4f})")
```

### Transformers.js

```javascript
import { pipeline } from "@huggingface/transformers";

const pipe = await pipeline(
  "text-classification",
  "navodPeiris/minilm-toxic-spam-classifier",
);

const res = await pipe("this code is uglier than u ugghh");
console.log("res:", res);
```

## Performance

Evaluated on a held-out test set of 7,115 samples (the total of the per-class support column below):

```
              precision    recall  f1-score   support

        safe       0.98      0.96      0.97      4332
       toxic       0.90      0.96      0.93      1626
        spam       0.99      0.97      0.98      1157

    accuracy                           0.96      7115
   macro avg       0.95      0.96      0.96      7115
weighted avg       0.96      0.96      0.96      7115
```
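You can reproduce the macro and weighted averages from the per-class rows. The macro average is the plain mean over the three classes, and the weighted average weights each class by its support. The following small check uses only the numbers printed above:

```python
# Per-class rows copied from the report above: (precision, recall, f1, support).
REPORT = {
    "safe": (0.98, 0.96, 0.97, 4332),
    "toxic": (0.90, 0.96, 0.93, 1626),
    "spam": (0.99, 0.97, 0.98, 1157),
}

total_support = sum(row[3] for row in REPORT.values())

# Macro average: unweighted mean over classes.
macro_f1 = sum(row[2] for row in REPORT.values()) / len(REPORT)

# Weighted average: each class weighted by its support.
weighted_f1 = sum(row[2] * row[3] for row in REPORT.values()) / total_support

print(total_support)          # 7115
print(round(macro_f1, 2))     # 0.96
print(round(weighted_f1, 2))  # 0.96
```

The per-class scores are already rounded to two decimals, so recomputed averages can differ from the report in the last digit. For instance, macro precision comes out at about 0.957 from the rounded values, while the report shows 0.95.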

## Training Details

### Architecture

- **Base model**: [MiniLMv2-L6-H384-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large) (6 layers, 384 hidden dim, ~22M params)
- **Task head**: Linear classification head (3 classes)
- **Max sequence length**: 512 tokens
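As a sketch, here is a back-of-the-envelope parameter count for this architecture. The card does not list the full config, so three values below are assumptions: the vocabulary size (30,522, the BERT uncased WordPiece vocabulary), the feed-forward size (1,536), and the presence of a BERT-style pooler. Check the model's `config.json` for the actual values.

```python
# Assumed BERT-style config values for MiniLMv2-L6-H384 (not stated in this card).
VOCAB = 30522        # assumption: BERT uncased WordPiece vocabulary
MAX_POS = 512        # matches the 512-token max sequence length
TYPE_VOCAB = 2       # assumption: segment (token type) embeddings
HIDDEN = 384
INTERMEDIATE = 1536  # assumption: 4 x hidden
LAYERS = 6
NUM_LABELS = 3


def dense(n_in: int, n_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)

per_layer = (
    4 * dense(HIDDEN, HIDDEN)       # Q, K, V and attention output projections
    + layer_norm(HIDDEN)
    + dense(HIDDEN, INTERMEDIATE)   # feed-forward up-projection
    + dense(INTERMEDIATE, HIDDEN)   # feed-forward down-projection
    + layer_norm(HIDDEN)
)

pooler = dense(HIDDEN, HIDDEN)      # assumption: [CLS] pooler as in BERT classifiers
head = dense(HIDDEN, NUM_LABELS)    # linear classification head (3 classes)

total = embeddings + LAYERS * per_layer + pooler + head
print(f"{total:,}")  # 22,714,371
```

Under these assumptions, the total is roughly 22.7M, which is in line with the ~22M figure above. About half of it (≈11.9M) sits in the embedding tables, and the 3-class head adds only 1,155 parameters.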

### Training Data

The model was trained on a combined dataset from multiple sources:

| Source                                                                                                                | Type     | Usage               |
| --------------------------------------------------------------------------------------------------------------------- | -------- | ------------------- |
| [Jigsaw Toxic Comments](https://huggingface.co/datasets/thesofakillers/jigsaw-toxic-comment-classification-challenge) | Toxicity | safe / toxic labels |
| [Civil Comments](https://huggingface.co/datasets/google/civil_comments)                                               | Toxicity | safe / toxic labels |
| [Mail Spam/Ham](https://huggingface.co/datasets/AbdulHadi806/mail_spam_ham_dataset)                                   | Spam     | spam labels         |
| [Enron Spam](https://huggingface.co/datasets/SetFit/enron_spam)                                                       | Spam     | spam labels         |
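The table implies that four binary-labeled sources are merged into a single 3-class label space. The following hypothetical sketch shows that mapping. It is not the author's documented pipeline: the source keys, the `unify_label` helper, and the decision to skip non-spam rows from the spam sources are all assumptions. The card also does not describe how each source's own labels are reduced to a boolean (for example, by thresholding Civil Comments' continuous `toxicity` score), so that step is left out.

```python
from typing import Optional

LABEL2ID = {"safe": 0, "toxic": 1, "spam": 2}

# Hypothetical source keys, grouped by the "Type" column of the table above.
SOURCE_TYPE = {
    "jigsaw": "toxicity",
    "civil_comments": "toxicity",
    "mail_spam_ham": "spam",
    "enron_spam": "spam",
}


def unify_label(source: str, is_positive: bool) -> Optional[int]:
    """Map a source-specific binary flag onto the 3-class label space.

    `is_positive` means "toxic" for toxicity sources and "spam" for spam
    sources. Returns None for rows this sketch drops.
    """
    kind = SOURCE_TYPE[source]
    if kind == "toxicity":
        return LABEL2ID["toxic"] if is_positive else LABEL2ID["safe"]
    # Assumption: spam sources only contribute `spam` rows, reading the
    # "spam labels" entry in the table literally; ham rows are skipped.
    return LABEL2ID["spam"] if is_positive else None


print(unify_label("civil_comments", False))  # 0
print(unify_label("enron_spam", True))       # 2
```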

### Hyperparameters

- **Epochs**: 5
- **Batch size**: 16 (train) / 32 (eval)
- **Learning rate**: 3e-5
- **Weight decay**: 0.01
- **Loss**: Focal loss (gamma=2) for better handling of hard/borderline examples
- **Early stopping**: Enabled, monitoring macro F1
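Focal loss scales the cross-entropy term by `(1 - p_t)^gamma`, where `p_t` is the predicted probability of the true class. As a result, confidently correct examples contribute little and training concentrates on hard or borderline ones. Below is a pure-Python sketch for a single example with gamma = 2, as listed above. The card does not say whether a per-class alpha weight was also used, so this sketch omits one.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Multi-class focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = [4.0, 0.0, 0.0]  # confident and correct for target 0 (safe)
hard = [0.5, 0.0, 0.0]  # borderline for target 0

for name, logits in [("easy", easy), ("hard", hard)]:
    ce = focal_loss(logits, target=0, gamma=0.0)
    fl = focal_loss(logits, target=0, gamma=2.0)
    print(f"{name}: CE={ce:.4f}  focal={fl:.4f}  ratio={fl / ce:.4f}")
```

For the borderline logits, the loss is about 0.3 times the cross-entropy. For the confident example, it is about 0.001 times the cross-entropy. In actual training, the loss would be computed on batched tensors (for example with PyTorch's `log_softmax`) and averaged over the batch.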

### ONNX Export

An ONNX version of the model is included for fast CPU inference:

- **Opset version**: 18
- **Dynamic axes**: batch size and sequence length
- **Constant folding**: Enabled
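The card does not include its export script. The following sketch shows an export call with `torch.onnx.export` that matches these settings. The checkpoint directory, the input and output names, and the `export_to_onnx` helper are assumptions. The example targets the TorchScript-based exporter, and newer PyTorch releases may handle `dynamic_axes` differently.

```python
from pathlib import Path

# Settings mirroring the card: opset 18, dynamic batch/sequence axes,
# constant folding. Input names are an assumption for a BERT-style model.
INPUT_NAMES = ["input_ids", "attention_mask", "token_type_ids"]
DYNAMIC_AXES = {name: {0: "batch_size", 1: "sequence_length"} for name in INPUT_NAMES}
DYNAMIC_AXES["logits"] = {0: "batch_size"}

EXPORT_KWARGS = {
    "input_names": INPUT_NAMES,
    "output_names": ["logits"],
    "dynamic_axes": DYNAMIC_AXES,
    "opset_version": 18,
    "do_constant_folding": True,
}


def export_to_onnx(model_dir: str, out_path: str = "model.onnx") -> Path:
    """Hypothetical helper: export a fine-tuned checkpoint to ONNX."""
    # Heavy imports are deferred so the settings above can be inspected without torch.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    # The dummy input only fixes shapes for tracing; dynamic_axes relaxes them.
    dummy = tokenizer("example text", return_tensors="pt")
    args = tuple(dummy[name] for name in INPUT_NAMES)

    with torch.no_grad():
        torch.onnx.export(model, args, out_path, **EXPORT_KWARGS)
    return Path(out_path)
```

As an alternative, Optimum can produce an ONNX model directly via `ORTModelForSequenceClassification.from_pretrained(model_dir, export=True)`. Its default opset may differ from 18.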

## Limitations

- English-only — not tested on other languages
- May struggle with subtle or implicit toxicity where the language closely resembles negative sentiment (e.g., strong product complaints vs. personal attacks)
- Not designed for nuanced content policy enforcement — best used as a first-pass filter

## License

Apache 2.0