Models

90

Active filters: jailbreak-detection

fastino/gliguard-LLMGuardrails-300M

Text Classification • 0.2B • Updated 1 day ago • 1.55k • 45

rogue-security/prompt-injection-jailbreak-sentinel-v2

Text Classification • 0.6B • Updated Mar 11 • 19.3k • 36

Necent/distilbert-base-uncased-detected-jailbreak

Text Classification • 67M • Updated May 29, 2025 • 31

madhurjindal/Jailbreak-Detector

Text Classification • 65.8M • Updated May 30, 2025 • 2.02k

madhurjindal/Jailbreak-Detector-Large

Text Classification • 0.3B • Updated May 30, 2025 • 209 • 3

GuardrailsAI/prompt-saturation-attack-detector

Text Classification • 4.39M • Updated Nov 14, 2024 • 61.8k • 2

qualifire/prompt-injection-sentinel

Text Classification • 0.4B • Updated Sep 22, 2025 • 2.14k • 15

madhurjindal/Jailbreak-Detector-2-XL

Text Generation • Updated Jul 20, 2025 • 645 • 6

gincioks/cerberus-bert-base-un-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 6

gincioks/cerberus-distilbert-base-un-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 6

gincioks/cerberus-deberta-v3-small-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 5

gincioks/cerberus-proventra-mdeberta-v3-base-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 7

pmking27/jailbreak-detection

Text Classification • 0.3B • Updated Jun 19, 2025 • 193

intelliway/deberta-v3-base-prompt-injection-v2-mapa

Text Classification • 0.2B • Updated Jul 3, 2025 • 5

qualifire/prompt-injection-jailbreak-sentinel-v2-GGUF

0.6B • Updated Sep 28, 2025 • 22 • 1

ahmedmajid92/iraqi-guard-model

Text Classification • 0.3B • Updated Oct 9, 2025 • 6 • 1

rootfs/tool-call-verifier

Token Classification • 0.1B • Updated Dec 14, 2025 • 9

rootfs/function-call-sentinel

Text Classification • 0.1B • Updated Dec 14, 2025 • 12

vincentoh/jailbreak-detector-v5

Text Classification • Updated Dec 18, 2025

thirtyninetythree/deberta-prompt-guard

Text Classification • 0.2B • Updated Dec 22, 2025 • 4

llm-semantic-router/toolcall-verifier

Token Classification • 0.1B • Updated Dec 18, 2025 • 65 • 2

llm-semantic-router/toolcall-sentinel

Text Classification • 0.1B • Updated Dec 18, 2025 • 35 • 1

llm-semantic-router/mmbert-jailbreak-detector-lora

Text Classification • Updated Jan 21 • 3

llm-semantic-router/mmbert-jailbreak-detector-merged

Text Classification • 0.3B • Updated Jan 21 • 60

abdulmunimjemal/Sentinel-Rail-A-Prompt-Attack-Guard

Text Classification • Updated Jan 21 • 1

llm-semantic-router/mmbert-safety-classifier-level1

Text Classification • Updated Jan 21 • 1

llm-semantic-router/mlcommons-safety-classifier-level1-binary

Text Classification • Updated Jan 22 • 70

ynyg/Unified_Prompt_Guard

0.3B • Updated Jan 28 • 28

llm-semantic-router/mmbert32k-jailbreak-detector-lora

Text Classification • Updated Feb 1 • 8

llm-semantic-router/mmbert32k-jailbreak-detector-merged

Text Classification • 0.3B • Updated Mar 6 • 2.01k