COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Paper • 2601.01836 • Published 12 days ago • 7
What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models Paper • 2601.06165 • Published 10 days ago • 15
AIM-Intelligence/COMPASS_Qwen2.5-7B-Instruct_LoRA Text Generation • 8B • Updated 11 days ago • 19 • 2
AIM-Intelligence/COMPASS-Policy-Alignment-Testbed-Dataset Viewer • Updated 11 days ago • 5.92k • 165 • 10
X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates Paper • 2509.08729 • Published Sep 10, 2025 • 1
ObjexMT: Objective Extraction and Metacognitive Calibration for LLM-as-a-Judge under Multi-Turn Jailbreaks Paper • 2508.16889 • Published Aug 23, 2025 • 2
One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs Paper • 2503.04856 • Published Mar 6, 2025 • 2
ELITE: Enhanced Language-Image Toxicity Evaluation for Safety Paper • 2502.04757 • Published Feb 7, 2025 • 2
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs Paper • 2508.03365 • Published Aug 5, 2025 • 4
Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models Paper • 2508.04196 • Published Aug 6, 2025 • 1