COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs • Paper • 2601.01836 • Published 4 days ago • 6
COMPASS • Collection • A Framework for Evaluating Organization-Specific Policy Alignment in LLMs • 5 items • Updated 3 days ago • 4
Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition • Paper • 2505.15367 • Published May 21, 2025 • 2
PIQA: Reasoning about Physical Commonsense in Natural Language • Paper • 1911.11641 • Published Nov 26, 2019 • 5
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought • Paper • 2510.04230 • Published Oct 5, 2025 • 26
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs • Paper • 2508.03365 • Published Aug 5, 2025 • 4
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning • Paper • 2502.17407 • Published Feb 24, 2025 • 26