view article Article CircleGuardBench: New Standard for Evaluating AI Moderation Models May 7, 2025 • 59
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5, 2025 • 232
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 92
A Long Way to Go: Investigating Length Correlations in RLHF Paper • 2310.03716 • Published Oct 5, 2023 • 10