Can Large Language Models Capture Human Annotator Disagreements? Paper • 2506.19467 • Published Jun 24, 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams Paper • 2505.12864 • Published May 19, 2025