BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives Paper • 2511.08029 • Published Nov 11, 2025 • 3
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code Paper • 2506.02314 • Published Jun 2, 2025
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models Paper • 2411.10867 • Published Nov 16, 2024 • 9
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5, 2024 • 33
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? Paper • 2406.04391 • Published Jun 6, 2024 • 8
Principled Federated Domain Adaptation: Gradient Projection and Auto-Weighting Paper • 2302.05049 • Published Feb 10, 2023
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models Paper • 2403.02715 • Published Mar 5, 2024 • 3
Rethinking Machine Unlearning for Large Language Models Paper • 2402.08787 • Published Feb 13, 2024 • 3
Scaling Laws for Downstream Task Performance of Large Language Models Paper • 2402.04177 • Published Feb 6, 2024 • 20
Are Emergent Abilities of Large Language Models a Mirage? Paper • 2304.15004 • Published Apr 28, 2023 • 8
Representation Engineering: A Top-Down Approach to AI Transparency Paper • 2310.01405 • Published Oct 2, 2023 • 7
Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions Paper • 2306.01799 • Published Jun 1, 2023 • 1
Transforming and Combining Rewards for Aligning Large Language Models Paper • 2402.00742 • Published Feb 1, 2024 • 12