AI & ML interests
Building breakthrough AI to solve the world's biggest problems.
Papers
EMO: Pretraining Mixture of Experts for Emergent Modularity
MolmoAct2: Action Reasoning Models for Real-world Deployment
Articles
Collection of the evaluation rollouts for MolmoAct2 conducted by Cortex AI
- MolmoAct2: Action Reasoning Models for Real-world Deployment
  Paper • 2605.02881 • Published • 348
- allenai/eval_molmoact_candy_sorting_in-distribution
  Viewer • Updated • 59.6k • 457
- allenai/eval_molmoact_cup_stacking_in-distribution
  Viewer • Updated • 32k • 428
- allenai/eval_molmoact_cup_storing_in-distribution
  Viewer • Updated • 45.4k • 465
Collection of the MolmoAct2-BimanualYAM Dataset
- MolmoAct2: Action Reasoning Models for Real-world Deployment
  Paper • 2605.02881 • Published • 348
- allenai/MolmoAct2-BimanualYAM-Dataset
  Viewer • Updated • 76M • 7.69k • 1
- Lerobot Visualizer V3
  Visualize LeRobot datasets with interactive charts • 4
- Dataset Stats
  Fetch and view stats for MolmoAct2 datasets • 1
Collection of the fine-tuned models for MolmoAct2
- MolmoAct2: Action Reasoning Models for Real-world Deployment
  Paper • 2605.02881 • Published • 348
- allenai/MolmoAct2-BimanualYAM
  Robotics • 5B • Updated • 2.26k • 4
- allenai/MolmoAct2-SO100_101
  Robotics • 5B • Updated • 1.39k • 14
- allenai/MolmoAct2-DROID
  Robotics • 5B • Updated • 753 • 4
Artifacts for Branch-Adapt-Route
This is the collection of all datasets in MolmoWebMix.
Data used in the MolmoPoint models
Models collection for MolmoBot release
- allenai/Dolci-Think-SFT-Olmo-Hybrid
  Viewer • Updated • 2.93M • 975 • 13
- allenai/Dolci-Think-SFT-Olmo-Hybrid-Tool-Use-SA
  Viewer • Updated • 1.6k • 92 • 13
- allenai/Olmo-Hybrid-Think-SFT-7B
  Text Generation • 7B • Updated • 2.07k • 19
- allenai/Olmo-Hybrid-Instruct-SFT-7B
  Text Generation • 7B • Updated • 2.8k • 17
Artifacts for the Molmo2 release
- allenai/Molmo2-4B
  Image-Text-to-Text • 5B • Updated • 18.8k • 51
- allenai/Molmo2-8B
  Image-Text-to-Text • 9B • Updated • 514k • 187
- allenai/Molmo2-O-7B
  Image-Text-to-Text • 8B • Updated • 103k • 26
- allenai/Molmo2-VideoPoint-4B
  Video-Text-to-Text • 5B • Updated • 347 • 21
Artifacts for the Molmo2 data release
Smart Any-Horizon Agent for Long Video Reasoning
- allenai/SAGE-MM-Qwen3-VL-8B-SFT_RL
  Video-Text-to-Text • 9B • Updated • 348 • 5
- allenai/SAGE-MM-Molmo2-8B-SFT_RL
  Video-Text-to-Text • 9B • Updated • 7 • 5
- allenai/SAGE-MM-Qwen3-VL-4B-SFT_RL
  Video-Text-to-Text • 5B • Updated • 33 • 6
- allenai/SAGE-MM-Qwen2.5-VL-7B-SFT_RL
  Video-Text-to-Text • 8B • Updated • 35 • 2
All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them.
- allenai/Olmo-3-7B-Think-SFT
  Text Generation • 7B • Updated • 6k • 10
- allenai/Dolci-Think-SFT-7B
  Viewer • Updated • 2.27M • 2.27k • 15
- allenai/Olmo-3-7B-Think-DPO
  Text Generation • 528k • Updated • 5.94k • 7
- allenai/Dolci-Think-DPO-7B
  Viewer • Updated • 150k • 230 • 11
All models for the MolmoAct (Multimodal Open Language Model for Action) release.
- MolmoAct: Action Reasoning Models that can Reason in Space
  Paper • 2508.07917 • Published • 45
- allenai/MolmoAct-7B-D-0812
  Robotics • 8B • Updated • 292 • 53
- allenai/MolmoAct-7B-O-0812
  Robotics • 8B • Updated • 26 • 5
- allenai/MolmoAct-7B-D-Pretrain-0812
  Robotics • 8B • Updated • 47 • 8
Datasets for the IFBench benchmark and paper!
Artifacts for the OLMo 2 release.
- allenai/OLMo-2-0425-1B-Instruct
  Text Generation • 1B • Updated • 28.8k • 57
- allenai/OLMo-2-0425-1B-Instruct-GGUF
  1B • Updated • 208 • 14
- allenai/OLMo-2-0425-1B
  Text Generation • 1B • Updated • 235k • 79
- allenai/OLMo-2-0325-32B-Instruct
  Text Generation • 32B • Updated • 5.43k • 148
A suite of models, data, and evals over 25 corpora, 14 sizes, and 3 seeds to measure how accurately small experiments predict rankings at large scale.
A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog
All datasets released with Tulu 3 -- state-of-the-art open post-training recipes.
- allenai/tulu-3-sft-mixture
  Viewer • Updated • 939k • 20.7k • 251
- allenai/llama-3.1-tulu-3-8b-preference-mixture
  Viewer • Updated • 273k • 5.28k • 26
- allenai/llama-3.1-tulu-3-70b-preference-mixture
  Viewer • Updated • 337k • 100 • 19
- allenai/llama-3.1-tulu-3-405b-preference-mixture
  Viewer • Updated • 361k • 88 • 6
Artifacts for open mixture-of-experts language models.
A suite of models trained using DPO and PPO across a wide variety (up to 14) of preference datasets. See https://arxiv.org/abs/2406.09279 for more!
- allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm
  Text Generation • Updated • 324 • 6
- allenai/tulu-2.5-preference-data
  Viewer • Updated • 2.12M • 985 • 19
- allenai/tulu-2.5-prompts
  Viewer • Updated • 189k • 85 • 4
- allenai/tulu-v2.5-ppo-13b-uf-mean
  Text Generation • 13B • Updated • 9
Dataset and baseline models for Paloma, a benchmark of language model fit to 546 textual domains
- AI2 WildBench Leaderboard (V2)
  Display LLM performance leaderboards with customizable views • 232
- allenai/WildBench
  Viewer • Updated • 2.3k • 2.71k • 39
- allenai/WildBench-V2-Model-Outputs
  Viewer • Updated • 62.5k • 1.35k • 2
- WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
  Paper • 2406.04770 • Published • 28
Safety data, moderation tools and safe LLMs.
These models' tokenizers did not use HF's fast tokenizer, resulting in variations in how pre-tokenization was applied. Resolved in the latest versions.
- allenai/OLMo-2-1124-13B-Instruct-preview
  Text Generation • 14B • Updated • 10 • 58
- allenai/OLMo-2-1124-7B-Instruct-preview
  Text Generation • 7B • Updated • 15 • 47
- allenai/OLMo-2-1124-7B-SFT-Preview
  Text Generation • Updated • 16 • 3
- allenai/OLMo-2-1124-7B-DPO-Preview
  Text Generation • Updated • 11 • 2
Collection of the embodied reasoning datasets for MolmoAct2
Collection of robotics datasets for MolmoAct2
- MolmoAct2: Action Reasoning Models for Real-world Deployment
  Paper • 2605.02881 • Published • 348
- allenai/MolmoAct2-BimanualYAM-Dataset
  Viewer • Updated • 76M • 7.69k • 1
- allenai/MolmoAct2-SO100_101-Dataset
  Viewer • Updated • 8.42k • 880 • 6
- allenai/MolmoAct2-DROID-Dataset
  Viewer • Updated • 17.8M • 5.74k • 4
Collection of the base models for MolmoAct2
- MolmoAct2: Action Reasoning Models for Real-world Deployment
  Paper • 2605.02881 • Published • 348
- allenai/MolmoAct2
  Robotics • 5B • Updated • 3.82k • 18
- allenai/MolmoAct2-Think
  Robotics • 5B • Updated • 249 • 3
- allenai/MolmoAct2-Pretrain
  Robotics • 5B • Updated • 923 • 5
Collection of models from the paper "Cracks in the Foundation: Seemingly Minor Architectural Choices Impact Long Context Extension".
This is the collection of WildDet3D artifacts, including demos, model checkpoints and data. https://github.com/allenai/WildDet3D
This is the collection of MolmoWeb artifacts, including model checkpoints and data.
MolmoPoint models
Training and assets data for MolmoBot release
Ai2 Open Coding Agents - Django, Sphinx, Sympy Data
The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets...
- allenai/Olmo-3.1-32B-Think
  Text Generation • 32B • Updated • 2.94k • 103
- allenai/Olmo-3.1-32B-Instruct-SFT
  32B • Updated • 1.41k • 8
- allenai/Olmo-3.1-32B-Instruct-DPO
  Text Generation • 32B • Updated • 614 • 6
- allenai/Olmo-3.1-32B-Instruct
  Text Generation • 32B • Updated • 12.9k • 78
Artifacts for the Bolmo release: https://allenai.org/papers/bolmo.
Artifacts for the Olmo 3 release.
- allenai/Olmo-3-1125-32B
  Text Generation • 32B • Updated • 11.2k • 120
- allenai/Olmo-3-32B-Think
  Text Generation • 1.05M • Updated • 2.87k • 171
- allenai/Olmo-3-1025-7B
  Text Generation • 7B • Updated • 117k • 70
- allenai/Olmo-3-7B-Think
  Text Generation • 528k • Updated • 70.9k • 97
All artifacts related to Olmo 3 pre-training
OlmoEarth pre-trained and fine-tuned foundation models for remote sensing
All datasets for the MolmoAct (Multimodal Open Language Model for Action) release.
Datasets, spaces, and models for the Reward Bench 2 benchmark and paper!
- allenai/reward-bench-2
  Viewer • Updated • 1.87k • 3.17k • 35
- Reward Bench Leaderboard
  Explore and compare model scores on RewardBench benchmarks • 432
- allenai/reward-bench-2-results
  Preview • Updated • 212 • 3
- allenai/Llama-3.1-70B-Instruct-RM-RB2
  Text Classification • Updated • 24 • 1
olmOCR is a document recognition pipeline for efficiently converting documents into plain text.
olmocr.allenai.org
Improved OLMoE for iOS app. Read more: https://allenai.org/blog/olmoe-app
All models released with Tulu 3 -- state-of-the-art open post-training recipes.
- allenai/Llama-3.1-Tulu-3.1-8B
  Text Generation • 8B • Updated • 945 • 39
- allenai/Llama-3.1-Tulu-3-8B
  Text Generation • 8B • Updated • 2.04k • 179
- allenai/Llama-3.1-Tulu-3-70B
  Text Generation • 71B • Updated • 336 • 61
- allenai/Llama-3.1-Tulu-3-405B
  Text Generation • Updated • 620 • 112
Artifacts for open multimodal language models.
- allenai/Molmo-72B-0924
  Image-Text-to-Text • 73B • Updated • 2.05k • 300
- allenai/Molmo-7B-D-0924
  Image-Text-to-Text • 8B • Updated • 29.2k • 567
- allenai/Molmo-7B-O-0924
  Image-Text-to-Text • 8B • Updated • 958 • 164
- allenai/MolmoE-1B-0924
  Image-Text-to-Text • Updated • 1.2k • 158
Artifacts for the first set of OLMo models.
- allenai/OLMo-1B-0724-hf
  Text Generation • 1B • Updated • 2.23k • 24
- allenai/OLMo-7B-0724-hf
  Text Generation • 7B • Updated • 3.28k • 17
- allenai/OLMo-7B-0724-SFT-hf
  Text Generation • 7B • Updated • 83 • 4
- allenai/OLMo-7B-0724-Instruct-hf
  Text Generation • 7B • Updated • 267 • 7
Datasets, spaces, and models for the reward model benchmark!
The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2"
Data and models to enhance instruction-following for scientific literature understanding.
ZebraLogic Bench: Testing the Limits of LLMs in Logical Reasoning
- Zebra Logic Bench
  Display model leaderboard and explore sample puzzles • 94
- allenai/ZebraLogicBench
  Viewer • Updated • 4.26k • 737 • 25
- allenai/ZebraLogicBench-private
  Viewer • Updated • 4.26k • 775 • 13
- Faith and Fate: Limits of Transformers on Compositionality
  Paper • 2305.18654 • Published • 9