Intermediate checkpoints for HF model
Thanks to the whole team for the great work on the OLMo models!
On the model card you state:
We are releasing many checkpoints for these models, for every 1000 training steps.
These have not yet been converted into Hugging Face Transformers format, but are available in allenai/OLMo-7B.
Are you still planning to convert the checkpoints to HF format? It would be really helpful for comparing different checkpoints with transformers (also for the 1B model).
+1, I would like to follow up on this, since I would also like to use the HF-format models!
As of today, we have released almost all the checkpoints of the newer allenai/OLMo-1.7-7B-hf model. The original 1B model will probably be next.
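For anyone who wants to compare checkpoints that are already on the Hub, here is a minimal sketch. It assumes the intermediate checkpoints are published as revisions (branches) of the repo and that their names begin with `step<N>`, for example `step1000-tokens4B`; that naming is an assumption, so check the repo's branch list first. `parse_step`, `pick_revisions`, and `load_checkpoint` are hypothetical helper names, not part of any library.

```python
import re
from typing import Iterable, List, Optional


def parse_step(revision: str) -> Optional[int]:
    """Return the training step encoded in a revision name, or None.

    Assumes names of the form "step<N>-..." (an assumption, not confirmed).
    """
    match = re.match(r"step(\d+)", revision)
    return int(match.group(1)) if match else None


def pick_revisions(revisions: Iterable[str], every: int) -> List[str]:
    """Keep revisions whose step is a multiple of `every`, sorted by step."""
    stepped = [(parse_step(r), r) for r in revisions]
    kept = [(s, r) for s, r in stepped if s is not None and s % every == 0]
    return [r for _, r in sorted(kept)]


def load_checkpoint(repo_id: str, revision: str):
    """Load tokenizer and model at one revision (this downloads the weights)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
    return tokenizer, model
```

The branch names themselves can be fetched with `huggingface_hub.list_repo_refs(repo_id).branches` and passed to `pick_revisions`; each selected name then goes to `load_checkpoint("allenai/OLMo-1.7-7B-hf", name)`.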
If you are interested in particular intermediate checkpoints, one option is to convert them to HF format yourself (it takes maybe 5-10 minutes per checkpoint). The instructions are in Checkpoints.md. The idea is to find the official checkpoint you want in https://github.com/allenai/OLMo/blob/main/checkpoints/official, download it, and then run convert_olmo_to_hf_new.py on it to produce an HF-format checkpoint.
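When several checkpoints are involved, the conversion step can be scripted. The sketch below only assembles the command line. It assumes the script sits at `scripts/convert_olmo_to_hf_new.py` in a local clone of the OLMo repo and accepts `--input_dir`, `--output_dir`, and `--tokenizer_json_path`; please confirm the flags against Checkpoints.md before running. All paths are placeholders, and `build_convert_command` is a hypothetical helper.

```python
import shlex
from typing import List


def build_convert_command(
    input_dir: str, output_dir: str, tokenizer_json: str
) -> List[str]:
    """Assemble the conversion command for one downloaded OLMo checkpoint.

    The script path and flag names are assumptions; verify them in Checkpoints.md.
    """
    return [
        "python",
        "scripts/convert_olmo_to_hf_new.py",
        "--input_dir", input_dir,
        "--output_dir", output_dir,
        "--tokenizer_json_path", tokenizer_json,
    ]


# Placeholder paths, for illustration only.
cmd = build_convert_command(
    "checkpoints/step1000-unsharded",
    "hf/step1000",
    "tokenizers/tokenizer.json",
)
print(shlex.join(cmd))
```

From the repo root, the command can be run with `subprocess.run(cmd, check=True)`; the resulting output directory should then load with `AutoModelForCausalLM.from_pretrained(output_dir)` like any local HF checkpoint.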