---
license: mit
datasets:
- UW/olmo-mix-1124-subset-p99
---

We developed this SuperBPE tokenizer for model developers who wish to experiment quickly with an off-the-shelf tokenizer in their pretraining pipeline! This is an English SuperBPE tokenizer with a vocabulary size of 128K, trained on a subset of the Olmo2 pretraining data.

You can experiment with this tokenizer on our [tokenizer playground](https://superbpe.github.io/) by entering this repository's ID as a custom Hugging Face (HF) repository ID.
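
To try the tokenizer locally rather than in the browser, the sketch below shows one possible way to load it and measure how many tokens it produces per word. It assumes the repository's files can be loaded through `transformers.AutoTokenizer`; `REPO_ID` is a hypothetical placeholder that must be replaced with this repository's actual ID, and `load_tokenizer` and `tokens_per_word` are illustrative helpers, not part of any library.

```python
from typing import Callable, List

# Hypothetical placeholder: replace with this repository's actual Hugging Face ID.
REPO_ID = "your-org/your-superbpe-tokenizer"


def load_tokenizer(repo_id: str = REPO_ID):
    """Load the tokenizer from the Hugging Face Hub.

    Assumes the repository is loadable via AutoTokenizer. The import is
    done lazily so the helper below works without transformers installed.
    """
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokens_per_word(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Return the average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return len(tokenize(text)) / len(words)


# Example usage (requires network access and the real repository ID):
#   tokenizer = load_tokenizer()
#   text = "By the way, this is a quick test of the tokenizer."
#   print(tokenizer.tokenize(text))
#   print(len(tokenizer))  # vocabulary size, including any added tokens
#   print(tokens_per_word(tokenizer.tokenize, text))
```

Because SuperBPE tokens can span more than one word, the tokens-per-word ratio may come out below 1 on ordinary English text, which is a quick sanity check that the tokenizer loaded as intended.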