How did you make the tflite model?
I tried to recreate your TFLite model using the following code, which I took from an Android project:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel
model = TFGPT2LMHeadModel.from_pretrained('distilgpt2')
input_spec = tf.TensorSpec([1, 64], tf.int32)
model._set_inputs(input_spec, training=False)
print(model.inputs)
print(model.outputs)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# For FP16 quantization:
#converter.optimizations = [tf.lite.Optimize.DEFAULT]
#converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()
open("distilgpt2-64.tflite", "wb").write(tflite_model)
It did not work correctly: it produced a TFLite file that doesn't work with https://github.com/huggingface/tflite-android-transformers, whereas your model works great with the Android app.
My file was slightly smaller (324626324 bytes) and had a different hash: ef5bf0a1dbbf640a1b1b3f03e1c7d43cd99e19f1f2dd2568cda91511f72da38d distilgpt2-64.tflite
I want to convert your model to make it smaller and try 16-bit floating point or 8-bit versions, as that TFLite Android project does with the real gpt2, but I don't know what you did to create your TFLite model. Is there a repo somewhere? Or could you upload the script that did it? Thanks!
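For anyone comparing a regenerated file against the published one, here is a minimal sketch of how the size and hash quoted above could be reproduced. It assumes the hash is SHA-256 (it is 64 hex characters long, which matches); the helper name `file_fingerprint` is made up for illustration.

```python
import hashlib


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`.

    Hypothetical helper: reads the file in chunks so a ~300 MB
    .tflite file does not have to fit in memory at once.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

Calling `file_fingerprint("distilgpt2-64.tflite")` on both the downloaded and the regenerated file gives two pairs that can be compared directly. Differing hashes alone do not prove either file is broken, since converter versions can serialize the same model differently.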
Hi @GayCodeGal, sorry for the late reply!
Your code does indeed seem to be the same as https://github.com/huggingface/tflite-android-transformers/blob/master/models_generation/gpt2.py, which is the script that was used to generate the TFLite version for both gpt2 and distilgpt2. My guess is that the problem comes from the version of TFLite you're using. If you don't get any error during generation, the generated model is probably correct, but it may be incompatible with the Android app because the app uses a different TFLite version (it seems to be TFLite 2.0, see https://github.com/huggingface/tflite-android-transformers/blob/master/gpt2/build.gradle#L56).
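One way to narrow this down is to check which TensorFlow version produced the file and whether the file loads and runs in the desktop interpreter at all. The sketch below is only an illustration, not the script used for the published model: the function name `smoke_test_tflite` is hypothetical, and it assumes `tensorflow` and `numpy` are installed and that your TensorFlow version still exposes `tf.lite.Interpreter`.

```python
def smoke_test_tflite(model_path):
    """Load a .tflite file, run one dummy inference, return output shapes.

    Hypothetical helper for illustration. Imports are done lazily so the
    function can be defined even where TensorFlow is not installed.
    """
    import numpy as np
    import tensorflow as tf

    # The converter version is the main suspect for app incompatibility.
    print("TensorFlow version:", tf.__version__)

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Feed zeros with whatever shape and dtype the model declares
    # (the conversion script above uses int32 token ids of shape [1, 64]).
    input_detail = interpreter.get_input_details()[0]
    dummy = np.zeros(input_detail["shape"], dtype=input_detail["dtype"])
    interpreter.set_tensor(input_detail["index"], dummy)
    interpreter.invoke()

    return [
        tuple(interpreter.get_tensor(out["index"]).shape)
        for out in interpreter.get_output_details()
    ]
```

If `smoke_test_tflite("distilgpt2-64.tflite")` succeeds on the desktop but the Android app still fails, that points toward a mismatch between the converter version and the TFLite runtime pinned in the app's `build.gradle`, rather than a broken conversion.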