RLPR collection: Extrapolating RLVR to General Domains without Verifiers
How to use openbmb/RLPR-Llama3.1-8B-Inst with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="openbmb/RLPR-Llama3.1-8B-Inst")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("openbmb/RLPR-Llama3.1-8B-Inst")
model = AutoModelForCausalLM.from_pretrained("openbmb/RLPR-Llama3.1-8B-Inst")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use openbmb/RLPR-Llama3.1-8B-Inst with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openbmb/RLPR-Llama3.1-8B-Inst"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "openbmb/RLPR-Llama3.1-8B-Inst",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
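The vLLM server started above exposes an OpenAI-compatible chat-completions endpoint, so the same request can be sent from Python instead of curl. The sketch below uses only the standard library (`urllib.request`, `json`). The helper names `build_chat_request`, `extract_reply` and `chat` are hypothetical. It also assumes the response follows the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from a chat-completions JSON body.

    Assumes the standard OpenAI response layout: choices[0].message.content.
    """
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def chat(base_url, model, user_message, timeout=60):
    """Send one user turn to a running server and return the assistant reply."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read().decode("utf-8"))
```

With the vLLM server running, `chat("http://localhost:8000", "openbmb/RLPR-Llama3.1-8B-Inst", "What is the capital of France?")` mirrors the curl call. The SGLang server in the next section listens on `http://localhost:30000` instead. Keeping request building and response parsing separate from the network call lets both be checked without a live server.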
How to use openbmb/RLPR-Llama3.1-8B-Inst with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "openbmb/RLPR-Llama3.1-8B-Inst" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "openbmb/RLPR-Llama3.1-8B-Inst",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, run the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "openbmb/RLPR-Llama3.1-8B-Inst" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "openbmb/RLPR-Llama3.1-8B-Inst",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'

How to use openbmb/RLPR-Llama3.1-8B-Inst with Docker Model Runner:
docker model run hf.co/openbmb/RLPR-Llama3.1-8B-Inst
RLPR-Llama3.1-8B-Inst is trained from Llama3.1-8B-Inst with the RLPR framework. RLPR removes the reliance on external verifiers, which keeps the framework simple and lets it generalize to a broader range of domains.
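The card does not explain how RLPR obtains a reward without a verifier. Our reading of the RLPR paper (arXiv:2506.18254) is as follows. The reward is the policy model's own average per-token probability of the reference answer, conditioned on the prompt and the model's reasoning. A baseline computed without the reasoning is then subtracted to reduce bias.

The sketch below illustrates that idea on precomputed log-probabilities. The function names, the input format and the clipping to [0, 1] are our assumptions, not the released training code. Check the paper before relying on the details.

```python
import math


def mean_token_probability(token_logprobs):
    """Average per-token probability of the reference-answer tokens.

    token_logprobs: log-probabilities the policy assigns to each reference-answer
    token (assumed to be computed elsewhere, e.g. from the model's logits).
    """
    if not token_logprobs:
        raise ValueError("reference answer must contain at least one token")
    probs = [math.exp(lp) for lp in token_logprobs]
    return sum(probs) / len(probs)


def debiased_reward(logprobs_with_reasoning, logprobs_without_reasoning):
    """Probability reward minus a no-reasoning baseline, clipped to [0, 1].

    Assumption: the baseline is the same reference answer scored directly after
    the prompt, without the model's reasoning in between.
    """
    reward = mean_token_probability(logprobs_with_reasoning)
    baseline = mean_token_probability(logprobs_without_reasoning)
    return min(1.0, max(0.0, reward - baseline))
```

Averaging per-token probabilities, rather than multiplying them into a sequence likelihood, keeps a single low-probability token from collapsing the whole reward. This matters for free-form answers, where a correct response may phrase one word differently from the reference.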
# pip install accelerate
import transformers
import torch
model_id = "openbmb/RLPR-Llama3.1-8B-Inst"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
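Passing `torch_dtype=torch.bfloat16` halves weight memory compared with float32, because each parameter takes 2 bytes instead of 4. The estimate below uses the nominal 8B parameter count, and the exact count differs slightly. It counts weights only: the KV cache, activations and framework overhead come on top, so treat the result as a lower bound when sizing a GPU for `device_map="auto"`.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed for model weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


NOMINAL_PARAMS = 8e9  # nominal "8B"; not the exact parameter count

for dtype, nbytes in [("bfloat16", 2), ("float32", 4)]:
    print(f"{dtype}: ~{weight_memory_gib(NOMINAL_PARAMS, nbytes):.1f} GiB")
# bfloat16: ~14.9 GiB
# float32: ~29.8 GiB
```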
If you find our model, code, or paper helpful, please consider citing our paper 📝:
@misc{yu2025rlprextrapolatingrlvrgeneral,
  title={RLPR: Extrapolating RLVR to General Domains without Verifiers},
  author={Tianyu Yu and Bo Ji and Shouli Wang and Shu Yao and Zefan Wang and Ganqu Cui and Lifan Yuan and Ning Ding and Yuan Yao and Zhiyuan Liu and Maosong Sun and Tat-Seng Chua},
  year={2025},
  eprint={2506.18254},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2506.18254},
}