How to use grasgor/jobs-llama3.2-1B-sft with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
# Load the base model, then attach the LoRA adapter weights from this repository
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base_model, "grasgor/jobs-llama3.2-1B-sft")
How to use grasgor/jobs-llama3.2-1B-sft with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="grasgor/jobs-llama3.2-1B-sft")
# Load model directly (AutoModelForCausalLM includes the language-modeling head needed for generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("grasgor/jobs-llama3.2-1B-sft", dtype="auto")
How to use grasgor/jobs-llama3.2-1B-sft with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "grasgor/jobs-llama3.2-1B-sft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "grasgor/jobs-llama3.2-1B-sft",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
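You can also call the same OpenAI-compatible endpoint from Python. The following is a minimal standard-library sketch. It assumes a server is already running: `vllm serve` listens on port 8000 by default, and the SGLang commands below use port 30000. The names `build_completion_request` and `complete` are hypothetical helpers, not part of vLLM or SGLang.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage (requires a running server; use port 30000 for SGLang):
# print(complete("http://localhost:8000", "grasgor/jobs-llama3.2-1B-sft", "Once upon a time,"))
```

Building the request separately from sending it keeps the payload easy to inspect, and the payload matches the fields used in the curl examples.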
How to use grasgor/jobs-llama3.2-1B-sft with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "grasgor/jobs-llama3.2-1B-sft" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "grasgor/jobs-llama3.2-1B-sft",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, run the SGLang server from its Docker image:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "grasgor/jobs-llama3.2-1B-sft" \
--host 0.0.0.0 \
--port 30000
# Call the server with the same curl command as above (port 30000).
How to use grasgor/jobs-llama3.2-1B-sft with Docker Model Runner:
docker model run hf.co/grasgor/jobs-llama3.2-1B-sft
This model is a fine-tuned version of Llama 3.2 1B (meta-llama/Llama-3.2-1B), trained on Steve Jobs' interview responses.
The model was trained with QLoRA. The repository contains the LoRA adapter weights, which are loaded on top of the base model; usage is shown below.
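Because the repository holds LoRA adapter weights rather than a full model, it helps to see what attaching an adapter computes. The pure-Python sketch below shows the standard LoRA formulation, h = W x + (alpha / r) · B A x, on made-up toy matrices. This card does not state the rank, alpha, or target modules used for this model. During training, QLoRA also keeps the frozen W in 4-bit precision, which the sketch ignores. The merge step conceptually corresponds to PEFT's `merge_and_unload()`, which folds the update into W so the adapter is no longer needed at inference.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output W x plus the scaled low-rank update (alpha / r) * B (A x)."""
    scale = alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into one dense matrix: W + (alpha / r) * B A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))] for i in range(len(W))]


# Toy example: a 3x3 frozen weight with a rank-1 adapter (all values made up).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0, 2.0, 0.0]]          # r x k  (1 x 3)
B = [[1.0], [0.0], [1.0]]      # d x r  (3 x 1)
x = [1.0, 1.0, 1.0]
print(lora_forward(W, A, B, x, alpha=2, r=1))        # → [7.0, 1.0, 7.0] (adapter applied at run time)
print(matvec(merge_lora(W, A, B, alpha=2, r=1), x))  # → [7.0, 1.0, 7.0] (same result after merging)
```

Only A and B (r × k and d × r) are trained and stored. That is why an adapter repository is far smaller than the d × k base weights it modifies.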
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
Use the code below to get started with the model.
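from transformers import pipeline, AutoTokenizer
# The tokenizer comes from the base model; the adapter repository is passed as the model
model_name = "meta-llama/Llama-3.2-1B"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Llama 3.2 defines no padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
# Loading an adapter repository through pipeline() requires `peft` to be installed;
# return_full_text=False returns only the generated continuation, not the prompt
pipe = pipeline("text-generation", model="grasgor/jobs-llama3.2-1B-sft", tokenizer=tokenizer, return_full_text=False)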
prompt = "Is there an inevitable break between being an entrepreneur and a businessman? Are the people who get things going different?"
result = pipe(
prompt,
max_new_tokens=3072,
temperature=0.8,
do_sample=True,
top_k=50,
top_p=0.9,
repetition_penalty=1.2
)
print(result[0]["generated_text"])
Example output (sampling is enabled, so results vary between runs):
The difference is that in business you're trying to make money, not something. You want your company to be successful--not just one or two individuals within it. And the reason we do this is because these are very personal endeavors for us; they have deep meaning. But if I had been able to go into my basement last night at midnight with no idea what was about to happen but know exactly where all of our chips were laid out on the table before me, would I take any chances right now?
Of course!
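The sampling arguments in the `pipe(...)` call (`temperature`, `top_k`, `top_p`, and `repetition_penalty`, with `do_sample=True`) together decide how each next token is drawn, which is why generations differ from run to run. The pure-Python sketch below roughly illustrates what each setting does to one next-token distribution, in the order Transformers typically applies them. It is a simplified illustration, not Transformers' implementation, which works on batched tensors and differs in edge cases such as ties. The names `filter_distribution` and `sample_next` are hypothetical.

```python
import math
import random


def filter_distribution(logits, seen, temperature=0.8, top_k=50, top_p=0.9,
                        repetition_penalty=1.2):
    """Turn raw next-token logits into a filtered {token_id: probability} map.

    Defaults mirror the pipe(...) call above; assumes top_k >= 1 and temperature > 0.
    """
    scores = list(logits)
    # Repetition penalty: push already-generated tokens toward lower scores.
    for t in set(seen):
        if scores[t] < 0:
            scores[t] *= repetition_penalty
        else:
            scores[t] /= repetition_penalty
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring tokens, best first.
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:top_k]
    # Softmax over the surviving tokens (subtract the max for numerical stability).
    best = scores[ranked[0]]
    exps = {t: math.exp(scores[t] - best) for t in ranked}
    total = sum(exps.values())
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for t in ranked:
        p = exps[t] / total
        kept[t] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(kept.values())
    return {t: p / norm for t, p in kept.items()}


def sample_next(dist, rng=random):
    """Draw one token id from a {token_id: probability} map."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Toy 4-token vocabulary; token 0 was already generated, so it is penalized.
dist = filter_distribution([2.0, 1.0, 0.5, -1.0], seen=[0])
print(dist)
print(sample_next(dist, random.Random(0)))
```

A penalty above 1 always lowers a repeated token's score: positive logits are divided by it and negative logits are multiplied by it. Dividing a negative logit instead would raise the score and have the opposite effect.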
Base model: meta-llama/Llama-3.2-1B