
Libraries

How to use danilopeixoto/pandora-7b-chat with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="danilopeixoto/pandora-7b-chat")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("danilopeixoto/pandora-7b-chat")
model = AutoModelForCausalLM.from_pretrained("danilopeixoto/pandora-7b-chat")

Local Apps

vLLM

How to use danilopeixoto/pandora-7b-chat with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "danilopeixoto/pandora-7b-chat"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "danilopeixoto/pandora-7b-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
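
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the server is running locally on port 8000 as started above. The helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url='http://localhost:8000',
                             max_tokens=512, temperature=0.5):
    """Create a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        'model': 'danilopeixoto/pandora-7b-chat',
        'prompt': prompt,
        'max_tokens': max_tokens,
        'temperature': temperature,
    }
    return urllib.request.Request(
        f'{base_url}/v1/completions',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result['choices'][0]['text']


# Example call, commented out because it requires the server to be up:
# print(complete('Once upon a time,'))
```

The same helper should work against the SGLang server by passing `base_url='http://localhost:30000'`, since both expose the same completions route.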

Use Docker

docker model run hf.co/danilopeixoto/pandora-7b-chat

SGLang

How to use danilopeixoto/pandora-7b-chat with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "danilopeixoto/pandora-7b-chat" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "danilopeixoto/pandora-7b-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "danilopeixoto/pandora-7b-chat" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "danilopeixoto/pandora-7b-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use danilopeixoto/pandora-7b-chat with Docker Model Runner:
```
docker model run hf.co/danilopeixoto/pandora-7b-chat
```

Pandora 7B Chat

Pandora 7B Chat is a Large Language Model (LLM) designed for chat applications.

Pandora is fine-tuned on publicly available datasets, including a tool-calling dataset for agent-based tasks and a Reinforcement Learning from Human Feedback (RLHF) dataset used for preference alignment with Direct Preference Optimization (DPO).

Fine-tuning uses Low-Rank Adaptation (LoRA) with the MLX framework, which is optimized for Apple Silicon.

The model is based on the google/gemma-7b model.

Datasets

Datasets used for fine-tuning stages:

Evaluation

Evaluation on MT-Bench multi-turn benchmark:

Usage

Install package dependencies:

pip install mlx-lm

Generate response:

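from mlx_lm import load, generate

# Download (if needed) and load the model weights and tokenizer.
model, tokenizer = load('danilopeixoto/pandora-7b-chat')

# The prompt ends with an open <|start|> tag so that the model
# continues with the next role and message.
prompt = '''<|start|>system
You are Pandora, a helpful AI assistant.
<|end|>
<|start|>user
Hello!
<|end|>
<|start|>'''

# Generate a completion for the prompt and print it.
response = generate(model, tokenizer, prompt)
print(response)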

The model supports the following prompt templates:

Question-answering with system messages

<|start|>system
{system_message}
<|end|>
<|start|>user
{user_message}
<|end|>
<|start|>assistant
{assistant_message}
<|end|>
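
The template above can be assembled programmatically. The following is a minimal sketch rather than official tooling for the model: the helper name `build_prompt` and the message dictionary shape (`role`, `content`) are assumptions made for illustration. It leaves an open `<|start|>` tag at the end, as in the usage example above.

```python
def build_prompt(messages, add_generation_prompt=True):
    """Render chat messages with the <|start|>/<|end|> template.

    `messages` is a list of {'role': ..., 'content': ...} dictionaries
    (a hypothetical shape chosen for this sketch).
    """
    parts = []
    for message in messages:
        parts.append(f"<|start|>{message['role']}\n{message['content']}\n<|end|>\n")
    prompt = ''.join(parts)
    if add_generation_prompt:
        # Leave an open tag so the model writes the next role and message.
        prompt += '<|start|>'
    return prompt


messages = [
    {'role': 'system', 'content': 'You are Pandora, a helpful AI assistant.'},
    {'role': 'user', 'content': 'Hello!'},
]
print(build_prompt(messages))
```

The resulting string can be passed as the `prompt` argument of `generate`.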

Tool calling

<|start|>system
{system_message}
<|end|>
<|start|>system:tools
{system_tools_message}
<|end|>
<|start|>user
{user_message}
<|end|>
<|start|>assistant:tool_calls
{assistant_tool_calls_message}
<|end|>
<|start|>tool
{tool_message}
<|end|>
<|start|>assistant
{assistant_message}
<|end|>

Note: The variables system_tools_message, assistant_tool_calls_message, and tool_message must contain valid YAML.
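
One way to produce valid YAML for these variables is to serialize Python data structures instead of writing the text by hand. The sketch below assumes PyYAML is installed (`pip install pyyaml`); the helper `render_block` is hypothetical, and the key order the model expects is not documented here, so `sort_keys=False` simply preserves the order of the dictionaries.

```python
import yaml  # PyYAML, assumed to be installed: pip install pyyaml


def render_block(role, content):
    """Wrap content in a <|start|>role ... <|end|> block."""
    return f"<|start|>{role}\n{content.strip()}\n<|end|>\n"


tools = [
    {
        'description': 'Get the current weather based on a given location.',
        'name': 'get_current_weather',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {'type': 'string', 'description': 'The location name.'},
            },
            'required': ['location'],
        },
    },
]

# sort_keys=False keeps the key order of the Python dictionaries.
tools_yaml = yaml.safe_dump(tools, sort_keys=False)

# Round-trip check: the text parses back to the same structure.
assert yaml.safe_load(tools_yaml) == tools

print(render_block('system:tools', tools_yaml))
```

The same approach applies to assistant_tool_calls_message and tool_message.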

An example of a tool-calling prompt:

prompt = '''<|start|>system
You are Pandora, a helpful AI assistant.
<|end|>
<|start|>system:tools
- description: Get the current weather based on a given location.
  name: get_current_weather
  parameters:
    type: object
    properties:
      location:
        type: string
        description: The location name.
    required:
    - location
<|end|>
<|start|>user
What is the weather in Sydney, Australia?
<|end|>
<|start|>assistant:tool_calls
- name: get_current_weather
  arguments:
    location: Sydney, Australia
<|end|>
<|start|>tool
name: get_current_weather
content: 72°F
<|end|>
<|start|>'''
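
Because the prompt ends with an open `<|start|>` tag, the generated text is expected to begin with a role name followed by the message. The following is a minimal sketch for splitting such output, assuming the role sits on its own first line and the message ends at the first `<|end|>` tag when one is present. The helper `parse_response` and the sample string are hypothetical and not actual model output.

```python
def parse_response(text):
    """Split generated text into (role, content).

    Assumes the text starts with a role name on its own line and that the
    message ends at the first <|end|> tag, if one is present.
    """
    body = text.split('<|end|>', 1)[0]
    role, _, content = body.strip().partition('\n')
    return role.strip(), content.strip()


# Hand-written sample for illustration, not real model output.
sample = 'assistant\nIt is currently 72°F in Sydney, Australia.\n<|end|>'
role, content = parse_response(sample)
print(role)     # assistant
print(content)  # It is currently 72°F in Sydney, Australia.
```

When the returned role is `assistant:tool_calls`, the content would be YAML describing the calls to execute before continuing the conversation with a `tool` message.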