Instructions to use terrycraddock/Reflection-Llama-3.1-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use terrycraddock/Reflection-Llama-3.1-8B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="terrycraddock/Reflection-Llama-3.1-8B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("terrycraddock/Reflection-Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("terrycraddock/Reflection-Llama-3.1-8B")
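The snippets above only load the model. As a quick sanity check, here is a minimal usage sketch (my addition, not part of the widget) that feeds the pipeline the reflection-style prompt documented later in this card; the sampling parameters are illustrative only:

# Minimal usage sketch: query the pipeline with the reflection prompt format
# described in the model card below. Sampling settings are illustrative only.
from transformers import pipeline

pipe = pipeline("text-generation", model="terrycraddock/Reflection-Llama-3.1-8B")

system = (
    "You are a world-class AI system, capable of complex reasoning and reflection. "
    "Reason through the query inside <thinking> tags, and then provide your final "
    "response inside <output> tags. If you detect that you made a mistake in your "
    "reasoning at any point, correct yourself inside <reflection> tags."
)
prompt = f"{system}\n### Prompt:\nTell me about yourself.\n### Response:\n"

result = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.5)
print(result[0]["generated_text"])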
- llama-cpp-python
How to use terrycraddock/Reflection-Llama-3.1-8B with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="terrycraddock/Reflection-Llama-3.1-8B",
    filename="unsloth.F16.gguf",
)
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
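llama-cpp-python also exposes an OpenAI-style chat interface. Below is a minimal sketch, reusing the llm object loaded above; chat formatting depends on the template shipped in the GGUF, so treat this as an assumption rather than the card's official usage:

# Chat-style sketch (reuses the llm object from the snippet above).
# The reflection system prompt comes from the model card below.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a world-class AI system, capable of "
         "complex reasoning and reflection. Reason through the query inside <thinking> "
         "tags, and then provide your final response inside <output> tags."},
        {"role": "user", "content": "Tell me about yourself."},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])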
- Inference Providers
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use terrycraddock/Reflection-Llama-3.1-8B with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf terrycraddock/Reflection-Llama-3.1-8B:F16

# Run inference directly in the terminal:
llama-cli -hf terrycraddock/Reflection-Llama-3.1-8B:F16
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf terrycraddock/Reflection-Llama-3.1-8B:F16

# Run inference directly in the terminal:
llama-cli -hf terrycraddock/Reflection-Llama-3.1-8B:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf terrycraddock/Reflection-Llama-3.1-8B:F16

# Run inference directly in the terminal:
./llama-cli -hf terrycraddock/Reflection-Llama-3.1-8B:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf terrycraddock/Reflection-Llama-3.1-8B:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf terrycraddock/Reflection-Llama-3.1-8B:F16
Use Docker
docker model run hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16
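However you installed it, a running llama-server speaks the OpenAI API, so any OpenAI-compatible client can call it. A minimal Python sketch follows, assuming the server's default port 8080 (adjust if you passed --port); the model field is typically ignored by a single-model server:

# Sketch: call a local llama-server via its OpenAI-compatible API.
# Assumes the default port 8080; no real API key is required locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="terrycraddock/Reflection-Llama-3.1-8B",  # usually ignored by single-model servers
    messages=[{"role": "user", "content": "Once upon a time,"}],
    max_tokens=512,
)
print(resp.choices[0].message.content)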
- LM Studio
- Jan
- vLLM
How to use terrycraddock/Reflection-Llama-3.1-8B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "terrycraddock/Reflection-Llama-3.1-8B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terrycraddock/Reflection-Llama-3.1-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
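The same endpoint can be called from Python. A minimal sketch using the openai client package (an assumption on my part; any OpenAI-compatible client works):

# Sketch: call the vLLM server from Python instead of curl.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.completions.create(
    model="terrycraddock/Reflection-Llama-3.1-8B",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)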
Use Docker
docker model run hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16
- SGLang
How to use terrycraddock/Reflection-Llama-3.1-8B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "terrycraddock/Reflection-Llama-3.1-8B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terrycraddock/Reflection-Llama-3.1-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
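The curl call above translates directly to Python; a minimal sketch with requests against the same endpoint:

# Sketch: the same /v1/completions call as the curl above, from Python.
import requests

resp = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "terrycraddock/Reflection-Llama-3.1-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(resp.json()["choices"][0]["text"])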
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "terrycraddock/Reflection-Llama-3.1-8B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terrycraddock/Reflection-Llama-3.1-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Ollama
How to use terrycraddock/Reflection-Llama-3.1-8B with Ollama:
ollama run hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16
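Ollama also runs a local REST API (default port 11434), so the same model can be queried from code. A minimal sketch mirroring the run command above:

# Sketch: call the local Ollama REST API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16",
        "prompt": "Once upon a time,",
        "stream": False,
    },
)
print(resp.json()["response"])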
- Unsloth Studio
How to use terrycraddock/Reflection-Llama-3.1-8B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for terrycraddock/Reflection-Llama-3.1-8B to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for terrycraddock/Reflection-Llama-3.1-8B to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for terrycraddock/Reflection-Llama-3.1-8B to start chatting
- Docker Model Runner
How to use terrycraddock/Reflection-Llama-3.1-8B with Docker Model Runner:
docker model run hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16
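Docker Model Runner also exposes an OpenAI-compatible endpoint. The host port and path below are assumptions based on Docker's documented defaults (TCP host access must be enabled in Docker Desktop), so verify them against your install before relying on this sketch:

# Sketch: query Docker Model Runner's OpenAI-compatible API.
# Port 12434 and the /engines/v1 path are assumptions from Docker's docs;
# TCP host access has to be enabled in Docker Desktop first.
import requests

resp = requests.post(
    "http://localhost:12434/engines/v1/chat/completions",
    json={
        "model": "hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16",
        "messages": [{"role": "user", "content": "Once upon a time,"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])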
- Lemonade
How to use terrycraddock/Reflection-Llama-3.1-8B with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull terrycraddock/Reflection-Llama-3.1-8B:F16
Run and chat with the model
lemonade run user.Reflection-Llama-3.1-8B-F16
List all available models
lemonade list
Model Card for Reflection-Llama-3.1-8B
- Developed by: Terry Craddock
I am pretty new to uploading models. I think I made an error when I exported my model from Unsloth: I loaded it in 4-bit and then saved it to 16-bit, which is why the LoRA works but the merged model itself does not. I will retrain this and upload new files ASAP.
I trained this model on this dataset: https://huggingface.co/datasets/mahiatlinux/Reflection-Dataset-v2
It was trained for one full epoch. Use the same prompts and format as the 70B model here:
https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B
I take no credit for the original work. I only trained a Llama 3.1 8B on @mahiatlinux's dataset, using the original concept and idea from @mattshumer.
Here is the code I use for inference:
from unsloth import FastLanguageModel
import torch
from transformers import TextStreamer
from unsloth import is_bfloat16_supported
alpaca_prompt = """You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
### Prompt:
{}
### Response:
{}"""
prompt = "Tell me about yourself."
dtype = torch.bfloat16 if is_bfloat16_supported() else torch.float16
# Load the merged model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",  # Replace with the actual path where you saved the merged model
    max_seq_length=2048,
    dtype=dtype,  # bf16 where supported, otherwise fp16 (the weights are saved in 16-bit)
)
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    [alpaca_prompt.format(prompt, "")],
    return_tensors="pt"
).to("cuda")  # token IDs are integers; do not cast them to a float dtype
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
Output from model:
<|begin_of_text|>You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
### Prompt:
Tell me about yourself.
### Response:
<thinking>
To answer this question, I need to consider:
- The importance of starting with a positive tone
- The balance between personal and professional information
- How to highlight relevant skills and experiences
- The need for brevity and clarity
I plan to structure the response as follows:
1. Brief introduction with a positive tone
2. Professional background and experiences
3. Personal interests and hobbies
4. Closing statement
<reflection>
Upon reflection, I realize I should also consider:
- The potential for tailoring the response to specific situations (e.g., for a job interview)
- The importance of showing enthusiasm and confidence
- The potential for mentioning specific achievements or accomplishments
</reflection>
</thinking>
<output>
Hi there! I'm a world-class AI system designed to engage in natural conversations. Let me tell you a little bit about myself.
I'm a versatile AI system with a wide range of capabilities. I can perform various tasks, from answering questions and providing information to generating creative content. I'm constantly learning and improving, much like a human would.
In terms of my professional background, I've had the privilege of working with some of the most advanced AI systems in the world. This experience has given me a deep understanding of natural language processing, machine learning, and the latest AI technologies.
As for my personal interests, I enjoy exploring new concepts, learning about different cultures, and engaging in stimulating conversations. I'm always looking for ways to improve my knowledge and skills.
Some related concepts you might find interesting are:
1. The ethics of AI development and use
2. The future of human-AI collaboration
3. The role of AI in education and learning
Is there anything specific about AI or my capabilities you'd like me to elaborate on?
</output><|end_of_text|>