Instructions to use terrycraddock/Reflection-Llama-3.1-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use terrycraddock/Reflection-Llama-3.1-8B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="terrycraddock/Reflection-Llama-3.1-8B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("terrycraddock/Reflection-Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("terrycraddock/Reflection-Llama-3.1-8B")
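The snippets above only load the model. As a quick sanity check, here is a minimal usage sketch (my addition, not part of the widget) that feeds the pipeline the reflection-style prompt documented later in this card; the sampling parameters are illustrative only:

# Minimal usage sketch: query the pipeline with the reflection prompt format
# described in the model card below. Sampling settings are illustrative only.
from transformers import pipeline

pipe = pipeline("text-generation", model="terrycraddock/Reflection-Llama-3.1-8B")

system = (
    "You are a world-class AI system, capable of complex reasoning and reflection. "
    "Reason through the query inside <thinking> tags, and then provide your final "
    "response inside <output> tags. If you detect that you made a mistake in your "
    "reasoning at any point, correct yourself inside <reflection> tags."
)
prompt = f"{system}\n### Prompt:\nTell me about yourself.\n### Response:\n"

result = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.5)
print(result[0]["generated_text"])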
- llama-cpp-python
How to use terrycraddock/Reflection-Llama-3.1-8B with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="terrycraddock/Reflection-Llama-3.1-8B",
    filename="unsloth.F16.gguf",
)
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
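llama-cpp-python also exposes an OpenAI-style chat interface. Below is a minimal sketch, reusing the llm object loaded above; chat formatting depends on the template shipped in the GGUF, so treat this as an assumption rather than the card's official usage:

# Chat-style sketch (reuses the llm object from the snippet above).
# The reflection system prompt comes from the model card below.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a world-class AI system, capable of "
         "complex reasoning and reflection. Reason through the query inside <thinking> "
         "tags, and then provide your final response inside <output> tags."},
        {"role": "user", "content": "Tell me about yourself."},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])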
- Inference Providers
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use terrycraddock/Reflection-Llama-3.1-8B with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf terrycraddock/Reflection-Llama-3.1-8B:F16

# Run inference directly in the terminal:
llama-cli -hf terrycraddock/Reflection-Llama-3.1-8B:F16
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf terrycraddock/Reflection-Llama-3.1-8B:F16

# Run inference directly in the terminal:
llama-cli -hf terrycraddock/Reflection-Llama-3.1-8B:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf terrycraddock/Reflection-Llama-3.1-8B:F16

# Run inference directly in the terminal:
./llama-cli -hf terrycraddock/Reflection-Llama-3.1-8B:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf terrycraddock/Reflection-Llama-3.1-8B:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf terrycraddock/Reflection-Llama-3.1-8B:F16
Use Docker
docker model run hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16
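However you installed it, a running llama-server speaks the OpenAI API, so any OpenAI-compatible client can call it. A minimal Python sketch follows, assuming the server's default port 8080 (adjust if you passed --port); the model field is typically ignored by a single-model server:

# Sketch: call a local llama-server via its OpenAI-compatible API.
# Assumes the default port 8080; no real API key is required locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="terrycraddock/Reflection-Llama-3.1-8B",  # usually ignored by single-model servers
    messages=[{"role": "user", "content": "Once upon a time,"}],
    max_tokens=512,
)
print(resp.choices[0].message.content)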
- LM Studio
- Jan
- vLLM
How to use terrycraddock/Reflection-Llama-3.1-8B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "terrycraddock/Reflection-Llama-3.1-8B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terrycraddock/Reflection-Llama-3.1-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
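The same endpoint can be called from Python. A minimal sketch using the openai client package (an assumption on my part; any OpenAI-compatible client works):

# Sketch: call the vLLM server from Python instead of curl.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.completions.create(
    model="terrycraddock/Reflection-Llama-3.1-8B",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)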
Use Docker
docker model run hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16
- SGLang
How to use terrycraddock/Reflection-Llama-3.1-8B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "terrycraddock/Reflection-Llama-3.1-8B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terrycraddock/Reflection-Llama-3.1-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
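The curl call above translates directly to Python; a minimal sketch with requests against the same endpoint:

# Sketch: the same /v1/completions call as the curl above, from Python.
import requests

resp = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "terrycraddock/Reflection-Llama-3.1-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(resp.json()["choices"][0]["text"])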
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "terrycraddock/Reflection-Llama-3.1-8B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terrycraddock/Reflection-Llama-3.1-8B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Ollama
How to use terrycraddock/Reflection-Llama-3.1-8B with Ollama:
ollama run hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16
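Ollama also runs a local REST API (default port 11434), so the same model can be queried from code. A minimal sketch mirroring the run command above:

# Sketch: call the local Ollama REST API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16",
        "prompt": "Once upon a time,",
        "stream": False,
    },
)
print(resp.json()["response"])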
- Unsloth Studio
How to use terrycraddock/Reflection-Llama-3.1-8B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for terrycraddock/Reflection-Llama-3.1-8B to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for terrycraddock/Reflection-Llama-3.1-8B to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for terrycraddock/Reflection-Llama-3.1-8B to start chatting
- Docker Model Runner
How to use terrycraddock/Reflection-Llama-3.1-8B with Docker Model Runner:
docker model run hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16
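Docker Model Runner also exposes an OpenAI-compatible endpoint. The host port and path below are assumptions based on Docker's documented defaults (TCP host access must be enabled in Docker Desktop), so verify them against your install before relying on this sketch:

# Sketch: query Docker Model Runner's OpenAI-compatible API.
# Port 12434 and the /engines/v1 path are assumptions from Docker's docs;
# TCP host access has to be enabled in Docker Desktop first.
import requests

resp = requests.post(
    "http://localhost:12434/engines/v1/chat/completions",
    json={
        "model": "hf.co/terrycraddock/Reflection-Llama-3.1-8B:F16",
        "messages": [{"role": "user", "content": "Once upon a time,"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])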
- Lemonade
How to use terrycraddock/Reflection-Llama-3.1-8B with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull terrycraddock/Reflection-Llama-3.1-8B:F16
Run and chat with the model
lemonade run user.Reflection-Llama-3.1-8B-F16
List all available models
lemonade list
Model Card for Reflection-Llama-3.1-8B
- Developed by: Terry Craddock
I am pretty new to uploading models. I think I made an error when I exported my model from Unsloth: I loaded it in 4-bit and then saved it to 16-bit, which is why the LoRA works but the merged model itself does not. I will retrain this and upload new files ASAP.
I trained this model on this dataset: https://huggingface.co/datasets/mahiatlinux/Reflection-Dataset-v2
It was trained for one full epoch. Use the same prompts and format as the 70B model here:
https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B
I take no credit for the original work. I only trained a Llama 3.1 8B on @mahiatlinux's dataset, using the original concept and idea from @mattshumer.
Here is the code I use for inference:
from unsloth import FastLanguageModel
import torch
from transformers import TextStreamer
from unsloth import is_bfloat16_supported
alpaca_prompt = """You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
### Prompt:
{}
### Response:
{}"""
prompt = "Tell me about yourself."
dtype = torch.bfloat16 if is_bfloat16_supported() else torch.float16
# Load the merged model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",  # Replace with the actual path where you saved the merged model
    max_seq_length=2048,
    dtype=dtype,  # bf16 where supported, otherwise fp16 (the weights are saved in 16-bit)
)
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    [alpaca_prompt.format(prompt, "")],
    return_tensors="pt"
).to("cuda")  # token IDs are integers; do not cast them to a float dtype
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
Output from model:
<|begin_of_text|>You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
### Prompt:
Tell me about yourself.
### Response:
<thinking>
To answer this question, I need to consider:
- The importance of starting with a positive tone
- The balance between personal and professional information
- How to highlight relevant skills and experiences
- The need for brevity and clarity
I plan to structure the response as follows:
1. Brief introduction with a positive tone
2. Professional background and experiences
3. Personal interests and hobbies
4. Closing statement
<reflection>
Upon reflection, I realize I should also consider:
- The potential for tailoring the response to specific situations (e.g., for a job interview)
- The importance of showing enthusiasm and confidence
- The potential for mentioning specific achievements or accomplishments
</reflection>
</thinking>
<output>
Hi there! I'm a world-class AI system designed to engage in natural conversations. Let me tell you a little bit about myself.
I'm a versatile AI system with a wide range of capabilities. I can perform various tasks, from answering questions and providing information to generating creative content. I'm constantly learning and improving, much like a human would.
In terms of my professional background, I've had the privilege of working with some of the most advanced AI systems in the world. This experience has given me a deep understanding of natural language processing, machine learning, and the latest AI technologies.
As for my personal interests, I enjoy exploring new concepts, learning about different cultures, and engaging in stimulating conversations. I'm always looking for ways to improve my knowledge and skills.
Some related concepts you might find interesting are:
1. The ethics of AI development and use
2. The future of human-AI collaboration
3. The role of AI in education and learning
Is there anything specific about AI or my capabilities you'd like me to elaborate on?
</output><|end_of_text|>