Instructions to use ghost-x/ghost-7b-alpha-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ghost-x/ghost-7b-alpha-gguf with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ghost-x/ghost-7b-alpha-gguf")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ghost-x/ghost-7b-alpha-gguf", dtype="auto")
```
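Note that this repository contains GGUF files rather than standard weights, so the generic snippets above may not work out of the box. Transformers can load a GGUF checkpoint by dequantizing it if you pass a `gguf_file` argument; a minimal sketch, assuming the Q4_0 filename used in the llama-cpp-python example below and that the `gguf` package is installed:

```python
# Sketch: load a GGUF checkpoint through Transformers by pointing at a
# specific .gguf file in the repo (filename assumed from the repo's
# llama-cpp-python example). Requires: pip install transformers gguf
from transformers import AutoModelForCausalLM, AutoTokenizer

gguf_file = "ghost-7b-alpha-Q4_0.gguf"
tokenizer = AutoTokenizer.from_pretrained("ghost-x/ghost-7b-alpha-gguf", gguf_file=gguf_file)
# The GGUF weights are dequantized on load, so expect full-precision memory usage.
model = AutoModelForCausalLM.from_pretrained("ghost-x/ghost-7b-alpha-gguf", gguf_file=gguf_file)

inputs = tokenizer("Who are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```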
- llama-cpp-python
How to use ghost-x/ghost-7b-alpha-gguf with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ghost-x/ghost-7b-alpha-gguf",
    filename="ghost-7b-alpha-Q4_0.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
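`create_chat_completion` returns an OpenAI-style completion dict. A minimal sketch of the same call as above, capturing the result and printing just the assistant's reply:

```python
# Capture the returned dict and extract the generated text.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response["choices"][0]["message"]["content"])
```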
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ghost-x/ghost-7b-alpha-gguf with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ghost-x/ghost-7b-alpha-gguf:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
Install from WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ghost-x/ghost-7b-alpha-gguf:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
Use pre-built binary
```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ghost-x/ghost-7b-alpha-gguf:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ghost-x/ghost-7b-alpha-gguf:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
Use Docker
```bash
docker model run hf.co/ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
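With llama-server running via any of the install paths above, you can also call its OpenAI-compatible API from Python. A minimal sketch, assuming the server is on its default port 8080 (pass `--port` to change it); the model name in the request is illustrative, since llama-server serves whichever model it was started with:

```python
# Sketch: query a local llama-server through the OpenAI client.
# Requires: pip install openai
from openai import OpenAI

# llama-server does not check the API key, but the client requires a non-empty one.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")
response = client.chat.completions.create(
    model="ghost-7b-alpha",  # illustrative; the server uses the model it was launched with
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```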
- LM Studio
- Jan
- vLLM
How to use ghost-x/ghost-7b-alpha-gguf with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ghost-x/ghost-7b-alpha-gguf"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ghost-x/ghost-7b-alpha-gguf",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
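The same endpoint can be called from Python with the OpenAI client. A minimal sketch that streams tokens as they are generated, assuming the server started above is listening on port 8000:

```python
# Sketch: stream a chat completion from the local vLLM server.
# Requires: pip install openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="ghost-x/ghost-7b-alpha-gguf",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```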
Use Docker
```bash
docker model run hf.co/ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
- SGLang
How to use ghost-x/ghost-7b-alpha-gguf with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ghost-x/ghost-7b-alpha-gguf" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ghost-x/ghost-7b-alpha-gguf",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "ghost-x/ghost-7b-alpha-gguf" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ghost-x/ghost-7b-alpha-gguf",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Ollama
How to use ghost-x/ghost-7b-alpha-gguf with Ollama:
```bash
ollama run hf.co/ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
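Once the model has been pulled, Ollama also exposes a local REST API. A minimal sketch, assuming Ollama is running on its default port 11434 and using the model tag from the command above:

```python
# Sketch: call the local Ollama server's chat endpoint.
# Requires: pip install requests
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/ghost-x/ghost-7b-alpha-gguf:Q4_K_M",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "stream": False,  # return one complete response instead of streamed chunks
    },
)
print(resp.json()["message"]["content"])
```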
- Unsloth Studio
How to use ghost-x/ghost-7b-alpha-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for ghost-x/ghost-7b-alpha-gguf to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for ghost-x/ghost-7b-alpha-gguf to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ghost-x/ghost-7b-alpha-gguf to start chatting
```
- Docker Model Runner
How to use ghost-x/ghost-7b-alpha-gguf with Docker Model Runner:
```bash
docker model run hf.co/ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
- Lemonade
How to use ghost-x/ghost-7b-alpha-gguf with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ghost-x/ghost-7b-alpha-gguf:Q4_K_M
```
Run and chat with the model
```bash
lemonade run user.ghost-7b-alpha-gguf-Q4_K_M
```
List all available models
```bash
lemonade list
```
Ghost 7B Alpha
A generation of large language models focused on strong reasoning, multi-task knowledge, and tool support.
Introduction
Ghost 7B Alpha is a large language model fine-tuned from Mistral 7B, with 7 billion parameters. It was developed to optimize reasoning ability and multi-task knowledge, and to support tool usage. The model performs best in its main trained and optimized languages, English and Vietnamese.
Overall, the model is well suited as a base for continued fine-tuning on your own tasks, for building virtual assistants, and for tasks such as coding, translation, question answering, and document generation. It is an efficient, fast, and very inexpensive open model.
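Since English and Vietnamese are the primary languages, a minimal sketch of a Vietnamese chat turn, reusing the llama-cpp-python setup from the quick-start section above (the prompt is illustrative):

```python
# Sketch: ask the model a question in Vietnamese via llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ghost-x/ghost-7b-alpha-gguf",
    filename="ghost-7b-alpha-Q4_0.gguf",
)
# "What is the capital of Vietnam?" in Vietnamese
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Thủ đô của Việt Nam là gì?"}]
)
print(response["choices"][0]["message"]["content"])
```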
Specifications
- Name: Ghost 7B Alpha.
- Model size: 7 billion parameters.
- Context length: 8K (8,192 tokens).
- Languages: English and Vietnamese.
- Main tasks: reasoning, multi-task knowledge, and function/tool calling.
- License: Ghost 7B LICENSE AGREEMENT.
- Based on: Mistral 7B.
- Distributions: Standard (BF16), GGUF, AWQ.
- Developed by: Ghost X, Hieu Lam.
Links
- Model card: 🤗 HuggingFace.
- Official website: Ghost 7B Alpha.
- Demo: Playground with Ghost 7B Alpha.
Distributions
We provide several distributions so you can choose the option that best suits your needs. Make sure you know which version you need and which will work best for your setup.
| Version | Model card |
|---|---|
| BF16 | 🤗 HuggingFace |
| GGUF | 🤗 HuggingFace |
| AWQ | 🤗 HuggingFace |
Note
For all official information and updates about the model, see here:
- Model card: 🤗 HuggingFace.
- Official website: Ghost 7B Alpha.
- Demo: Playground with Ghost 7B Alpha.