Instructions for using Hastagaras/Sunmoy-9B-G2 with libraries and local apps.

Libraries

How to use Hastagaras/Sunmoy-9B-G2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Hastagaras/Sunmoy-9B-G2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
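When the pipeline is given a list of chat messages, recent Transformers versions return the whole conversation under `generated_text`, with the model's reply as the last message. Below is a minimal sketch of pulling that reply out, assuming that output shape. The helper name `last_reply` is hypothetical, and the sample output is a placeholder rather than real model output.

```python
def last_reply(pipeline_output):
    """Return the assistant text from a chat-style text-generation result.

    Assumes the shape [{"generated_text": [ {...message...}, ... ]}],
    where the final message is the newly generated assistant turn.
    """
    conversation = pipeline_output[0]["generated_text"]
    return conversation[-1]["content"]


# Stand-in for `pipe(messages)`; the reply text here is a placeholder.
fake_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "<model reply>"},
        ]
    }
]
print(last_reply(fake_output))
```

With a real pipeline, `last_reply(pipe(messages))` would be the equivalent call, provided the installed version uses this output shape.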

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Hastagaras/Sunmoy-9B-G2")
model = AutoModelForCausalLM.from_pretrained("Hastagaras/Sunmoy-9B-G2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Hastagaras/Sunmoy-9B-G2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Hastagaras/Sunmoy-9B-G2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Hastagaras/Sunmoy-9B-G2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
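The same request can be made from Python instead of curl. The sketch below uses only the standard library and assumes the server is reachable at `http://localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field. The helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt, model="Hastagaras/Sunmoy-9B-G2",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Once the server is running, `ask("What is the capital of France?")` sends the same payload as the curl command. Passing a different `base_url` points it at another OpenAI-compatible server.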

Use Docker

docker model run hf.co/Hastagaras/Sunmoy-9B-G2

SGLang

How to use Hastagaras/Sunmoy-9B-G2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Hastagaras/Sunmoy-9B-G2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Hastagaras/Sunmoy-9B-G2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Hastagaras/Sunmoy-9B-G2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Hastagaras/Sunmoy-9B-G2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Hastagaras/Sunmoy-9B-G2 with Docker Model Runner:
```
docker model run hf.co/Hastagaras/Sunmoy-9B-G2
```

10b size

by WesPro - opened Jan 14, 2025

Discussion

WesPro

Jan 14, 2025

I think your model "grew" because it was made back when mergekit still had a bug that caused Gemma2-9b merges to end up bloated through no mistake on the user's side. If you're interested in debloating it, you could run the merge .yml again, but this time after deleting your old mergekit and building a new, updated one from scratch, to rule out the possibility that some old mergekit file is still causing problems. Alternatively, you could follow the approach grimjim shared and described here, without having to redo the whole merge from the beginning: https://huggingface.co/posts/grimjim/968917199366229
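As a back-of-the-envelope check of the size jump, the sketch below assumes Gemma 2 9B's published configuration (a vocabulary of 256,000 tokens and a hidden size of 3,584) and a base count of roughly 9.24B parameters. None of these figures are stated in this thread. An untied `lm_head` adds one extra vocab-by-hidden matrix:

```python
# Assumed Gemma 2 9B figures (from its public config, not from this thread).
VOCAB_SIZE = 256_000
HIDDEN_SIZE = 3_584
BASE_PARAMS = 9.24e9  # approximate count with tied embeddings

# A separate lm_head stores one more vocab x hidden weight matrix.
extra_lm_head = VOCAB_SIZE * HIDDEN_SIZE
bloated_total = BASE_PARAMS + extra_lm_head

print(f"extra lm_head params: {extra_lm_head:,}")   # 917,504,000
print(f"bloated total: {bloated_total / 1e9:.2f}B")  # 10.16B
```

Under those assumptions the extra head alone accounts for about 0.92B parameters, which is consistent with a 9B merge showing up as roughly 10B.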

Hastagaras

Owner Jan 15, 2025

Yes, I've realized that the old version of transformers and mergekit would add lm_head weights to models even if they originally didn't have an lm_head (like Gemma 2 and Llama 3.2 3B/1B). I actually already have a non-bloated version of the model - I had kept it private before. It's the same model, but I removed the lm_head after loading it and then reuploaded/pushed it to the Hub: https://huggingface.co/Hastagaras/sunmoy-no-lmhead-test
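The exact steps used for the reupload are not shown in the thread. As a rough sketch of the idea only, the hypothetical helper below drops a redundant output head from a checkpoint's name-to-weight mapping. The key names `lm_head.weight` and `model.embed_tokens.weight` are assumed to follow the usual Llama/Gemma-style layout, and plain lists stand in for tensors.

```python
def drop_redundant_lm_head(state_dict, tie_word_embeddings=True,
                           head_key="lm_head.weight",
                           embed_key="model.embed_tokens.weight"):
    """Return a copy of state_dict without a duplicate output head.

    The head is dropped only when embeddings are tied and the input
    embedding is present, so the head can still be rebuilt from it.
    """
    cleaned = dict(state_dict)
    if tie_word_embeddings and head_key in cleaned and embed_key in cleaned:
        del cleaned[head_key]
    return cleaned


# Toy stand-ins for tensors; a real checkpoint would map names to arrays.
toy = {
    "model.embed_tokens.weight": [[0.1, 0.2], [0.3, 0.4]],
    "model.norm.weight": [1.0, 1.0],
    "lm_head.weight": [[0.1, 0.2], [0.3, 0.4]],
}
print(sorted(drop_redundant_lm_head(toy)))
```

The helper returns a copy and leaves the input untouched, so the original mapping can still be compared against the cleaned one before anything is saved.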

ONTHEREDTEAM

Jan 31, 2025

Would love to test out a gguf q4_0 model
