Instructions to use Hastagaras/Sunmoy-9B-G2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Hastagaras/Sunmoy-9B-G2 with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Hastagaras/Sunmoy-9B-G2")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Hastagaras/Sunmoy-9B-G2")
    model = AutoModelForCausalLM.from_pretrained("Hastagaras/Sunmoy-9B-G2")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Hastagaras/Sunmoy-9B-G2 with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "Hastagaras/Sunmoy-9B-G2"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Hastagaras/Sunmoy-9B-G2",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

Use Docker

    docker model run hf.co/Hastagaras/Sunmoy-9B-G2

- SGLang
How to use Hastagaras/Sunmoy-9B-G2 with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "Hastagaras/Sunmoy-9B-G2" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Hastagaras/Sunmoy-9B-G2",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

Use Docker images

    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "Hastagaras/Sunmoy-9B-G2" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Hastagaras/Sunmoy-9B-G2",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

- Docker Model Runner
How to use Hastagaras/Sunmoy-9B-G2 with Docker Model Runner:
docker model run hf.co/Hastagaras/Sunmoy-9B-G2
10b size
I think your model "grew" because it was made back when mergekit still had a bug that caused Gemma2-9b merges to end up bloated through no mistake on the user's side. If you're interested in debloating it, you could run the merge .yml again, but this time after deleting your old mergekit and building a new, updated one from scratch, to rule out the possibility that some old mergekit file is still causing problems. You could also try the approach grimjim shared and described here: https://huggingface.co/posts/grimjim/968917199366229 which avoids having to redo the whole merge from the beginning.
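For a rough sense of scale, here is a back-of-the-envelope sketch of how much a merge could grow, assuming the bloat comes from one extra, untied copy of the output projection (an `lm_head` the same shape as the embedding matrix). The vocabulary size and hidden size are what I understand the Gemma 2 9B config to use, and the base parameter count is approximate, so treat all three constants as assumptions rather than measured values for this model.

```python
# Assumed Gemma 2 9B dimensions (not read from this repository's config).
VOCAB_SIZE = 256_000
HIDDEN_SIZE = 3_584
# Approximate parameter count of an unbloated Gemma 2 9B model (assumption).
BASE_PARAMS = 9.24e9


def extra_lm_head_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a separate lm_head matrix of shape (vocab_size, hidden_size)."""
    return vocab_size * hidden_size


extra = extra_lm_head_params(VOCAB_SIZE, HIDDEN_SIZE)
print(f"extra lm_head parameters: {extra:,}")  # 917,504,000
print(f"bloated total: {(BASE_PARAMS + extra) / 1e9:.2f}B")
```

Under these assumptions the duplicate adds roughly 0.9B parameters, which would be enough to push a 9B model to a reported size of about 10B.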
Yes, I've realized that the old version of transformers and mergekit would add lm_head weights to models even if they originally didn't have an lm_head (like Gemma 2 and Llama 3.2 3B/1B). I actually already have a non-bloated version of the model - I had kept it private before. It's the same model, but I removed the lm_head after loading it and then reuploaded/pushed it to the Hub: https://huggingface.co/Hastagaras/sunmoy-no-lmhead-test
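One possible way to drop the extra tensor at the file level is sketched below. This is not necessarily how the non-bloated upload above was produced (that was done by removing the lm_head after loading the model); it is a minimal sketch that assumes the checkpoint is stored as safetensors, that the duplicate is stored under the key `lm_head.weight`, and that the model config ties the output projection to the input embeddings so the tensor is redundant. The function names `drop_lm_head` and `strip_shard` and the file paths are hypothetical.

```python
def drop_lm_head(tensors: dict, key: str = "lm_head.weight") -> dict:
    """Return a copy of a name -> tensor mapping without the given key.

    The default key name is an assumption about how the duplicate is stored.
    """
    return {name: tensor for name, tensor in tensors.items() if name != key}


def strip_shard(src_path: str, dst_path: str) -> None:
    """Rewrite one safetensors shard without the lm_head tensor.

    Requires the `safetensors` and `torch` packages; imported lazily so that
    drop_lm_head can be used without them.
    """
    from safetensors.torch import load_file, save_file

    tensors = load_file(src_path)
    # Transformers expects the "format" metadata entry on saved checkpoints.
    save_file(drop_lm_head(tensors), dst_path, metadata={"format": "pt"})


# Hypothetical usage on the shard that holds the lm_head:
# strip_shard("model-00004-of-00004.safetensors", "stripped/model-00004-of-00004.safetensors")
```

For a sharded checkpoint, the `weight_map` entry for the removed tensor in `model.safetensors.index.json` would also need to be deleted, and it is worth confirming that the stripped model still loads and generates sensibly before uploading it.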
Would love to test out a gguf q4_0 model