threatthriver/Gemma-7B-LoRA-Fine-Tuned
Description
This repository contains LoRA (Low-Rank Adaptation) adapter weights for fine-tuning a Gemma 7B model on a custom dataset.
Important: This is NOT a full model release. It only includes the LoRA adapter weights and a config.json to guide loading the model. You will need to write custom code to load the base Gemma model and apply the adapters.
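Since the repository ships only adapter files, it can help to inspect config.json before writing any loading code. The sketch below assumes the repository has already been downloaded to a local directory. The helper name `read_adapter_config` and the directory argument are illustrative, and because the keys in the file are not documented here, the code simply lists whatever it finds.

```python
import json
from pathlib import Path


def read_adapter_config(repo_dir):
    """Return the parsed config.json from a local copy of this repository.

    `repo_dir` is an assumed local download location, not a fixed path.
    """
    config_path = Path(repo_dir) / "config.json"
    with config_path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    import sys

    # Usage: python inspect_config.py /path/to/local/copy
    if len(sys.argv) > 1:
        config = read_adapter_config(sys.argv[1])
        for key in sorted(config):
            print(f"{key}: {config[key]!r}")
```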
Model Fine-tuning Details
- Base Model: google/gemma2_9b_en
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 8
- Dataset:
- Training Framework: KerasNLP
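To give a feel for what a LoRA rank of 8 means in practice, the sketch below compares the number of trainable parameters in a full weight matrix with the number in its low-rank adapter. The 4096 x 4096 layer shape is an illustrative assumption, not the actual dimension of any Gemma layer, and the function names are made up for this example.

```python
def full_param_count(d_in, d_out):
    """Parameters in a dense weight matrix of shape (d_in, d_out)."""
    return d_in * d_out


def lora_param_count(d_in, d_out, rank):
    """Parameters in a LoRA adapter: A is (d_in, rank), B is (rank, d_out)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in = d_out = 4096  # illustrative layer size, not taken from Gemma
    rank = 8             # the rank listed above
    full = full_param_count(d_in, d_out)
    lora = lora_param_count(d_in, d_out, rank)
    print(f"full: {full}, adapter: {lora}, ratio: {lora / full:.4%}")
```

The adapter grows linearly with the layer width, while the full matrix grows quadratically, which is why the adapter files are small compared with the base model.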
How to Use
This release is not directly compatible with the transformers library's standard loading methods. You will need to:
- Load the Base Gemma Model: Use KerasNLP to load the google/gemma2_9b_en base model. Make sure you have the KerasNLP library installed and properly configured.
- Enable LoRA: Use KerasNLP's LoRA functionality to enable adapters on the appropriate layers of the Gemma model. Refer to the KerasNLP LoRA documentation for implementation details.
- Load Adapter Weights: Load adapter_model.bin and other relevant files from this repository. The config.json file provides essential configurations for applying the LoRA adapter weights.
- Integration: Integrate this custom loading process into your Hugging Face Transformers-based code. Ensure you handle the merging of adapter weights with the base model appropriately.
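Merging adapter weights into a base layer amounts to adding the low-rank product to the original kernel. The sketch below shows that arithmetic on plain Python lists so it runs without any framework. The function names, the (d_in, d_out) kernel layout, and the `scale` argument are assumptions for illustration; scaling conventions differ between LoRA implementations, so check the one used by your library version.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(kernel, lora_a, lora_b, scale=1.0):
    """Return kernel + scale * (lora_a @ lora_b).

    kernel: (d_in, d_out), lora_a: (d_in, rank), lora_b: (rank, d_out).
    """
    delta = matmul(lora_a, lora_b)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(kernel, delta)
    ]


if __name__ == "__main__":
    kernel = [[1.0, 0.0], [0.0, 1.0]]  # toy 2 x 2 base weights
    lora_a = [[1.0], [2.0]]            # rank-1 adapter, shape (2, 1)
    lora_b = [[3.0, 4.0]]              # rank-1 adapter, shape (1, 2)
    print(merge_lora(kernel, lora_a, lora_b))
```

After merging, the layer behaves like an ordinary dense layer, so inference no longer needs the separate adapter matrices.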
Example Code Structure (Conceptual):
import keras_nlp

# Load the base Gemma model using KerasNLP.
# "gemma2_9b_en" is the KerasNLP preset name for the base model listed above.
base_model = keras_nlp.models.GemmaCausalLM.from_preset("gemma2_9b_en")

# Enable LoRA adapters on the backbone, using the rank from fine-tuning
base_model.backbone.enable_lora(rank=8)

# Load adapter weights from this repository.
# load_lora_weights is a hypothetical helper that you must write yourself: it
# should read adapter_model.bin and assign the values to the model's LoRA variables.
adapter_weights_path = 'path_to_your_adapter_weights/adapter_model.bin'
load_lora_weights(base_model, adapter_weights_path)

# Generate text. GemmaCausalLM bundles its own tokenizer and preprocessor,
# so it accepts raw strings and returns decoded strings.
output = base_model.generate("Your input text", max_length=64)
print(output)
Requirements
- KerasNLP: Install using pip install keras-nlp
- Transformers: Install using pip install transformers
- Other Dependencies: Ensure all dependencies required for KerasNLP and Hugging Face Transformers are installed.
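Part of configuring KerasNLP is choosing a Keras backend. Assuming a Keras 3 installation, the backend is read from the `KERAS_BACKEND` environment variable when Keras is first imported, so it has to be set beforehand. The helper name `select_keras_backend` below is made up for this sketch.

```python
import os

# Backends accepted by Keras 3
SUPPORTED_BACKENDS = ("jax", "tensorflow", "torch")


def select_keras_backend(name):
    """Set KERAS_BACKEND; call this before importing keras or keras_nlp."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported backend {name!r}; choose from {SUPPORTED_BACKENDS}")
    os.environ["KERAS_BACKEND"] = name
    return name


if __name__ == "__main__":
    select_keras_backend("jax")
    print(os.environ["KERAS_BACKEND"])
```

Setting the variable after Keras has been imported has no effect on the running process, which is why the call belongs at the very top of a script.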
Notes
- Ensure you have the correct versions of KerasNLP and Transformers compatible with each other.
- Custom code for loading and applying LoRA adapters may require adjustments based on your specific use case and the versions of libraries used.
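When debugging version mismatches, it helps to record exactly which package versions are installed. The following sketch uses only the standard library. The function name `installed_versions` is illustrative, and the package list matches the Requirements section above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for package, version in installed_versions(["keras-nlp", "transformers"]).items():
        print(f"{package}: {version or 'not installed'}")
```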
License
This project is licensed under the MIT License.