threatthriver/Gemma-7B-LoRA-Fine-Tuned

Description

This repository contains LoRA (Low-Rank Adaptation) adapter weights produced by fine-tuning a Gemma 7B model on a custom dataset.

Important: This is NOT a full model release. It only includes the LoRA adapter weights and a config.json to guide loading the model. You will need to write custom code to load the base Gemma model and apply the adapters.

Model Fine-tuning Details

Base Model: google/gemma2_9b_en
Fine-tuning Method: LoRA (Low-Rank Adaptation)
LoRA Rank: 8
Dataset:
Training Framework: KerasNLP
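
To give a sense of what a LoRA rank of 8 means for adapter size, the sketch below compares the number of trainable parameters in a rank-8 adapter with the full weight matrix it adapts. The layer dimensions (4096 x 4096) are hypothetical and chosen only for illustration; they are not taken from the Gemma architecture or from this repository.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in a LoRA adapter: one (d_in x rank) and one (rank x d_out) matrix."""
    return rank * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters in the dense weight matrix that the adapter modifies."""
    return d_in * d_out


# Hypothetical layer size, for illustration only.
d_in, d_out, rank = 4096, 4096, 8

lora_params = lora_param_count(d_in, d_out, rank)
full_params = full_param_count(d_in, d_out)
fraction = lora_params / full_params

print(f"LoRA parameters: {lora_params}")
print(f"Full parameters: {full_params}")
print(f"Fraction trained: {fraction:.4%}")
```

Because the adapter size grows with `rank * (d_in + d_out)` rather than `d_in * d_out`, an adapter-only release like this one is far smaller than a full model checkpoint.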

How to Use

This release is not directly compatible with the transformers library's standard loading methods. You will need to:

Load the Base Gemma Model: Use KerasNLP to load the google/gemma2_9b_en base model. Make sure you have the KerasNLP library installed and properly configured.
Enable LoRA: Utilize KerasNLP’s LoRA functionality to enable adapters on the appropriate layers of the Gemma model. Refer to the KerasNLP LoRA documentation for implementation details.
Load Adapter Weights: Load the adapter_model.bin and other relevant files from this repository. The config.json file provides essential configurations for applying the LoRA adapter weights.
Integration: Integrate this custom loading process into your Hugging Face Transformers-based code. Ensure you handle the merging of adapter weights with the base model appropriately.
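
The merging mentioned in the last step can be illustrated with a toy, dependency-free sketch. It assumes the usual LoRA formulation, in which the merged weight is the frozen base weight plus a scaled product of two low-rank matrices. The function names `matmul` and `merge_lora` and the `scale` parameter are hypothetical helpers for this example, not part of KerasNLP or Transformers, and the matrices are tiny made-up values.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, scale=1.0):
    """Return w + scale * (a @ b) as a new matrix.

    w: (d_in x d_out) frozen base weight
    a: (d_in x rank) LoRA down-projection
    b: (rank x d_out) LoRA up-projection
    scale: scaling factor (assumed 1.0 here; implementations often use alpha / rank)
    """
    delta = matmul(a, b)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: d_in = 2, d_out = 2, rank = 1.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0], [2.0]]
b = [[0.5, -1.0]]
merged = merge_lora(w, a, b)

# Applying the merged weight matches applying the base and adapter paths separately.
x = [[3.0, 4.0]]
y_merged = matmul(x, merged)
y_base = matmul(x, w)
y_adapter = matmul(matmul(x, a), b)
y_separate = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(y_base, y_adapter)]

print(merged)
print(y_merged, y_separate)
```

Merging removes the extra adapter computation at inference time, at the cost of no longer being able to swap the adapter out without reloading the base weights.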

Example Code Structure (Conceptual):

import keras_nlp

# Load the base Gemma model using KerasNLP. GemmaCausalLM bundles its own
# tokenizer and preprocessor, so no separate Transformers tokenizer is needed.
# "gemma2_9b_en" is the KerasNLP preset name for the base model listed above.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma2_9b_en")

# Enable LoRA adapters on the backbone, using the rank from fine-tuning
gemma_lm.backbone.enable_lora(rank=8)

# Load adapter weights from this repository.
# Note: load_lora_weights reads a Keras LoRA checkpoint (HDF5, usually saved as
# *.lora.h5). If adapter_model.bin is stored in another format, convert it first.
adapter_weights_path = 'path_to_your_adapter_weights/adapter_model.bin'
gemma_lm.backbone.load_lora_weights(adapter_weights_path)

# Generate text directly from a string prompt
output = gemma_lm.generate("Your input text", max_length=64)
print(output)

Requirements

KerasNLP: Install using pip install keras-nlp
Transformers: Install using pip install transformers
Other Dependencies: Ensure all dependencies required for KerasNLP and Hugging Face Transformers are installed.

Notes

Ensure your installed versions of KerasNLP and Transformers are compatible with each other.
Custom code for loading and applying LoRA adapters may require adjustments based on your specific use case and the versions of libraries used.
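
As a starting point for checking versions, the sketch below reports which versions of the two libraries are installed, using only the Python standard library. The helper name `installed_versions` is hypothetical, and the sketch does not decide which version combinations are compatible; it only surfaces what is present in the environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as used in the pip commands under Requirements.
for name, ver in installed_versions(["keras-nlp", "transformers"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Recording this output alongside any custom loading code makes it easier to reproduce a working setup later.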

License

This project is licensed under the MIT License.

Paper

LoRA: Low-Rank Adaptation of Large Language Models (arXiv:2106.09685, published Jun 17, 2021)