Gemma-4-E2B-it Agentic AI on AWS (Instruct)

This model is a supervised fine-tune (SFT) of Google's instruction-tuned Gemma-4-E2B-it. It is specialized to act as a conversational assistant that answers complex architectural questions about Agentic AI systems, frameworks, and protocols on Amazon Web Services (AWS).

The model was fine-tuned using Unsloth to enhance its ability to reason about AWS architectures and provide actionable, structured guidance based on official AWS prescriptive documentation.

Model Details

  • Base Model: google/gemma-4-e2b-it (via unsloth/gemma-4-E2B-it)
  • Training Type: Supervised Fine-Tuning (Instruction/Chat)
  • Domain Focus: AWS Architecture, Agentic AI, Frameworks, and Protocols (MCP, etc.)
  • Language: English
  • Library: Unsloth / Hugging Face Transformers

Dataset

The model was trained on instruction-response pairs derived from the AWS Prescriptive Guidance guide "Agentic AI frameworks, platforms, protocols, and tools on AWS". Training targeted concise answers and technical, context-aware AWS architecture advice aligned with current agentic AI practices.
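As a rough illustration of how one instruction-response pair can be turned into chat-style training data, the sketch below maps a pair onto the role/content message list that chat templates consume. The field names `instruction` and `response`, the helper name `pair_to_messages`, and the sample text are assumptions for illustration; the actual dataset schema is not documented here.

```python
def pair_to_messages(pair: dict) -> list[dict]:
    """Convert one instruction-response pair into a two-turn chat conversation."""
    return [
        {"role": "user", "content": pair["instruction"].strip()},
        {"role": "assistant", "content": pair["response"].strip()},
    ]


# Placeholder record (hypothetical, not taken from the real training set).
sample_pair = {
    "instruction": "  What is the Model Context Protocol (MCP)?  ",
    "response": "MCP is an open protocol for connecting AI agents to tools and data sources.",
}

messages = pair_to_messages(sample_pair)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```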

Training Configuration

Because the starting checkpoint is already instruction-tuned, it handles conversational flow before any further training. Fine-tuning was restricted to LoRA adapters on the attention and MLP projection layers, which adapts the model's persona and domain knowledge while limiting the risk of catastrophic forgetting.

  • Method: PEFT / LoRA
  • LoRA Rank (r): 16 (a common choice for instruction tuning)
  • LoRA Alpha: 16 (or 32, scaled for optimal learning)
  • Target Modules: Attention and MLP modules (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
  • Precision: 4-bit quantization (QLoRA) during training
  • Optimizer: Paged AdamW 8-bit
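To make the rank and alpha settings above concrete, here is a minimal sketch of the standard LoRA arithmetic: the low-rank update is scaled by alpha / r, and each adapted linear layer gains r * (d_in + d_out) trainable parameters. The helper names are hypothetical, and the layer dimensions are illustrative placeholders rather than the actual Gemma shapes.

```python
def lora_scaling(alpha: int, r: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


RANK = 16

# The two alpha values mentioned in the configuration list.
scaling_alpha_16 = lora_scaling(16, RANK)
scaling_alpha_32 = lora_scaling(32, RANK)

# Placeholder dimensions for one square projection (assumption, not Gemma's real size).
example_params = lora_param_count(d_in=2048, d_out=2048, r=RANK)

print(scaling_alpha_16, scaling_alpha_32, example_params)
```

With r fixed at 16, moving alpha from 16 to 32 doubles the weight given to the adapter update, which is why the two values are not interchangeable.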

How to Use

Because this is an instruct model, prompts must be formatted with the Gemma chat template; the tokenizer's apply_chat_template method applies it for you. Beyond that, you can interact with the model as you would with any chatbot.

You can load it with Transformers or Unsloth. The example below uses Transformers:

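from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Capitaller/gemma_4E2B-it_finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build the conversation in the role/content format the chat template expects
messages = [
    {"role": "user", "content": "How should I design an Agentic AI architecture on AWS that uses the Model Context Protocol (MCP)?"},
]

# Apply the chat template and return input_ids plus attention_mask as tensors
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)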

Prompting Tips

This model is designed to be highly instructional and responsive to direct questions.

  • Ask clear, technical questions: "What are the recommended AWS compute platforms for hosting an MCP server?"
  • Request specific structures: "Write a brief step-by-step guide on securing an AI agent communication channel on AWS."
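
The two tips above can be combined in a small helper that appends an explicit structure request to a technical question before wrapping it as a chat message. `build_messages` is a hypothetical helper written for illustration; it is not part of the model or of any library.

```python
from typing import Optional


def build_messages(question: str, structure: Optional[str] = None) -> list[dict]:
    """Wrap a question, plus an optional output-structure request, as a chat message list."""
    content = question.strip()
    if structure:
        content = f"{content}\n\nFormat the answer as {structure.strip()}."
    return [{"role": "user", "content": content}]


messages = build_messages(
    "How do I secure an AI agent communication channel on AWS?",
    structure="a brief step-by-step guide",
)
print(messages[0]["content"])
```

The resulting list can be passed to tokenizer.apply_chat_template in the same way as the messages list in the usage example.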