Mistral-7B LLM Architecture Expert

A fine-tuned version of Mistral-7B-Instruct-v0.3 trained using QLoRA on a custom dataset focused on LLM architecture concepts and internals.

Topics covered include:

  • Attention mechanisms
  • Transformers
  • Training dynamics
  • Scaling laws
  • KV cache
  • Tokenization
  • Fine-tuning methods
  • LLM evaluation

Training Details

Parameter               Value
Base model              mistralai/Mistral-7B-Instruct-v0.3
Method                  QLoRA (NF4 4-bit + LoRA)
Dataset                 500 custom instruction examples
Domain                  LLM Architecture
LoRA Rank               64
Trainable Parameters    2.26%
Optimizer               Paged AdamW
Learning Rate Schedule  Cosine + 3% warmup
Final Training Loss     1.2629
Training Time           ~3.3 minutes
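
The trainable-parameter share can be sanity-checked with a little arithmetic. The sketch below assumes LoRA adapters of rank 64 on all seven linear projections of every decoder block; the card does not list the target modules, so that choice is an assumption, as are the helper names. The layer dimensions are those of the published Mistral-7B-v0.3 configuration. Under these assumptions the count lands on the reported 2.26%, though a training library may count quantized weights differently.

```python
# Mistral-7B-v0.3 dimensions (from the published model configuration).
HIDDEN = 4096
INTERMEDIATE = 14336
N_LAYERS = 32
KV_DIM = 8 * 128   # 8 key/value heads of size 128 (grouped-query attention)
VOCAB = 32768
RANK = 64          # LoRA rank from the Training Details table

# (in_features, out_features) of each linear layer in one decoder block.
# ASSUMPTION: LoRA is attached to all seven; the card does not say which.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Each adapted layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (i + o) for i, o in PROJECTIONS.values())
    return per_layer * N_LAYERS


def base_params() -> int:
    """Frozen weights: decoder blocks, embeddings, output head, norms."""
    per_layer = sum(i * o for i, o in PROJECTIONS.values()) + 2 * HIDDEN
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied output head
    return per_layer * N_LAYERS + embeddings + HIDDEN  # + final norm


trainable = lora_params(RANK)
total = base_params() + trainable
print(f"{trainable:,} / {total:,} = {100 * trainable / total:.2f}%")
# prints: 167,772,160 / 7,415,795,712 = 2.26%
```

Halving `RANK` halves the adapter size, since every adapter matrix scales linearly with the rank.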

Usage

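from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "FazeFlynn/mistral-7b-llm-architecture-expert"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Use half precision on a GPU when one is available. Half precision on CPU
# can be slow or unsupported, so fall back to float32 there.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32
).to(device)

# Mistral instruct format: the request goes between [INST] and [/INST].
prompt = "[INST] Explain how KV cache works in transformers [/INST]"

# Keep the input tensors on the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200
)

# The decoded text includes the prompt as well as the generated answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))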
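
The usage snippet hard-codes a single prompt. When asking several questions, it may be convenient to wrap each one in the same `[INST] ... [/INST]` markers with a small helper. `build_prompt` below is a hypothetical name, not part of the model or of any library, and it only covers the single-turn format shown in this card.

```python
def build_prompt(question: str) -> str:
    """Wrap one user question in the [INST] ... [/INST] markers (single turn)."""
    return f"[INST] {question.strip()} [/INST]"


# Example questions drawn from the topics this model was tuned on.
questions = [
    "Explain how KV cache works in transformers",
    "What do scaling laws predict about model size and data?",
]

prompts = [build_prompt(q) for q in questions]
for p in prompts:
    print(p)
```

Each resulting string can be passed to the tokenizer in place of `prompt`.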