haykgrigorian/v2mini-eval1: Llama-Architecture 318M Model

Model Overview

v2mini-eval1 is a model trained from scratch on 15GB of London texts from 1800-1875 using the modern Llama architecture. It was trained for v2's dataset evaluation.

| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (Decoder-Only Transformer) |
| Parameter Count | ~318 Million (318M) |
| Training Type | Trained from Scratch (Random Initialization) |
| Tokenizer | Custom BPE, Vocab Size 32,000 |
| Sequence Length | 1024 tokens |
| Attention Type | Grouped Query Attention (GQA) |

Configuration Details

This model uses a custom size and configuration based on the Llama architecture:

| Parameter | Value |
|---|---|
| Number of Layers | 20 |
| Hidden Size (d) | 1024 |
| Intermediate Size ($d_{\text{ff}}$) | 2752 |
| Attention Heads | 16 (Query) / 8 (Key/Value) |
| Activation Function | SiLU (silu) |
| Normalization | RMSNorm (rms_norm_eps: 1e-05) |
| Position Embeddings | Rotary Positional Embeddings (RoPE) |
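For reference, the table above corresponds roughly to the following `transformers` `LlamaConfig`. This is a sketch reconstructed from the table, not the shipped config: the `config.json` in the repository is authoritative, and any field not listed here (e.g. `rope_theta`, `tie_word_embeddings`) is left at its library default.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Approximate reconstruction of the configuration described in the table above.
config = LlamaConfig(
    vocab_size=32000,             # custom BPE tokenizer
    hidden_size=1024,             # d
    intermediate_size=2752,       # d_ff
    num_hidden_layers=20,
    num_attention_heads=16,       # query heads
    num_key_value_heads=8,        # GQA: 8 key/value heads
    hidden_act="silu",
    max_position_embeddings=1024, # sequence length
    rms_norm_eps=1e-5,
)

# Randomly initialized, matching the "trained from scratch" setup.
model = LlamaForCausalLM(config)
```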

Model Issues

This is an evaluation model: it was trained from scratch for 10k steps on a 15GB sample of a 90GB dataset. There was a tokenization issue, so raw output contains spurious spaces inside words:

  • default: "D oes that work more of h ise x cell ent st ir ring , in his pl ays"

  • fixed: "Does that work more of his excellent stirring, in his plays"

This is just a tokenizer issue: you can fix the output yourself with some post-processing, or if you're lazy, feed it to an LLM and have it cleaned up.
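One possible way to repair the output automatically is sketched below. It assumes (not stated in this card) that the broken text differs from the intended text only by extra spaces, so it strips all spaces and re-segments the characters against a word list with dynamic programming. The `WORDS` set here is a tiny demo vocabulary for the example sentence; in practice you would load a full English word list.

```python
import re

# Tiny demo vocabulary; replace with a full word list for real use.
WORDS = {
    "does", "that", "work", "more", "of", "his", "excellent",
    "stirring", "in", "plays",
}
MAX_WORD_LEN = max(len(w) for w in WORDS)

def segment(chunk: str):
    """Split a space-free, punctuation-free chunk into dictionary words (or None)."""
    n = len(chunk)
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            if best[j] is not None and chunk[j:i].lower() in WORDS:
                best[i] = best[j] + [chunk[j:i]]
                break
    return best[n]

def fix_spacing(text: str) -> str:
    """Repair output like 'D oes that work ...' -> 'Does that work ...'."""
    pieces = []
    # Split on punctuation so commas/periods survive the segmentation step.
    for piece in re.split(r"([,.;:!?])", text):
        if piece in ",.;:!?":
            pieces.append(piece)
            continue
        squashed = piece.replace(" ", "")
        if not squashed:
            continue
        words = segment(squashed)
        pieces.append(" ".join(words) if words else piece.strip())
    # Re-join, keeping punctuation attached to the preceding word.
    result = ""
    for tok in pieces:
        if tok in ",.;:!?":
            result += tok
        else:
            result += (" " if result else "") + tok
    return result

print(fix_spacing("D oes that work more of h ise x cell ent st ir ring , in his pl ays"))
# -> "Does that work more of his excellent stirring, in his plays"
```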

How to Load and Run the Model

Download all of the model files into a local folder and run the test script. You will have to make some adjustments in the run script, such as updating the config/file paths and the test prompts.
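If you just want a quick local smoke test before adapting the full run script, a minimal loading sketch with `transformers` looks like the following. The folder name, prompt, and sampling settings are illustrative assumptions; adjust them to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local folder containing config.json, tokenizer files, and the weights.
model_dir = "./v2mini-eval1"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)
model.eval()

prompt = "It was a fine morning in London, and"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

# Note: the raw decode may contain the spurious spaces described above.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```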

Test script

A run file for testing and evaluating this model is available on the main project repository:

