haykgrigorian/v2mini-eval1: Llama-Architecture 318M Model

Model Overview

v2mini-eval1 is a model trained from scratch on 15GB of London texts from 1800-1875 using the modern Llama architecture. It was trained for v2's dataset evaluation.

| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (Decoder-Only Transformer) |
| Parameter Count | ~318 Million (318M) |
| Training Type | Trained from Scratch (Random Initialization) |
| Tokenizer | Custom BPE, Vocab Size 32,000 |
| Sequence Length | 1024 tokens |
| Attention Type | Grouped Query Attention (GQA) |

Configuration Details

This model uses a custom size and configuration based on the Llama architecture:

| Parameter | Value |
|---|---|
| Number of Layers | 20 |
| Hidden Size (d) | 1024 |
| Intermediate Size ($d_{\text{ff}}$) | 2752 |
| Attention Heads | 16 (Query) / 8 (Key/Value) |
| Activation Function | SiLU (silu) |
| Normalization | RMSNorm (rms_norm_eps: 1e-05) |
| Position Embeddings | Rotary Positional Embeddings (RoPE) |
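For reference, the table above corresponds roughly to the following `transformers` `LlamaConfig`. This is a sketch reconstructed from the table, not the shipped config: the `config.json` in the repository is authoritative, and any field not listed here (e.g. `rope_theta`, `tie_word_embeddings`) is left at its library default.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Approximate reconstruction of the configuration described in the table above.
config = LlamaConfig(
    vocab_size=32000,             # custom BPE tokenizer
    hidden_size=1024,             # d
    intermediate_size=2752,       # d_ff
    num_hidden_layers=20,
    num_attention_heads=16,       # query heads
    num_key_value_heads=8,        # GQA: 8 key/value heads
    hidden_act="silu",
    max_position_embeddings=1024, # sequence length
    rms_norm_eps=1e-5,
)

# Randomly initialized, matching the "trained from scratch" setup.
model = LlamaForCausalLM(config)
```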

Model Issues

This is an evaluation model: it was trained from scratch for 10k steps on a 15GB sample of a 90GB dataset. There was a tokenization issue, so raw output contains spurious spaces inside words:

  • default: "D oes that work more of h ise x cell ent st ir ring , in his pl ays"

  • fixed: "Does that work more of his excellent stirring, in his plays"

This is just a tokenizer issue: you can fix the output yourself with some post-processing, or if you're lazy, feed it to an LLM and have it cleaned up.
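One possible way to repair the output automatically is sketched below. It assumes (not stated in this card) that the broken text differs from the intended text only by extra spaces, so it strips all spaces and re-segments the characters against a word list with dynamic programming. The `WORDS` set here is a tiny demo vocabulary for the example sentence; in practice you would load a full English word list.

```python
import re

# Tiny demo vocabulary; replace with a full word list for real use.
WORDS = {
    "does", "that", "work", "more", "of", "his", "excellent",
    "stirring", "in", "plays",
}
MAX_WORD_LEN = max(len(w) for w in WORDS)

def segment(chunk: str):
    """Split a space-free, punctuation-free chunk into dictionary words (or None)."""
    n = len(chunk)
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            if best[j] is not None and chunk[j:i].lower() in WORDS:
                best[i] = best[j] + [chunk[j:i]]
                break
    return best[n]

def fix_spacing(text: str) -> str:
    """Repair output like 'D oes that work ...' -> 'Does that work ...'."""
    pieces = []
    # Split on punctuation so commas/periods survive the segmentation step.
    for piece in re.split(r"([,.;:!?])", text):
        if piece in ",.;:!?":
            pieces.append(piece)
            continue
        squashed = piece.replace(" ", "")
        if not squashed:
            continue
        words = segment(squashed)
        pieces.append(" ".join(words) if words else piece.strip())
    # Re-join, keeping punctuation attached to the preceding word.
    result = ""
    for tok in pieces:
        if tok in ",.;:!?":
            result += tok
        else:
            result += (" " if result else "") + tok
    return result

print(fix_spacing("D oes that work more of h ise x cell ent st ir ring , in his pl ays"))
# -> "Does that work more of his excellent stirring, in his plays"
```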

How to Load and Run the Model

Download all of the model files into a local folder and run the test script. You will have to make some adjustments in the run script, such as updating the config/file paths and the test prompts.
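If you just want a quick local smoke test before adapting the full run script, a minimal loading sketch with `transformers` looks like the following. The folder name, prompt, and sampling settings are illustrative assumptions; adjust them to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local folder containing config.json, tokenizer files, and the weights.
model_dir = "./v2mini-eval1"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)
model.eval()

prompt = "It was a fine morning in London, and"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

# Note: the raw decode may contain the spurious spaces described above.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```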

Test script

A run file for testing and evaluating this model is available on the main project repository:

