Ornstein-31B-it — GGUF

GGUF quantizations of DJLougen/Ornstein-31B-it, a Gemma 4 31B vision-language model fine-tuned with Unsloth.

Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

Support on Ko-fi


Available Quantizations

| File | Quant | Size | Description |
|---|---|---|---|
| Ornstein-31B-it-Q2_K.gguf | Q2_K | 11.1 GB | Smallest, lowest quality |
| Ornstein-31B-it-Q3_K_M.gguf | Q3_K_M | 14.2 GB | Low quality |
| Ornstein-31B-it-Q4_K_M.gguf | Q4_K_M | 17.4 GB | Recommended — good balance of quality and size |
| Ornstein-31B-it-Q5_K_M.gguf | Q5_K_M | 20.3 GB | High quality |
| Ornstein-31B-it-Q6_K.gguf | Q6_K | 23.5 GB | Very high quality |
| Ornstein-31B-it-Q8_0.gguf | Q8_0 | 30.4 GB | Near-lossless |
| mmproj-Ornstein-31B-it-F16.gguf | F16 | 1.1 GB | Vision encoder (required for image input) |
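As a rough sanity check on the table above, file size divided by parameter count gives the average bits stored per weight for each quant. A minimal sketch (sizes taken from the table, a dense 31B parameter count assumed, and GB treated as 10^9 bytes, so the figures are approximate):

```python
# Estimate average bits per weight from file size and parameter count.
# Sizes are the GB figures from the quantization table; 31e9 params assumed.
PARAMS = 31e9

sizes_gb = {
    "Q2_K": 11.1,
    "Q3_K_M": 14.2,
    "Q4_K_M": 17.4,
    "Q5_K_M": 20.3,
    "Q6_K": 23.5,
    "Q8_0": 30.4,
}

def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """File size in GB -> approximate average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params

for quant, gb in sizes_gb.items():
    print(f"{quant}: ~{bits_per_weight(gb):.2f} bits/weight")
```

The same arithmetic also gives a quick lower bound on memory needed to load a given quant, before accounting for KV cache and activations.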

Usage

llama.cpp

# Text-only
llama-cli -m Ornstein-31B-it-Q4_K_M.gguf -p "Hello, tell me about yourself"

# With vision (image input)
llama-gemma3-cli -m Ornstein-31B-it-Q4_K_M.gguf --mmproj mmproj-Ornstein-31B-it-F16.gguf --image photo.jpg -p "Describe this image"
# Note: newer llama.cpp builds replace llama-gemma3-cli with llama-mtmd-cli (same flags)
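For longer sessions, llama.cpp's built-in server exposes an OpenAI-compatible HTTP endpoint instead of a one-shot CLI run. A sketch (the port is an arbitrary choice, and `--mmproj` support in `llama-server` assumes a reasonably recent llama.cpp build):

```shell
# Serve the model (with the vision projector) over an OpenAI-compatible API
llama-server -m Ornstein-31B-it-Q4_K_M.gguf \
    --mmproj mmproj-Ornstein-31B-it-F16.gguf \
    --port 8080
```

Once running, any OpenAI-compatible client can talk to `http://localhost:8080/v1`.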

llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="Ornstein-31B-it-Q4_K_M.gguf", n_gpu_layers=-1)  # -1 offloads all layers to GPU
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(output["choices"][0]["message"]["content"])
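The value returned by `create_chat_completion` follows the OpenAI chat-completion shape, so the chained indexing above can be wrapped in a small defensive helper. A sketch, exercised here against a mocked response dict (the dict is an illustrative shape, not captured model output):

```python
def reply_text(output: dict) -> str:
    """Extract the assistant message text from an OpenAI-style completion dict."""
    choices = output.get("choices") or []
    if not choices:
        raise ValueError("no choices in completion output")
    return choices[0]["message"]["content"]

# Illustrative response shape (not real model output):
mock = {"choices": [{"message": {"role": "assistant", "content": "Hi there!"}}]}
print(reply_text(mock))  # → Hi there!
```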

Details

Downloads last month: 869
Format: GGUF
Model size: 31B params
Architecture: gemma4


