Llama-3.2-OctoThinker-iNano-1B-GGUF

Model Summary

Llama-3.2-OctoThinker-iNano-1B-GGUF is the GGUF-quantized release of the main model:

Main model repo:
https://huggingface.co/gss1147/Llama-3.2-OctoThinker-iNano-1B

This repository packages the model for efficient local inference in GGUF-compatible runtimes such as llama.cpp and LM Studio.

GGUF to Main Model Link

This GGUF repository corresponds to the main model repo:

gss1147/Llama-3.2-OctoThinker-iNano-1B

If you want the original non-GGUF model, training/merge details, tokenizer files, and main repository metadata, use the repo above.

Available Files

This GGUF repository currently includes:

  • Q4_K_M – 955 MB
  • Q5_K_M – 1.09 GB
  • F16 – 3 GB
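As a rough sanity check, the file sizes above imply an effective bits-per-weight for each quantization. A minimal sketch, assuming the "1B" model has roughly 1.24e9 parameters (the usual Llama 3.2 1B total; an assumption, not metadata read from this repo) and noting that the rounded sizes above make the estimates coarse:

```python
# Effective bits-per-weight implied by the GGUF file sizes listed above.
# PARAMS is an assumed parameter count (~1.24B, typical for Llama 3.2 1B),
# not a value taken from this repository's metadata.
PARAMS = 1.24e9

files = {
    "Q4_K_M": 955e6,   # bytes, from the file list above
    "Q5_K_M": 1.09e9,
    "F16": 3e9,        # "3 GB" is rounded, so this estimate runs high
}

for name, size_bytes in files.items():
    bpw = size_bytes * 8 / PARAMS
    print(f"{name}: ~{bpw:.1f} bits/weight")
```

Quantized GGUF files usually land above their nominal bit width because embedding and output layers are typically kept at higher precision.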

Architecture

  • Architecture: llama
  • Model size: 1B params

Intended Use

This model is intended for:

  • local text generation
  • assistant-style prompting
  • lightweight reasoning tasks
  • summarization
  • simple coding help
  • offline/local inference workflows

Quantization Notes

Choose the file that best matches your hardware:

  • Q4_K_M for smaller size and lighter RAM usage
  • Q5_K_M for a stronger quality-to-size balance
  • F16 for the highest-fidelity file in this repo, with much higher memory requirements
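When matching a file to your hardware, the model file is not the only cost: the KV cache grows with context length. A back-of-envelope sketch, assuming the usual Llama 3.2 1B shape (16 layers, 8 KV heads, head dim 64; these values are assumptions, not read from this repo) and an f16 cache:

```python
# Back-of-envelope RAM estimate: model file + KV cache.
# The layer/head values default to the usual Llama 3.2 1B config;
# treat them as assumptions, not repo metadata.
def kv_cache_bytes(ctx_len, n_layers=16, n_kv_heads=8,
                   head_dim=64, bytes_per_elem=2):
    # 2x for the separate K and V tensors; f16 cache (2 bytes/element).
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

file_bytes = 955e6   # Q4_K_M from the list above
ctx = 4096
total = file_bytes + kv_cache_bytes(ctx)
print(f"~{total / 1e9:.2f} GB before runtime overhead")
```

At 4K context the cache adds on the order of 100 MB for a model this size, so total memory is dominated by the quantized file itself.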

Example llama.cpp Usage

llama-cli -m /path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf -p "Explain recursion in Python with a simple example."