Llama-3.2-OctoThinker-iNano-1B-GGUF

Model Summary

Llama-3.2-OctoThinker-iNano-1B-GGUF is the GGUF-quantized release of the main model:

Main model repo:
https://huggingface.co/gss1147/Llama-3.2-OctoThinker-iNano-1B

This repository packages the model for efficient local inference in GGUF-compatible runtimes such as llama.cpp and LM Studio.

GGUF to Main Model Link

This GGUF repository corresponds to the main model repo:

gss1147/Llama-3.2-OctoThinker-iNano-1B

If you want the original non-GGUF model, training/merge details, tokenizer files, and main repository metadata, use the repo above.

Available Files

This GGUF repository currently includes:

  • Q4_K_M – 955 MB
  • Q5_K_M – 1.09 GB
  • F16 – 3 GB
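As a rough sanity check, the file sizes above imply an effective bits-per-weight for each quantization. A minimal sketch, assuming the "1B" model has roughly 1.24e9 parameters (the usual Llama 3.2 1B total; an assumption, not metadata read from this repo) and noting that the rounded sizes above make the estimates coarse:

```python
# Effective bits-per-weight implied by the GGUF file sizes listed above.
# PARAMS is an assumed parameter count (~1.24B, typical for Llama 3.2 1B),
# not a value taken from this repository's metadata.
PARAMS = 1.24e9

files = {
    "Q4_K_M": 955e6,   # bytes, from the file list above
    "Q5_K_M": 1.09e9,
    "F16": 3e9,        # "3 GB" is rounded, so this estimate runs high
}

for name, size_bytes in files.items():
    bpw = size_bytes * 8 / PARAMS
    print(f"{name}: ~{bpw:.1f} bits/weight")
```

Quantized GGUF files usually land above their nominal bit width because embedding and output layers are typically kept at higher precision.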

Architecture

  • Architecture: llama
  • Model size: 1B params

Intended Use

This model is intended for:

  • local text generation
  • assistant-style prompting
  • lightweight reasoning tasks
  • summarization
  • simple coding help
  • offline/local inference workflows

Quantization Notes

Choose the file that best matches your hardware:

  • Q4_K_M for smaller size and lighter RAM usage
  • Q5_K_M for a stronger quality-to-size balance
  • F16 for the highest-fidelity file in this repo, with much higher memory requirements
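When matching a file to your hardware, the model file is not the only cost: the KV cache grows with context length. A back-of-envelope sketch, assuming the usual Llama 3.2 1B shape (16 layers, 8 KV heads, head dim 64; these values are assumptions, not read from this repo) and an f16 cache:

```python
# Back-of-envelope RAM estimate: model file + KV cache.
# The layer/head values default to the usual Llama 3.2 1B config;
# treat them as assumptions, not repo metadata.
def kv_cache_bytes(ctx_len, n_layers=16, n_kv_heads=8,
                   head_dim=64, bytes_per_elem=2):
    # 2x for the separate K and V tensors; f16 cache (2 bytes/element).
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

file_bytes = 955e6   # Q4_K_M from the list above
ctx = 4096
total = file_bytes + kv_cache_bytes(ctx)
print(f"~{total / 1e9:.2f} GB before runtime overhead")
```

At 4K context the cache adds on the order of 100 MB for a model this size, so total memory is dominated by the quantized file itself.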

Example llama.cpp Usage

llama-cli -m /path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf -p "Explain recursion in Python with a simple example."