Llama-3.2-OctoThinker-iNano-1B-GGUF
Model Summary
Llama-3.2-OctoThinker-iNano-1B-GGUF is the GGUF quantized release of the main model:
Main model repo:
https://huggingface.co/gss1147/Llama-3.2-OctoThinker-iNano-1B
This repository packages the model for efficient local inference in GGUF-compatible runtimes such as llama.cpp and LM Studio.
Link to Main Model
This GGUF repository corresponds to the main model repo:
gss1147/Llama-3.2-OctoThinker-iNano-1B
For the original non-GGUF weights, training/merge details, tokenizer files, and main repository metadata, use the repo above.
Available Files
This GGUF repository currently includes:
- Q4_K_M (955 MB)
- Q5_K_M (1.09 GB)
- F16 (3 GB)
Architecture
- Architecture: llama
- Model size: 1B parameters
Intended Use
This model is intended for:
- local text generation
- assistant-style prompting
- lightweight reasoning tasks
- summarization
- simple coding help
- offline/local inference workflows
Quantization Notes
Choose the file that best matches your hardware:
- Q4_K_M for smaller size and lighter RAM usage
- Q5_K_M for a stronger quality-to-size balance
- F16 for the highest-fidelity file in this repo, with much higher memory requirements
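As a rough rule of thumb, the guidance above can be turned into a small script that suggests a quant from available RAM. The thresholds below are informal estimates derived only from the file sizes listed in this card, not official memory requirements:

```shell
# Rough quant picker for this repo's files; thresholds are informal
# estimates based on file size alone (actual usage also depends on
# context length / KV cache), not official requirements.
ram_gb=8   # set to your machine's available RAM in GB

if [ "$ram_gb" -ge 6 ]; then
    quant="F16"       # 3 GB file; leaves headroom for the KV cache
elif [ "$ram_gb" -ge 3 ]; then
    quant="Q5_K_M"    # 1.09 GB file
else
    quant="Q4_K_M"    # 955 MB file
fi
echo "Suggested quant: $quant"
```

With 8 GB of RAM this prints `Suggested quant: F16`; on very constrained machines it falls back to Q4_K_M.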
Example llama.cpp Usage
llama-cli -m /path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf -p "Explain recursion in Python with a simple example."
Model Tree
Base model: gss1147/Llama-3.2-OctoThinker-iNano-1B