Instructions to use WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf", filename="Llama3.2-AgentHermes-Coder-3B--Q5_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M # Run inference directly in the terminal: llama-cli -hf WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M # Run inference directly in the terminal: llama-cli -hf WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M # Run inference directly in the terminal: ./llama-cli -hf WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M
Use Docker
docker model run hf.co/WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M
- LM Studio
- Jan
- Ollama
How to use WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf with Ollama:
ollama run hf.co/WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M
- Unsloth Studio new
How to use WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf to start chatting
- Docker Model Runner
How to use WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf with Docker Model Runner:
docker model run hf.co/WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M
- Lemonade
How to use WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf:Q5_K_M
Run and chat with the model
lemonade run user.Llama3.2-Agent.Hermes.Coder-3B-gguf-Q5_K_M
List all available models
lemonade list
Llama3.2-Agent.Hermes.Coder-3B (GGUF)
📌 Model Overview
Model Name: WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf Organization: Within Us AI Base Model: NousResearch/Hermes-3-Llama-3.2-3B Architecture: LLaMA 3.2 (3B) + Hermes 3 fine-tuning Format: GGUF (quantized for local inference) Primary Focus: Agentic coding + structured reasoning
This model is a Hermes-enhanced LLaMA 3.2 coder, optimized for agent workflows, structured outputs, and high-control instruction following in a compact 3B footprint.
It blends:
- LLaMA 3.2’s strong foundation
- Hermes 3’s alignment + tool-use intelligence
- WithinUsAI’s agentic coding focus
⸻
🧬 Architecture & Lineage
Base Stack
- Foundation: LLaMA 3.2 (3B parameter class)
- Fine-Tune: Hermes 3 (Nous Research)
- Conversion: GGUF via llama.cpp toolchain
Hermes 3 is known for:
- Strong instruction-following
- Multi-turn conversation stability
- Tool-use and function-calling capabilities
- Improved reasoning and controllability 
What WithinUsAI Adds
This variant emphasizes:
- Coding-first behavior
- Agentic task execution
- Structured outputs (JSON, functions, steps)
⸻
🧠 Core Design Philosophy
This model operates like a disciplined junior engineer with a systems mindset 🧩💻
Not just generating code… but thinking in steps, outputs, and actions.
Design Goals:
- High controllability (Hermes-style alignment)
- Strong coding bias
- Agent compatibility
- Efficient local deployment
⸻
⚙️ Key Capabilities
💻 Coding
- Python, JavaScript, C++, and more
- Function generation and refactoring
- Debugging and structured fixes
🤖 Agentic Behavior
- Task decomposition
- Step-by-step execution planning
- Function calling / tool-use readiness
🧠 Reasoning
- Chain-of-thought style outputs
- Logical breakdown of problems
- Instruction precision
📦 Structured Output
- JSON generation
- Schema-following responses
- Deterministic formatting (strong Hermes trait)
⸻
📦 GGUF Format & Deployment
Optimized for local inference and edge environments.
Supported Runtimes:
- llama.cpp
- LM Studio
- Ollama (GGUF-compatible builds)
Typical Quantizations (3B):
Quant Size Notes Q4_K_M ~2.0 GB Best balance Q5_K_M ~2.3 GB Higher quality Q8_0 ~3.4 GB Maximum fidelity
Quantization enables large size reduction while maintaining usable performance, making local deployment practical. 
⸻
🚀 Intended Use
✅ Ideal Use Cases
- Local coding assistants
- Agent frameworks (tool-calling pipelines)
- Structured output systems (JSON APIs)
- Autonomous coding workflows
- Offline developer copilots
⚠️ Limitations
- 3B size limits deep reasoning vs larger models
- Requires good prompt structure for best results
- Tool execution must be handled externally
⸻
🛠️ Usage Example (llama.cpp)
./main -m Llama3.2-Agent.Hermes.Coder-3B.Q4_K_M.gguf
-p "Create a JSON schema and Python validator for user authentication."
-n 512
⸻
🧪 Training & Methodology
Within Us AI pipeline emphasizes:
- Instruction-tuned coding datasets
- Agentic workflow examples
- Structured output training
- Evaluation-driven refinement
Data Sources
- Proprietary Within Us AI datasets
- Third-party datasets (no ownership claimed)
- Focus areas:
- Code reasoning
- Tool usage patterns
- Step-by-step problem solving
⸻
📊 Expected Performance Profile
Capability Strength Coding High Instruction following Very High Structured output Very High Reasoning depth Moderate Efficiency Very High
⸻
📜 License
License Type: LLaMA 3 / Hermes 3 compatible licensing (inherits base restrictions)**
Attribution Notes:
- Base model: Meta (LLaMA 3.2)
- Fine-tune: Nous Research (Hermes 3)
- GGUF + optimization + methodology: Within Us AI
- Third-party datasets used without ownership claims
- Credit belongs to original creators
⸻
🙏 Acknowledgements
- Meta (LLaMA 3 architecture)
- Nous Research (Hermes 3 fine-tuning)
- GGUF / llama.cpp ecosystem
- Open-source AI community
⸻
🔗 Links
- Model: https://huggingface.co/WithinUsAI/Llama3.2-Agent.Hermes.Coder-3B-gguf
- Organization: https://huggingface.co/WithinUsAI
⸻
🧩 Closing Note
This model feels like a precision tool in a small chassis ⚙️
It doesn’t just answer… it organizes, structures, and executes.
- Downloads last month
- 1,282
4-bit
5-bit