Instructions to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF",
	filename="IBM-Agentic-Nvidia-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with Ollama:
```
ollama run hf.co/WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M
```

Unsloth Studio

How to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF to start chatting

How to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

Run Hermes

hermes

Atomic Chat new
Docker Model Runner
How to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with Docker Model Runner:
```
docker model run hf.co/WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M
```

Lemonade

How to use WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Nvidia.Agentic.Coder-4B-GGUF-Q4_K_M

List all available models

lemonade list

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Nvidia.Agentic.Coder-4B-GGUF

📌 Model Overview

Model Name: WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF Organization: Within Us AI Model Type: Code LLM (Agentic, Instruction-Following) Parameter Size: 4B Format: GGUF (quantized for local inference) Primary Use: Agentic coding, tool-using workflows, software engineering reasoning

This model is part of the Within Us AI ecosystem focused on building agentic, reasoning-driven coding systems designed to think, act, and verify like real engineers.

⸻

🧬 Architecture & Lineage

Base Family: NVIDIA Nemotron-style 4B class models (inferred lineage from naming + ecosystem alignment)
Format Conversion: GGUF quantization for efficient local inference
Training Approach:
- Instruction-tuned for coding tasks
- Agentic workflow emphasis (multi-step reasoning, tool usage)
- Likely merged / fine-tuned using Within Us AI proprietary pipelines

Related ecosystem models include:

NVIDIA-Nemotron-3-Nano-4B
Other 4B agentic coders and merges in the same class

⸻

⚙️ Key Capabilities

🧑‍💻 Code Intelligence

Multi-language code generation
Bug fixing and refactoring
Structured output generation

🤖 Agentic Behavior

Step-by-step reasoning
Task decomposition
Tool-calling alignment (design goal)

🧠 Reasoning Focus

Instruction-following with logical chaining
Designed for evaluation-style datasets (tests-as-truth philosophy)

⸻

📦 GGUF Quantization

GGUF allows efficient local inference with tools like:

llama.cpp
LM Studio
Ollama (GGUF-compatible builds)

Typical quantizations for 4B GGUF models include:

Q2_K (~1.8GB)
Q3_K (~2.0–2.3GB)
Q4_K (~2.5GB, recommended balance)

⸻

🚀 Intended Use

✅ Ideal Use Cases

Local AI coding assistants
Autonomous coding agents
SWE-bench style evaluation
Tool-augmented workflows
Offline developer copilots

⚠️ Limitations

Smaller 4B parameter size limits deep reasoning vs larger models
Performance depends heavily on prompt structure
Tool-use requires external orchestration (not built-in runtime)

⸻

🛠️ Usage Example (llama.cpp)

./main -m Nvidia.Agentic.Coder-4B.Q4_K.gguf
-p "Write a Python function to parse JSON logs and extract errors."
-n 512

⸻

🧪 Training Philosophy (Within Us AI)

Within Us AI focuses on:

Agentic AI systems
Test-driven training (tests-as-truth)
Diff-first patching workflows
Secure and auditable code generation
Evaluation-first development pipelines

⸻

📊 Evaluation

No formal benchmark results published yet.

Expected strengths:

Strong instruction adherence
Lightweight agentic reasoning
Efficient local deployment

⸻

📚 Datasets & Training Sources

This model follows the Within Us AI methodology:

Proprietary datasets created by Within Us AI
May include third-party datasets for training (no ownership claimed)
Emphasis on:
- Code reasoning traces
- Agentic workflows
- Evaluation-driven samples

⸻

📜 License

License Type: Custom / Other (Within Us AI License)

Terms:

Within Us AI created the fine-tuning, merging, and training methodology
Base model architecture originates from third-party LLM ecosystems (e.g., NVIDIA / Nemotron class)
Third-party datasets may be used without claiming ownership
Full credit and acknowledgment belong to original dataset and base model creators

⸻

🙏 Acknowledgements

Special thanks to:

NVIDIA Nemotron ecosystem contributors
Open-source GGUF tooling community
Dataset creators across Hugging Face
The broader open-source AI research community

⸻

🔗 Links

Model: https://huggingface.co/WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF
Organization: https://huggingface.co/WithinUsAI

Downloads last month: 547

GGUF

Model size

4B params

Architecture

nemotron_h

Hardware compatibility

4-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Datasets used to train WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF

Collection including WithinUsAI/Nvidia.Agentic.Coder-4B-GGUF

WithIn US AI (((GGUF MODELS)))

Collection

LLM MODELS TRAINED, FINE-TUNED, MERGED and Refusal Removal BY (WITHIN US AI) • 24 items • Updated 2 days ago • 7