Patrick Hill

pbhappliedsystems

AI & ML interests

PBH Applied Systems publishes evaluated open-weight GGUF models for practical AI deployment, with an emphasis on quantized inference, agentic workflows, structured outputs, tool use, and production reliability. Every model published under this organization is converted, evaluated, and documented by PBH Applied Systems using its proprietary `quant_eval` framework.

The evaluation process compares full-precision and quantized variants across agent-adjacent task families including structured JSON output, tool dispatch, multi-turn state retention, mixed natural language plus JSON responses, multiple-choice extraction, fuzz-style constraint adherence, and multi-step planning.

These model cards are designed to support deployment decisions, not just model discovery. Each card documents practical behavior, quantization trade-offs, failure modes, recommended use cases, hardware requirements, and guardrails for production use.

Try the live PBH Applied Systems AI Agent Demo: https://pbhappliedsystems.com/assistant.html

The demo lets visitors interact with evaluated quantized open-weight models across reasoning, document intelligence, and code automation workflows running on private GPU infrastructure.

Recent Activity

updated 2 models 2 days ago

pbhappliedsystems/Mistral_Nemo_Instruct_2407_4bit_AWQ_Async

3B • Updated 2 days ago • 19

pbhappliedsystems/Qwen2.5-32B-Instruct-bnb-4bit

Text Generation • 34B • Updated 2 days ago • 26

published 2 models 2 days ago

pbhappliedsystems/Mistral_Nemo_Instruct_2407_4bit_AWQ_Async

3B • Updated 2 days ago • 19

pbhappliedsystems/Qwen2.5-32B-Instruct-bnb-4bit

Text Generation • 34B • Updated 2 days ago • 26

updated a dataset 5 days ago

pbhappliedsystems/syntheval-cloud-regulated-deid-1k

Viewer • Updated 5 days ago • 1.15k • 53

published a dataset 5 days ago

pbhappliedsystems/syntheval-cloud-regulated-deid-1k

Viewer • Updated 5 days ago • 1.15k • 53

liked a model 14 days ago

zai-org/GLM-5.2

Text Generation • 753B • Updated 9 days ago • 160k • 3.17k

posted an update 24 days ago

🚀 **New flagship dataset — and an argument about what a dataset card should be.**

Most synthetic datasets on the Hub ship row counts, a license, and little else — pipeline opaque, rejection criteria unstated, compliance unaudited. We published the opposite.

**SynthEval Cloud — Regulated-Domain Synthetic Instruction Dataset**
👉 pbhappliedsystems/syntheval-cloud-regulated-instruct-1k

**1,116** quality-gated instruction records across **7 regulated domains** (medical, legal, GDPR, privacy, education, e-commerce, transport). Every record cleared a documented cascade, not a vibe check:

- 🧪 **Dual-signal hallucination gate** — rejects only when embedding cosine *and* keyword-overlap both fail; a low score alone never rejects.
- 🔒 **Layered PII masking + independent leak audit** — a separate over-reporting scanner found **0.0% residual leak** across all 1,116 records.
- 📊 **Whole-corpus evaluation, not a sample** — MATTR **0.769**, mean cosine **0.73**, **0%** near-duplicates, **96.9%** yield.
- 🧾 **The 36 rejections ship too**, each tagged with its failing gate. Removal at the gate is the product; we show our work.
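
A minimal sketch of how a dual-signal gate of the kind described in the first bullet could be wired. The function names, both thresholds, and the whitespace-token overlap measure are illustrative assumptions, not the SynthEval implementation; the only behavior taken from the post is that a record is rejected when both signals fail and never on one low score alone.

```python
import math
from dataclasses import dataclass


@dataclass
class GateResult:
    accepted: bool
    reason: str


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(source: str, answer: str) -> float:
    """Fraction of source tokens that reappear in the answer.

    Whitespace tokenization is an assumption made for brevity.
    """
    src = set(source.lower().split())
    ans = set(answer.lower().split())
    return len(src & ans) / len(src) if src else 0.0


def dual_signal_gate(cosine: float, overlap: float,
                     cosine_min: float = 0.5,      # hypothetical threshold
                     overlap_min: float = 0.2      # hypothetical threshold
                     ) -> GateResult:
    """Reject only when BOTH signals fall below their thresholds."""
    cosine_fails = cosine < cosine_min
    overlap_fails = overlap < overlap_min
    if cosine_fails and overlap_fails:
        return GateResult(False, "hallucination_gate")
    return GateResult(True, "ok")
```

With these placeholder thresholds, `dual_signal_gate(0.1, 0.9)` is accepted because keyword overlap still passes, while `dual_signal_gate(0.1, 0.0)` is rejected and tagged with the failing gate.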

Every number on the card is a field in the evaluation_report.json shipped beside the data — full methodology + provenance (Mistral-Nemo AWQ W4A16 · vLLM 0.8.5.post1 · Modal A10G).
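
For readers who want to sanity-check two of the figures above, here is a rough sketch of how MATTR (moving-average type-token ratio) and yield are commonly computed. The window size, the tokenization, and the definition of yield as accepted / (accepted + rejected) are assumptions; the post does not state them, although that yield definition does reproduce the quoted 96.9% from 1,116 accepted and 36 rejected records.

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio over a list of tokens.

    Averages unique/total over every sliding window of `window` tokens.
    The default window of 50 is an assumption, not a documented setting.
    """
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Text shorter than one window: fall back to the plain ratio.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def yield_rate(accepted: int, rejected: int) -> float:
    """Share of generated records that cleared every gate (assumed definition)."""
    return accepted / (accepted + rejected)


# 1,116 accepted and 36 rejected gives 0.96875, i.e. the 96.9% quoted in the post.
print(round(100 * yield_rate(1116, 36), 1))
```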

One release from **SynthEval**: Studio (local GPU) + Cloud (Modal+vLLM), proving quality parity across substrates.

📄 Whitepaper: https://pbhappliedsystems.com/SynthEval_Studio_and_Cloud_Quality-Gated_Synthetic_Data_Generation.pdf
🔎 Overview: https://pbhappliedsystems.com/synthetic-data.html

**CC BY 4.0** — commercial use welcome, just credit it. Need defensible synthetic data at scale? Let's talk.

— Patrick Hill, PBH Applied Systems

updated a dataset 24 days ago

pbhappliedsystems/syntheval-cloud-regulated-instruct-1k

Viewer • Updated 24 days ago • 36 • 72

published a dataset 24 days ago

pbhappliedsystems/syntheval-cloud-regulated-instruct-1k

Viewer • Updated 24 days ago • 36 • 72

posted an update about 2 months ago

## quant-eval Agent Arena — Now Live

After several months of building, the quant-eval Agent Arena is live: pbhappliedsystems/quant-eval-agent-arena

**What it is:** A side-by-side ReAct agent comparison platform running 9 independently evaluated GGUF models. Select any two models, pick an agent template, submit a query, and watch both agents reason through it in real time — with quant_eval v7.21 behavioral scores displayed alongside every response.

**Three agent templates:**
- 〔R〕 Reasoning & Analysis
- 〔D〕 Document Intelligence
- 〔C〕 Code & Automation

**The models (all Q4_K_M GGUF):**
- Qwen2.5-3B / 7B / 14B-Instruct-1M / 32B
- Ministral-3-14B-Instruct-2512
- Ministral-3-14B-Reasoning-2512
- Phi-4-reasoning-plus
- Mistral-Nemo-Instruct-2407
- Qwen3.6-27B

**What quant_eval v7.21 measures:** 42 fixture cases across 8 task families: json_multistep, stateful_followup, toolcall_only, mixed_brief_json, toolcall, json, fuzz, and mcq. Every model is evaluated at both F16 and Q4_K_M precision where hardware permits, and the difference between the two runs is the quantization impact report.
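
To make the idea of a quantization delta concrete, here is a small sketch that subtracts F16 scores from Q4_K_M scores per task family. The family names come from the list above; the function name, the score scale, and the numbers in the example are placeholders invented for illustration, not quant_eval output or its actual report format.

```python
TASK_FAMILIES = [
    "json_multistep", "stateful_followup", "toolcall_only",
    "mixed_brief_json", "toolcall", "json", "fuzz", "mcq",
]


def quantization_delta(f16_scores: dict, q4_scores: dict) -> dict:
    """Per-family score change (Q4_K_M minus F16).

    A negative value means the quantized model scored lower.
    Families missing from either run are skipped, which covers the
    "where hardware permits" case of an F16 run that could not be made.
    """
    return {
        family: q4_scores[family] - f16_scores[family]
        for family in TASK_FAMILIES
        if family in f16_scores and family in q4_scores
    }


# Placeholder pass rates, NOT measured results.
example_f16 = {"json": 1.0, "toolcall": 0.75, "mcq": 0.5}
example_q4 = {"json": 0.75, "toolcall": 0.75}

delta = quantization_delta(example_f16, example_q4)
```

In this made-up example `delta` holds an entry for `json` and `toolcall` only, since `mcq` has no Q4_K_M score to compare against.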

**Stack:** Gradio + llama-cpp-python (GGUF, CUDA) + custom lightweight ReAct loop + ZeroGPU (H200)
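
For anyone unfamiliar with the ReAct pattern, the sketch below shows the general shape of a lightweight loop: the model emits either an action or a final answer, and tool output is appended as an observation before the next turn. This is a generic illustration, not the Arena's code; the prompt format, the regexes, `react_loop`, `scripted_llm`, and the `calc` tool are all hypothetical, and the model is a scripted stub so the example runs without a GPU.

```python
import re
from typing import Callable, Dict

# Assumed text protocol: "Action: tool_name[input]" or "Final Answer: ..."
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react_loop(llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               query: str,
               max_steps: int = 5) -> str:
    """Alternate model turns and tool calls until a final answer appears."""
    transcript = f"Question: {query}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            transcript += "Observation: no valid action found.\n"
            continue
        name, arg = action.group(1), action.group(2)
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool '{name}'"
        transcript += f"Observation: {observation}\n"
    return "No final answer within step budget."


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model call; replies depend only on the transcript."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute.\nAction: calc[2+3]"
    return "Thought: I have the result.\nFinal Answer: 5"


# Toy tool: sums integers separated by '+'.
TOOLS = {"calc": lambda expr: str(sum(int(p) for p in expr.split("+")))}

answer = react_loop(scripted_llm, TOOLS, "What is 2+3?")
```

In a real deployment the `llm` callable would wrap an inference backend; it is injected here so the control flow can be read and tested on its own.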

All 18 model cards with full evaluation data are published at: @pbhappliedsystems

Feedback welcome — especially from anyone running evaluations on open-weight quantized models. This is the public-facing surface of a consulting and evaluation practice; the full agent demo is at https://pbhappliedsystems.com/assistant.html