Outlier-Compact-14B-MLX-4bit

💬 Outlier-optimized MLX 4-bit build of Qwen2.5-14B-Instruct by the Qwen team at Alibaba.

This is a format conversion. Model weights are unchanged from the Qwen team's original release — all credit for capability goes to them. Outlier adds the MLX 4-bit packaging so the model runs fast and clean on Apple Silicon, bundled in the Outlier desktop app's Chat tier.

Quickstart

In the Outlier desktop app (recommended)

  1. Download the installer from outlier.host (9 MB, Mac)
  2. Open the model picker → Chat tier → Outlier-Compact-14B-MLX-4bit
  3. Chat

Standalone via mlx_lm

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("Outlier-Ai/Outlier-Compact-14B-MLX-4bit")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the capital of France?"}],
    tokenize=False, add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt, max_tokens=200))

What's different from the original

Literally only:

  • Format: MLX 4-bit (group-size 128, awq-style quantization) instead of FP16/BF16
  • Size on disk: roughly 4× smaller than the 16-bit original (4-bit weights plus small per-group metadata)
  • Packaging: Outlier-branded for first-class pickup in the Outlier desktop app

Everything else — tokenizer, chat template, special tokens, architecture, training, alignment — is the Qwen team's work, unchanged.
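The size reduction from 4-bit, group-size-128 quantization can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming one 16-bit scale and one 16-bit bias per group of 128 weights (an assumption about the storage layout, not a spec of MLX's exact on-disk format):

```python
# Back-of-envelope size estimate for 4-bit quantization, group size 128.
PARAMS = 14.8e9  # ~14.8B parameters in Qwen2.5-14B-Instruct (approximate)
GROUP = 128

# 4 bits per weight, plus a 16-bit scale and 16-bit bias amortized per group.
bits_per_weight = 4 + (16 + 16) / GROUP

quant_gb = PARAMS * bits_per_weight / 8 / 1e9  # quantized size in GB
bf16_gb = PARAMS * 2 / 1e9                     # 16-bit baseline in GB

print(f"{bits_per_weight:.2f} bits/weight -> ~{quant_gb:.1f} GB "
      f"(vs ~{bf16_gb:.1f} GB at BF16)")
```

Under these assumptions the effective rate is ~4.25 bits per weight, i.e. a bit under 4× smaller than BF16, which matches what you see on disk for 4-bit MLX checkpoints of this class.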

Hardware

Apple Silicon only (M1 / M2 / M3 / M4, including Pro / Max / Ultra variants). Uses Apple's MLX framework directly. Intel Macs are not supported. For cross-platform use, see the GGUF siblings if available.

Attribution (legally + ethically required)

  • Model: the Qwen team at Alibaba (see their HF page)
  • Upstream weights: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct
  • Original license: Apache 2.0 — unchanged on this repo
  • Format conversion by: Outlier-Ai, 2026-04-19

Please honor the Qwen team's license terms. This repo inherits them exactly.

What is Outlier?

Outlier is a Mac-native, offline-first AI platform. One 9 MB installer, curated tiers grouped by use case (🧠 Think / 💻 Code / 💬 Chat / ⚡ Fast / 🧪 Experiment), no per-token costs, no cloud round-trip, no rate limits.

Known limits

  • MMLU on this 4-bit checkpoint has not been independently measured by Outlier. The Qwen team's published benchmarks apply to the FP16/BF16 base, not this quantization. When benchmarks are published they'll be on the upstream model card, not this one.
  • Quantization from FP16 to 4-bit typically costs 1-3% MMLU. If you need maximum quality, use the FP16 upstream release; Outlier optimizes for the Mac RAM budget, not maximum quality.
  • This is a brand-new conversion, not yet proven by downloads; benchmark and RAM numbers will land once the Outlier bench harness runs it.

Rollback / rejection

If anything looks wrong, the Qwen team's original release at https://huggingface.co/Qwen/Qwen2.5-14B-Instruct is the authoritative source. This repo is a convenience wrapper.

License

Apache 2.0 — inherited from the upstream Qwen release without modification.

Citation

Credit the upstream team for the model, not Outlier for the packaging. If the upstream model card has a BibTeX entry, use that. Outlier does not attach its own citation to format conversions.

Receipt

{
  "conversion_date": "2026-04-19",
  "source_repo": "Qwen/Qwen2.5-14B-Instruct",
  "source_license": "apache-2.0",
  "target_repo": "Outlier-Ai/Outlier-Compact-14B-MLX-4bit",
  "format": "mlx 4-bit awq (group-size 128)",
  "converter": "Outlier-Ai autopilot (mega_infinite sprint)"
}
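Downstream tooling could sanity-check a receipt like the one above before trusting a repackaged checkpoint. A minimal sketch — the required-field list here is an assumption for illustration, not a published Outlier schema:

```python
import json

# Parse the conversion receipt (reproduced verbatim from this model card).
receipt = json.loads("""{
  "conversion_date": "2026-04-19",
  "source_repo": "Qwen/Qwen2.5-14B-Instruct",
  "source_license": "apache-2.0",
  "target_repo": "Outlier-Ai/Outlier-Compact-14B-MLX-4bit",
  "format": "mlx 4-bit awq (group-size 128)",
  "converter": "Outlier-Ai autopilot (mega_infinite sprint)"
}""")

# Hypothetical required fields a consumer would check before trusting
# a format conversion: provenance, license, and target identity.
required = {"conversion_date", "source_repo", "source_license",
            "target_repo", "format"}
missing = required - receipt.keys()
assert not missing, f"receipt missing fields: {missing}"
print("receipt OK:", receipt["source_repo"], "->", receipt["target_repo"])
```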