Outlier-Compact-14B-MLX-4bit

💬 Outlier-optimized MLX 4-bit build of Qwen2.5-14B-Instruct by the Qwen team at Alibaba.

This is a format conversion. Model weights are unchanged from the Qwen team's original release — all credit for capability goes to them. Outlier adds the MLX 4-bit packaging so the model runs fast and clean on Apple Silicon, bundled in the Outlier desktop app's Chat tier.

Quickstart

In the Outlier desktop app (recommended)

  1. Download the installer from outlier.host (9 MB, Mac)
  2. Open the model picker → Chat tier → Outlier-Compact-14B-MLX-4bit
  3. Chat

Standalone via mlx_lm

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("Outlier-Ai/Outlier-Compact-14B-MLX-4bit")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the capital of France?"}],
    tokenize=False, add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt, max_tokens=200))

What's different from the original

Literally only:

  • Format: MLX 4-bit (group-size 128, awq-style quantization) instead of FP16/BF16
  • Size on disk: roughly 4× smaller than the 16-bit original (4-bit weights plus small per-group metadata)
  • Packaging: Outlier-branded for first-class pickup in the Outlier desktop app

Everything else — tokenizer, chat template, special tokens, architecture, training, alignment — is the Qwen team's work, unchanged.
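The size reduction from 4-bit, group-size-128 quantization can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming one 16-bit scale and one 16-bit bias per group of 128 weights (an assumption about the storage layout, not a spec of MLX's exact on-disk format):

```python
# Back-of-envelope size estimate for 4-bit quantization, group size 128.
PARAMS = 14.8e9  # ~14.8B parameters in Qwen2.5-14B-Instruct (approximate)
GROUP = 128

# 4 bits per weight, plus a 16-bit scale and 16-bit bias amortized per group.
bits_per_weight = 4 + (16 + 16) / GROUP

quant_gb = PARAMS * bits_per_weight / 8 / 1e9  # quantized size in GB
bf16_gb = PARAMS * 2 / 1e9                     # 16-bit baseline in GB

print(f"{bits_per_weight:.2f} bits/weight -> ~{quant_gb:.1f} GB "
      f"(vs ~{bf16_gb:.1f} GB at BF16)")
```

Under these assumptions the effective rate is ~4.25 bits per weight, i.e. a bit under 4× smaller than BF16, which matches what you see on disk for 4-bit MLX checkpoints of this class.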

Hardware

Apple Silicon only (M1 / M2 / M3 / M4, including Pro / Max / Ultra variants). Uses Apple's MLX framework directly. Intel Macs are not supported. For cross-platform use, see the GGUF siblings if available.

Attribution (legally + ethically required)

  • Model: the Qwen team at Alibaba (see their HF page)
  • Upstream weights: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct
  • Original license: Apache 2.0 — unchanged on this repo
  • Format conversion by: Outlier-Ai, 2026-04-19

Please honor the Qwen team's license terms. This repo inherits them exactly.

What is Outlier?

Outlier is a Mac-native, offline-first AI platform. One 9 MB installer, curated tiers grouped by use case (🧠 Think / 💻 Code / 💬 Chat / ⚡ Fast / 🧪 Experiment), no per-token costs, no cloud round-trip, no rate limits.

Known limits

  • MMLU on this 4-bit checkpoint has not been independently measured by Outlier. The Qwen team's published benchmarks apply to the FP16/BF16 base, not this quantization. When benchmarks are published they'll be on the upstream model card, not this one.
  • Quantization from FP16 to 4-bit typically costs 1-3% MMLU. If you need maximum quality, use the FP16 upstream release; Outlier optimizes for the Mac RAM budget, not maximum quality.
  • This is a brand-new conversion, not yet proven by downloads; benchmark and RAM numbers will land once the Outlier bench harness runs it.

Rollback / rejection

If anything looks wrong, the Qwen team's original release at https://huggingface.co/Qwen/Qwen2.5-14B-Instruct is the authoritative source. This repo is a convenience wrapper.

License

Apache 2.0 — inherited from the upstream Qwen release without modification.

Citation

Credit the upstream team for the model, not Outlier for the packaging. If the upstream model card has a BibTeX entry, use that. Outlier does not attach its own citation to format conversions.

Receipt

{
  "conversion_date": "2026-04-19",
  "source_repo": "Qwen/Qwen2.5-14B-Instruct",
  "source_license": "apache-2.0",
  "target_repo": "Outlier-Ai/Outlier-Compact-14B-MLX-4bit",
  "format": "mlx 4-bit awq (group-size 128)",
  "converter": "Outlier-Ai autopilot (mega_infinite sprint)"
}
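Downstream tooling could sanity-check a receipt like the one above before trusting a repackaged checkpoint. A minimal sketch — the required-field list here is an assumption for illustration, not a published Outlier schema:

```python
import json

# Parse the conversion receipt (reproduced verbatim from this model card).
receipt = json.loads("""{
  "conversion_date": "2026-04-19",
  "source_repo": "Qwen/Qwen2.5-14B-Instruct",
  "source_license": "apache-2.0",
  "target_repo": "Outlier-Ai/Outlier-Compact-14B-MLX-4bit",
  "format": "mlx 4-bit awq (group-size 128)",
  "converter": "Outlier-Ai autopilot (mega_infinite sprint)"
}""")

# Hypothetical required fields a consumer would check before trusting
# a format conversion: provenance, license, and target identity.
required = {"conversion_date", "source_repo", "source_license",
            "target_repo", "format"}
missing = required - receipt.keys()
assert not missing, f"receipt missing fields: {missing}"
print("receipt OK:", receipt["source_repo"], "->", receipt["target_repo"])
```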