Qwen3.6-27B-OptiQ-4bit

Optimized for Apple Silicon with mlx-optiq — sensitivity-aware mixed-precision quantization, reusable at inference, fine-tuning, and serving time.

This is a mixed-precision quantized version of Qwen/Qwen3.6-27B in MLX format. Instead of uniform 4-bit across every layer, optiq measures each layer's sensitivity via KL divergence on calibration data and assigns per-layer bit-widths (some layers at 8-bit, the rest at 4-bit) at the same average bits-per-weight. Same size, higher quality.

The optiq_metadata.json sidecar ships in the repo; it's what mlx-optiq reads to drive sensitivity-aware LoRA fine-tuning, mixed-precision KV serving, and hot-swap adapter routing.
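The sidecar's exact schema isn't documented here; as a hedged illustration only, assuming it maps layer names to their assigned bit-widths, inspecting the per-layer assignment might look like this (the real optiq_metadata.json layout may differ):

```python
import json
from collections import Counter

def summarize_sidecar(path):
    """Count how many layers landed at each bit-width.

    Hypothetical schema: {"layers": {"<name>": {"bits": 4 or 8}, ...}}.
    """
    with open(path) as f:
        meta = json.load(f)
    bits = Counter(layer["bits"] for layer in meta["layers"].values())
    return dict(bits)  # e.g. {8: 247, 4: 343} for this checkpoint
```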

Multimodal stripping (default): the base is published as a multimodal foundation model, but this optiq release ships the language stack only. Vision/audio config + token metadata are dropped at conversion time so the artifact is smaller, leaves more RAM for KV cache + LoRA, and runs cleanly on lower-spec Apple Silicon with longer contexts. If you need the vision tower, re-convert from the base with:

optiq convert Qwen/Qwen3.6-27B --target-bpw 4.5 --keep-unused-modalities -o ./mmodel

Quantization Details

| Property | Value |
|---|---|
| Target BPW | 4.5 |
| Achieved BPW | 4.50 |
| Layers at 8-bit (sensitive) | 247 |
| Layers at 4-bit (robust) | 343 |
| Total quantized layers | 590 |
| Group size | 64 |
| model_type | qwen3_5 |
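Note that the achieved BPW is an average weighted by each layer's parameter count, not a simple mean over layer counts (247 of 590 layers at 8-bit would otherwise imply well above 4.5 bits). A minimal sketch of the weighting, with toy layer sizes rather than this model's real ones:

```python
def achieved_bpw(layers):
    """Parameter-weighted average bits-per-weight.

    layers: iterable of (num_params, bits) pairs.
    """
    total_bits = sum(n * b for n, b in layers)
    total_params = sum(n for n, _ in layers)
    return total_bits / total_params

# Toy example: one small 8-bit layer plus one large 4-bit layer
# averages far closer to 4 than the naive layer-count mean of 6.
print(achieved_bpw([(1_000, 8), (7_000, 4)]))  # 4.5
```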

Usage

Basic (works with stock mlx-lm)

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3.6-27B-OptiQ-4bit")
response = generate(
    model, tokenizer,
    prompt="Explain quantum computing in simple terms.",
    max_tokens=200,
)
print(response)

Unlock the full stack with mlx-optiq

Installing mlx-optiq turns this model from a static checkpoint into a deployment-ready base:

pip install mlx-optiq

Mixed-precision KV-cache serving (+40–62% decode speedup at 64k context on Qwen3.5 2B/4B/9B vs fp16 KV on M3 Max):

# One-time per-layer KV sensitivity pass
optiq kv-cache mlx-community/Qwen3.6-27B-OptiQ-4bit --target-bits 4.5 -o ./kv_cache

# OpenAI-compatible server on :8080
optiq serve \
    --kv-config ./kv_cache/kv_config.json \
    --model mlx-community/Qwen3.6-27B-OptiQ-4bit \
    --max-tokens 32768 --temp 0.6 --top-p 0.95
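The win from quantizing the KV cache is easy to size with back-of-envelope arithmetic. The sketch below uses hypothetical transformer dimensions (48 layers, 8 KV heads, head dim 128 — illustrative assumptions, not Qwen3.6-27B's actual config) to compare fp16 against a 4.5-bit average at 64k context:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bits):
    # K and V each hold seq_len * n_kv_heads * head_dim values per layer.
    values = 2 * seq_len * n_layers * n_kv_heads * head_dim
    return values * bits / 8

# Hypothetical dims for illustration only.
args = dict(seq_len=65536, n_layers=48, n_kv_heads=8, head_dim=128)
fp16 = kv_cache_bytes(**args, bits=16)
mixed = kv_cache_bytes(**args, bits=4.5)
print(f"fp16: {fp16 / 2**30:.1f} GiB, 4.5-bit: {mixed / 2**30:.1f} GiB")
```

Under these assumed dims, the fp16 cache alone would occupy 12 GiB at 64k tokens; at 4.5 bits it drops to under 3.4 GiB, which is where the longer-context headroom comes from.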

Sensitivity-aware LoRA fine-tuning — layers optiq kept at 8-bit (more sensitive) get 2× the adapter rank of layers at 4-bit, at the same base budget:

optiq lora train mlx-community/Qwen3.6-27B-OptiQ-4bit \
    --data ./my_data \
    --rank 8 --rank-scaling by_bits \
    --iters 1000 -o ./my_adapter
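The `--rank-scaling by_bits` policy described above can be sketched as a simple mapping (a hedged illustration of the idea, not mlx-optiq's internals):

```python
def lora_ranks(layer_bits, base_rank=8):
    """Assign adapter ranks from per-layer bit-widths: layers the
    quantizer kept at 8-bit (more sensitive) get twice the rank of
    4-bit layers, as in --rank-scaling by_bits.
    """
    return {name: base_rank * 2 if bits == 8 else base_rank
            for name, bits in layer_bits.items()}

print(lora_ranks({"attn.q_proj": 8, "mlp.down_proj": 4}))
```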

Hot-swap adapters — mount N adapters on one base, switch per request without reloading the model (adapter id via HF repo or local path, auto-downloaded):

optiq serve \
    --model mlx-community/Qwen3.6-27B-OptiQ-4bit \
    --adapter ./my_adapter
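Conceptually, hot-swap routing keeps one base model resident and picks an adapter per request. A minimal sketch of that pattern (the real server's implementation and request schema will differ):

```python
class AdapterRouter:
    """One base model, N resident adapters, selected per request by id.

    Illustrative only: adapters are mounted once and kept in memory, so
    switching between them never reloads the base weights.
    """
    def __init__(self, base_model):
        self.base_model = base_model
        self.adapters = {}

    def mount(self, adapter_id, weights):
        self.adapters[adapter_id] = weights  # loaded once, kept resident

    def route(self, request):
        # Use the adapter named in the request; fall back to the bare base.
        return self.adapters.get(request.get("adapter"))
```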

Full documentation: mlx-optiq.pages.dev

Benchmarks

GSM8K (200 samples, 3-shot chain-of-thought, chat template + thinking disabled):

| Model | GSM8K Accuracy |
|---|---|
| OptiQ mixed (4.5 BPW) | 95.0% (190/200) |
| Uniform 4-bit | 94.0% (188/200) |

OptiQ delta vs uniform 4-bit: +1.0pp (within 200-sample noise band).
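The noise-band caveat can be made concrete: one standard error of a ~95% accuracy estimate over 200 samples is about ±1.5pp, larger than the +1.0pp delta. A quick check:

```python
import math

def binomial_se_pp(p, n):
    """One standard error of a binomial accuracy estimate,
    in percentage points."""
    return 100 * math.sqrt(p * (1 - p) / n)

print(round(binomial_se_pp(0.95, 200), 2))  # ~1.54 pp
```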

See the Results page for the full per-model lineup.

License

Apache 2.0 (inherits from base model).
