Qwen3.6-35B-A3B — MLX 4.4 BPW

Mixed-precision MLX quantization of Qwen/Qwen3.6-35B-A3B, produced with MLX Smart Quantize (MSQ), my own sensitivity-based mixed-precision quantization method for Apple Silicon. MSQ measures the normalized mean squared error (NMSE) that quantization introduces in each layer and assigns bit widths automatically, giving more bits to the layers that are most sensitive. It combines architecture knowledge with the measured data.
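
As a rough illustration of the idea (not the actual MSQ implementation, which is not shown here), the sketch below round-trips each layer's weights through group-wise min/max quantization, measures weight-space NMSE, and picks the smallest candidate bit width that stays under an error budget. The layer names, the synthetic weights, the candidate widths, the group size of 64, and the `max_nmse` threshold are all assumptions made for the example; MSQ may well measure the error on layer outputs rather than on weights.

```python
import random

def quantize_dequantize(weights, bits, group_size=64):
    """Round-trip weights through group-wise affine (min/max) quantization."""
    levels = (1 << bits) - 1
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # constant group: any nonzero scale works
        out.extend(lo + round((w - lo) / scale) * scale for w in group)
    return out

def nmse(weights, bits, group_size=64):
    """Normalized MSE: squared quantization error divided by weight energy."""
    approx = quantize_dequantize(weights, bits, group_size)
    err = sum((w - a) ** 2 for w, a in zip(weights, approx))
    energy = sum(w * w for w in weights)
    return err / energy if energy else 0.0

def pick_bits(weights, candidates=(3, 4, 5, 6, 8), max_nmse=1e-3):
    """Smallest candidate width whose NMSE fits the (hypothetical) budget."""
    for bits in sorted(candidates):
        if nmse(weights, bits) <= max_nmse:
            return bits
    return max(candidates)

random.seed(0)
# Synthetic stand-ins for real layers; the second one has one outlier per group.
layers = {
    "mlp.down_proj": [random.gauss(0.0, 0.02) for _ in range(4096)],
    "attn.q_proj": [random.gauss(0.0, 0.02) * (10 if i % 64 == 0 else 1)
                    for i in range(4096)],
}
plan = {name: pick_bits(w) for name, w in layers.items()}
for name, bits in plan.items():
    print(f"{name}: {bits} bits, NMSE {nmse(layers[name], bits):.2e}")
```

Outliers stretch each group's min/max range, so the outlier-heavy layer needs at least as many bits as the well-behaved one to meet the same budget.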

Details

  • Type: Vision (VLM)
  • Average: 4.39 bits per weight
  • Method: MLX Smart Quantize (MSQ)
  • AWQ scaling: applied to 50 groups
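
For readers wondering how a fractional average such as 4.39 comes about, here is a minimal sketch of a parameter-weighted bits-per-weight calculation. The layer sizes and bit widths are hypothetical, not this model's actual allocation, and it is an assumption that per-group metadata (a 16-bit scale plus a 16-bit bias per group of 64 weights) may or may not be counted in the reported figure.

```python
def average_bpw(layers, group_size=64, overhead_bits=0):
    """Parameter-weighted average bits per weight.

    layers: iterable of (num_params, bits) pairs.
    overhead_bits: per-group metadata bits, e.g. 32 for a 16-bit scale
    plus a 16-bit bias; 0 counts only the quantized payload.
    """
    layers = list(layers)
    total_params = sum(n for n, _ in layers)
    total_bits = sum(n * (bits + overhead_bits / group_size) for n, bits in layers)
    return total_bits / total_params

# Hypothetical parameter counts (in millions) and bit widths, for illustration only.
example = [(900, 4), (80, 6), (20, 8)]
payload_only = average_bpw(example)
with_metadata = average_bpw(example, overhead_bits=32)
print(f"{payload_only:.2f} bpw without metadata, {with_metadata:.2f} bpw with it")
```

Because most parameters sit at the lowest width, a small share of higher-precision layers moves the average only slightly above 4 bits.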

Evaluation

Benchmark    Score   Samples
MMLU         81.8%   285
HellaSwag    91.5%   200
GSM8K        87%     200
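
Since these scores come from subsets of a few hundred samples, it can help to estimate their sampling error. The sketch below uses the normal approximation to a binomial proportion; treating each sample as an independent pass/fail trial is an assumption, and the helper name `margin_95` is made up for this example.

```python
import math

def margin_95(score, samples):
    """Approximate 95% half-width for an accuracy measured on `samples` items."""
    return 1.96 * math.sqrt(score * (1.0 - score) / samples)

# Scores and sample counts taken from the table above.
results = {"MMLU": (0.818, 285), "HellaSwag": (0.915, 200), "GSM8K": (0.87, 200)}
for name, (score, n) in results.items():
    print(f"{name}: {score:.1%} +/- {margin_95(score, n):.1%}")
```

Differences of a point or two against other quantizations evaluated on similar subset sizes are therefore within noise.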

Safetensors: 35B params; tensor types BF16, U32