Qwen3.6-35B-A3B_INT4 — PTQ Quantized (W4A16)

Overview

This repository provides a Post-Training Quantized (PTQ) version of:

Base Model: Qwen/Qwen3.6-35B-A3B
Quantized By: TheHouseOfTheDude

This is a true PTQ quantization:

No calibration dataset
One-shot quantization
Fast and deterministic pipeline

Quantization Details

Scheme: W4A16
Weights: INT4 (per-channel symmetric)
Activations: FP16 / BF16
Method: llmcompressor.oneshot
Targets: Linear layers only

Ignored layers:

lm_head
visual modules
linear_attn
mtp
mlp.gate
mlp.shared_expert_gate

KLD Results

Mean KLD: 0.071393

low divergence → strong fidelity to original model. (0.01 Better than a IQ3_S)

Implementation Notes

Uses AutoModelForImageTextToText for correct vLLM weight paths
Includes Transformers v5 key remapping fix
No calibration dataset used

Usage (vLLM)

pip install -U vllm

vllm serve TheHouseOfTheDude/Qwen3.6-35B-A3B_INT4 \
  --quantization compressed-tensors \
  --tensor-parallel-size 4 \
  --dtype bfloat16

Notes

Requires compressed-tensors runtime
Not compatible with vanilla Transformers loading
Optimized for production inference

Credits

Qwen/Qwen3.6-35B-A3B
TheHouseOfTheDude

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for TheHouseOfTheDude/Qwen3.6-35B-A3B_INT4

Base model

Qwen/Qwen3.6-35B-A3B

Quantized

(430)

this model