Qwen3.6-35B-A3B_INT4 โ€” PTQ Quantized (W4A16)

Overview

This repository provides a Post-Training Quantized (PTQ) version of:

Base Model: Qwen/Qwen3.6-35B-A3B
Quantized By: TheHouseOfTheDude

This is a true PTQ quantization:

  • No calibration dataset
  • One-shot quantization
  • Fast and deterministic pipeline

Quantization Details

  • Scheme: W4A16
  • Weights: INT4 (per-channel symmetric)
  • Activations: FP16 / BF16
  • Method: llmcompressor.oneshot
  • Targets: Linear layers only

Ignored layers:

  • lm_head
  • visual modules
  • linear_attn
  • mtp
  • mlp.gate
  • mlp.shared_expert_gate

KLD Results

Mean KLD: 0.071393

low divergence โ†’ strong fidelity to original model. (0.01 Better than a IQ3_S)


Implementation Notes

  • Uses AutoModelForImageTextToText for correct vLLM weight paths
  • Includes Transformers v5 key remapping fix
  • No calibration dataset used

Usage (vLLM)

pip install -U vllm

vllm serve TheHouseOfTheDude/Qwen3.6-35B-A3B_INT4 \
  --quantization compressed-tensors \
  --tensor-parallel-size 4 \
  --dtype bfloat16

Notes

  • Requires compressed-tensors runtime
  • Not compatible with vanilla Transformers loading
  • Optimized for production inference

Credits

  • Qwen/Qwen3.6-35B-A3B
  • TheHouseOfTheDude
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for TheHouseOfTheDude/Qwen3.6-35B-A3B_INT4

Quantized
(430)
this model