
Qwen3.5-35B-Optimized-HauhauCS

Join the Discord for updates, roadmaps, projects, or just to chat.

Optimized Qwen3.5-35B-A3B by HauhauCS.

Access

This is currently a closed-beta release, designed to lower (V)RAM requirements by up to 50% without sacrificing real-world capability.

Downloads

File                                               Type          Size
Qwen3.5-35B-Optimized-HauhauCS-40-Q8_K_P.gguf      Q8_K_P        25 GB
Qwen3.5-35B-Optimized-HauhauCS-40-Q4_K_P.gguf      Q4_K_P        14 GB
mmproj-Qwen3.5-35B-Optimized-HauhauCS-40-f16.gguf  mmproj (F16)  858 MB

What are K_P quants?

K_P quants use model-specific importance analysis to selectively preserve quality where it matters most. They are fully compatible with llama.cpp, LM Studio, and any GGUF runtime.
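
The K_P recipe itself isn't published. For intuition only, the sketch below shows the generic importance-matrix (imatrix) workflow that stock llama.cpp uses for the same goal; the file names are placeholders, and producing actual K_P tensors would require a llama.cpp build that defines those quant types:

# 1. Measure per-tensor activation importance on a calibration set (placeholder files).
llama-imatrix -m base-f16.gguf -f calibration.txt -o base.imatrix
# 2. Quantize, letting the importance matrix steer where precision is kept.
llama-quantize --imatrix base.imatrix base-f16.gguf base-Q4_K_M.gguf Q4_K_M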

Specs

  • 35B-A3B MoE (35B total, ~3B active per forward pass)
  • 262K context
  • Multimodal (vision support via mmproj)
  • Based on Qwen3.5-35B-A3B

Usage

Works with llama.cpp, LM Studio, Jan, KoboldCpp, and other GGUF runtimes. Example invocation with llama.cpp:

llama-cli -m Qwen3.5-35B-Optimized-HauhauCS-40-Q8_K_P.gguf \
  --mmproj mmproj-Qwen3.5-35B-Optimized-HauhauCS-40-f16.gguf \
  -ngl 99
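
To serve the model over an OpenAI-compatible HTTP API instead, llama-server accepts the same model and projector arguments. A minimal sketch; the context size and port below are illustrative, and --mmproj support requires a reasonably recent llama.cpp build:

# -c sets the context length, --port the HTTP port; adjust to taste.
llama-server -m Qwen3.5-35B-Optimized-HauhauCS-40-Q8_K_P.gguf \
  --mmproj mmproj-Qwen3.5-35B-Optimized-HauhauCS-40-f16.gguf \
  -ngl 99 -c 16384 --port 8080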

Note: K_P quants may show as "?" in LM Studio's quant column. This is a display issue only; the model loads and runs fine.
