Darwin-35B-A3B-Opus

Quality: quantized (mixed quants per tensor, group size: 32, 9.191 bpw)

Most layers use 8-bit affine quantization with a group size of 32; embeddings and some other layers are kept in bf16.
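The 9.191 bpw figure can be sanity-checked with a little arithmetic. The sketch below assumes that each group of 32 weights stores one scale and one bias at 16 bits each (a common layout for affine quantization, not something this card states), and that the model is a simple two-way mix of 8-bit and bf16 tensors. The function names are illustrative only.

```python
def affine_bits_per_weight(bits, group_size, scale_bits=16, bias_bits=16):
    """Effective bits per weight for group-wise affine quantization.

    Assumption: every group stores one scale and one bias, 16 bits each.
    """
    return bits + (scale_bits + bias_bits) / group_size


def full_precision_fraction(avg_bpw, quant_bpw, full_bpw=16.0):
    """Share of weights that must stay at full_bpw for a two-way mix
    of quantized and full-precision tensors to average avg_bpw."""
    return (avg_bpw - quant_bpw) / (full_bpw - quant_bpw)


# 8-bit weights plus per-group scale/bias overhead
quant_bpw = affine_bits_per_weight(bits=8, group_size=32)

# Share of bf16 weights implied by the reported 9.191 bpw average
bf16_share = full_precision_fraction(avg_bpw=9.191, quant_bpw=quant_bpw)

# Rough weight footprint, using the nominal 35B parameter count
approx_size_gb = 35e9 * 9.191 / 8 / 1e9

print(f"quantized layers: {quant_bpw:.3f} bpw")
print(f"implied bf16 share: {bf16_share:.1%}")
print(f"approximate weight size: {approx_size_gb:.1f} GB")
```

Under these assumptions the quantized layers come to 9 bpw, the bf16 share is a little under 3% of the weights, and the weights occupy roughly 40 GB. The real figures depend on the exact parameter count and on which tensors were left unquantized.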

Model Specifications


Architecture	Qwen3.5 MoE (Gated DeltaNet + MoE)
Total Parameters	35B
Active Parameters	3B per forward pass
Layers	40
Layout	10 x (3 x GDN-MoE + 1 x Attention-MoE)
Experts	256 (8 routed + 1 shared active)
Context Length	262,144 native
Languages	201
Multimodal	Image and Video
License	Apache 2.0
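As a quick consistency check on the table above, the layout string can be expanded into a per-layer list. This is only a sketch: placing the attention layer last within each block of four is an assumption read off the order of the layout string, and the variable names are illustrative.

```python
from collections import Counter

NUM_BLOCKS = 10
# Assumed ordering inside a block: three GDN-MoE layers, then one Attention-MoE layer
BLOCK_PATTERN = ["GDN-MoE"] * 3 + ["Attention-MoE"]

# Expand "10 x (3 x GDN-MoE + 1 x Attention-MoE)" into a flat layer list
layers = BLOCK_PATTERN * NUM_BLOCKS
counts = Counter(layers)

# Experts active per token: 8 routed plus 1 shared
ROUTED_ACTIVE = 8
SHARED_ACTIVE = 1
active_experts = ROUTED_ACTIVE + SHARED_ACTIVE

print(f"total layers: {len(layers)}")
print(f"layer types: {dict(counts)}")
print(f"active experts per token: {active_experts} of 256 listed")
```

The expansion yields 40 layers (30 GDN-MoE and 10 Attention-MoE), matching the Layers row, and 9 active experts per token.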

Parent Models

Both parents share the identical Qwen3.5-35B-A3B architecture (40 layers, 256 experts, GDN+MoE hybrid). The Mother is a LoRA SFT of the same base model, not a different architecture. "Text-only" refers to its training data (Claude 4.6 Opus reasoning chains), not to the model structure.

Role	Model	Architecture	Training
Father	Qwen/Qwen3.5-35B-A3B	Qwen3.5-35B-A3B	Original pre-training + RLHF
Mother	Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled	Qwen3.5-35B-A3B (same)	LoRA SFT with text-only Claude reasoning chains

Source

This model was converted to MLX format from FINAL-Bench/Darwin-35B-A3B-Opus using mlx-vlm version 0.4.4.
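Since the weights are in MLX format, they can be loaded with mlx-vlm on Apple silicon. The following is a minimal sketch based on the usage pattern in the mlx-vlm README; it has not been checked against version 0.4.4 specifically, so argument names and the return type of `generate` may differ between releases. `generate_reply` is a hypothetical helper name, and the repository id is the one this card is published under.

```python
MODEL_REPO = "TheCluster/Darwin-35B-A3B-Opus-MLX-mixed-9bit"


def generate_reply(prompt, images=None, max_tokens=256):
    """Load the model and generate a reply to a text (and optional image) prompt.

    Requires `pip install mlx-vlm` and enough unified memory for the weights.
    """
    # Imported lazily so this file can be read on machines without MLX installed.
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    images = images or []

    # Downloads the weights from the Hub on first use
    model, processor = load(MODEL_REPO)
    config = load_config(MODEL_REPO)

    # Wrap the raw prompt in the model's chat template
    formatted = apply_chat_template(
        processor, config, prompt, num_images=len(images)
    )

    return generate(
        model,
        processor,
        formatted,
        image=images or None,
        max_tokens=max_tokens,
        verbose=False,
    )
```

A call such as `print(generate_reply("Summarize the idea behind mixture-of-experts models."))` would run a text-only prompt; passing a list of image paths or URLs as `images` exercises the multimodal path.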


Model tree for TheCluster/Darwin-35B-A3B-Opus-MLX-mixed-9bit

Base model	FINAL-Bench/Darwin-35B-A3B-Opus
Quantized	this model
