Qwopus MoE 35B-A3B — Claude Opus 4.6 Reasoning Distilled

QLoRA fine-tune of Qwen3.5-35B-A3B (MoE, 3B active parameters) with Claude Opus 4.6 reasoning distillation. Training recipe adapted from Jackrong's Qwopus3.5-27B-v3.

This is the full BF16 safetensors model. For GGUF quantizations (Q4, Q5, Q6, Q8), see samuelcardillo/Qwopus-MoE-35B-A3B-GGUF.
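If you want to run one of those quantizations locally, a minimal sketch with llama-cpp-python might look like the following. The quantization filename pattern is an assumption here, so check the GGUF repository for the actual file names.

```python
# Minimal sketch: loading a GGUF quantization with llama-cpp-python.
# The filename glob is an assumption; verify names in the GGUF repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="samuelcardillo/Qwopus-MoE-35B-A3B-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quantization file pattern
    n_gpu_layers=-1,          # offload all layers to GPU if available
    n_ctx=8192,
)

out = llm("Explain mixture-of-experts routing in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```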

Credits

This model is based on the work of Jackrong and his Qwopus3.5-27B-v3 training methodology — same datasets, same philosophy, adapted for the MoE architecture. See his complete training guide.

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | ~35B |
| Active Parameters | ~3B per token |
| Precision | BF16 |

Training Details

| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit base + LoRA in BF16) |
| Framework | Unsloth 2026.4.2 + TRL |
| LoRA Rank | 32 |
| LoRA Alpha | 32 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | 6,881,280 (0.02%) |
| Epochs | 2 |
| Final Loss | 0.5517 |
| GPU | NVIDIA RTX PRO 6000 Blackwell (96 GB) |
| Training Time | ~3.5 hours |
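
As a point of reference, the hyperparameters above correspond roughly to an Unsloth QLoRA setup like the sketch below. This is not the original training script; values not listed in the table (sequence length, dropout) are assumptions.

```python
# Sketch of a QLoRA setup matching the table above (not the original training script).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3.5-35B-A3B",  # base model from the table
    load_in_4bit=True,                  # QLoRA: 4-bit quantized base weights
    max_seq_length=4096,                # assumption; not stated in the table
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                               # LoRA rank
    lora_alpha=32,                      # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,                   # assumption; not stated in the table
)
```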

Datasets (3,209 examples)

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full BF16 checkpoint; device_map="auto" spreads the ~35B
# parameters across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "samuelcardillo/Qwopus-MoE-35B-A3B",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "samuelcardillo/Qwopus-MoE-35B-A3B",
    trust_remote_code=True,
)
```
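
A minimal generation example using the chat template (the sampling settings here are illustrative, not the settings used during training):

```python
messages = [{"role": "user", "content": "Explain why the sky is blue, step by step."}]

# Build the prompt with the model's chat template and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)  # illustrative settings
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```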

Acknowledgements

  • Jackrong — Training methodology and the Qwopus concept
  • Unsloth — QLoRA training framework
  • Qwen — Base model