Libraries

How to use Kassadin88/Qwen3.5-122B-A10B-Claude-distill with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Kassadin88/Qwen3.5-122B-A10B-Claude-distill")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Kassadin88/Qwen3.5-122B-A10B-Claude-distill")
model = AutoModelForImageTextToText.from_pretrained("Kassadin88/Qwen3.5-122B-A10B-Claude-distill")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Kassadin88/Qwen3.5-122B-A10B-Claude-distill with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Kassadin88/Qwen3.5-122B-A10B-Claude-distill"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kassadin88/Qwen3.5-122B-A10B-Claude-distill",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/Kassadin88/Qwen3.5-122B-A10B-Claude-distill

SGLang

How to use Kassadin88/Qwen3.5-122B-A10B-Claude-distill with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Kassadin88/Qwen3.5-122B-A10B-Claude-distill" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kassadin88/Qwen3.5-122B-A10B-Claude-distill",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Kassadin88/Qwen3.5-122B-A10B-Claude-distill" \
        --host 0.0.0.0 \
        --port 30000


Qwen3.5-122B-A10B Claude-Distill

A fine-tuned version of Qwen/Qwen3.5-122B-A10B, produced by knowledge distillation from Claude. The model was trained with full-parameter fine-tuning on curated Claude reasoning traces.

Model Highlights

Claude-Distilled Reasoning: Trained on high-quality chain-of-thought reasoning traces distilled from Claude Opus
Multi-Domain Coverage: Math, logic, coding, creative writing, STEM, and multi-turn reasoning
Mixture-of-Experts Architecture: Based on Qwen/Qwen3.5-122B-A10B, with 122B total parameters and 10B active parameters per token
Multimodal Capable: Inherits vision-language capabilities from Qwen3.5

Model Description

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-122B-A10B |
| Model Type | Causal Language Model with Vision Encoder (MoE) |
| Parameters | 122B (10B active) |
| Languages | English, Chinese |
| License | Apache 2.0 |
| Developer | Kassadin88 |
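
As a rough sizing aid, the parameter counts in the table above can be turned into a weight-memory estimate. The sketch below assumes 2 bytes per parameter (BF16) and an even split across 8 GPUs, matching the tensor-parallel size used in the serving commands in this card. It ignores KV cache, activations, and framework overhead, so real requirements will be higher. Note that all 122B parameters must be resident in memory even though only about 10B are active per token.

```python
# Rough weight-memory estimate; ignores KV cache, activations, and runtime overhead.
TOTAL_PARAMS = 122e9    # total parameters (from the table above)
ACTIVE_PARAMS = 10e9    # parameters active per token (from the table above)
BYTES_PER_PARAM = 2     # assumption: weights stored in BF16
NUM_GPUS = 8            # assumption: weights sharded evenly across 8 GPUs


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


total_gb = weight_memory_gb(TOTAL_PARAMS)
per_gpu_gb = total_gb / NUM_GPUS
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"All weights: {total_gb:.0f} GB")
print(f"Per GPU across {NUM_GPUS} GPUs: {per_gpu_gb:.1f} GB")
print(f"Share of parameters active per token: {active_fraction:.1%}")
```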

Training Data

Distilled from Claude on the following datasets:

| Dataset | Samples | Description |
|---|---|---|
| Claude Opus 4.5 High Reasoning | 250 | High reasoning depth samples |
| Claude Opus 4.6 Reasoning | 9,633 | Math, logic puzzles, multi-step instructions with CoT |
| Claude Opus 4.6 High Reasoning | 757 | Coding and creative writing with adaptive reasoning |
| Claude Opus 4.6 Extended Reasoning | 500 | Extended reasoning across STEM and practical domains |
| Claude Opus 4.6 Extended Reasoning 887x | 887 | Tool calling, bullshit detection, multi-turn traces |
| Claude Sonnet & Opus 4.6 Reasoning | 524 | Natural human-written prompts from Reddit & Stack Overflow |
| Opus 4.6 Reasoning Filtered | 2,326 | Filtered reasoning traces (refusals removed) |

Total: ~14.9K samples
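
The stated total can be checked by summing the per-dataset sample counts listed above. This is only an arithmetic check on the figures in this card; it says nothing about possible overlap between datasets.

```python
# Sample counts copied from the Training Data table above.
samples = {
    "Claude Opus 4.5 High Reasoning": 250,
    "Claude Opus 4.6 Reasoning": 9_633,
    "Claude Opus 4.6 High Reasoning": 757,
    "Claude Opus 4.6 Extended Reasoning": 500,
    "Claude Opus 4.6 Extended Reasoning 887x": 887,
    "Claude Sonnet & Opus 4.6 Reasoning": 524,
    "Opus 4.6 Reasoning Filtered": 2_326,
}

total = sum(samples.values())
print(total)                      # 14877
print(f"~{total / 1000:.1f}K")    # ~14.9K
```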

Data Composition

| Domain | Percentage | Description |
|---|---|---|
| Math & Logic | ~40% | Multi-step problem solving with chain-of-thought |
| Coding | ~25% | Code generation, debugging, and algorithm design |
| STEM | ~15% | Science, engineering, and extended reasoning |
| Creative Writing | ~10% | Adaptive reasoning for creative tasks |
| Multi-turn / Tool Use | ~10% | Tool calling, clarification, and dialogue |

Benchmark Results

For detailed benchmark results and model architecture, please refer to the original Qwen/Qwen3.5-122B-A10B model card.

Quickstart

For the full usage guide, please refer to the original Qwen/Qwen3.5-122B-A10B model card.

Using with vLLM

vllm serve Kassadin88/Qwen3.5-122B-A10B-Claude-distill \
    --port 8000 \
    --tensor-parallel-size 8 \
    --max-model-len 262144 \
    --trust-remote-code \
    --reasoning-parser qwen3
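
Once the server is running, it can be queried through its OpenAI-compatible chat completions endpoint. The following is a minimal client sketch using only the Python standard library. It assumes the server listens on `localhost:8000` as in the command above. The helper names (`build_request`, `split_message`, `ask`) are illustrative, and the name of the field that carries parsed reasoning (`reasoning_content` or `reasoning`) is an assumption that may vary between vLLM versions, so the code checks both.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: matches --port 8000 above
MODEL = "Kassadin88/Qwen3.5-122B-A10B-Claude-distill"


def build_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_message(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat completion response.

    The reasoning field name is an assumption; both candidates are checked.
    """
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or message.get("reasoning") or ""
    return reasoning, message.get("content") or ""


def ask(prompt: str) -> tuple[str, str]:
    """Send one prompt to the running server and return (reasoning, answer)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return split_message(json.load(resp))
```

With the server up, `reasoning, answer = ask("What is 17 * 23?")` would send a single-turn request. Nothing is sent until `ask` is called.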

Using with SGLang

python -m sglang.launch_server \
    --model-path Kassadin88/Qwen3.5-122B-A10B-Claude-distill \
    --port 8000 \
    --tp-size 8 \
    --mem-fraction-static 0.8 \
    --context-length 262144 \
    --reasoning-parser qwen3

Using with Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Kassadin88/Qwen3.5-122B-A10B-Claude-distill"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Hello, how are you?"}
]
# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Usage Tips

For Reasoning Tasks

messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of all prime numbers less than 100?"}
]
# Model will use chain-of-thought reasoning from Claude distillation

For Coding Tasks

messages = [
    {"role": "user", "content": "Implement a binary search tree with insert, delete, and find operations in Python."}
]
# Model benefits from Claude's coding reasoning traces

Enabling / Disabling Thinking

# Enable thinking mode (recommended for reasoning tasks)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)

# Disable thinking mode (for simple tasks, faster inference)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
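
When thinking mode is enabled, the decoded output typically contains the reasoning followed by the final answer. The helper below is a minimal sketch for separating the two. It assumes the model delimits its reasoning with `<think>` and `</think>` tags, as Qwen3-family chat templates do, and that these tags survive decoding; `split_thinking` is an illustrative name, not part of any library. Because the opening tag may already be part of the prompt rather than the generated text, the function only requires the closing tag.

```python
def split_thinking(decoded: str) -> tuple[str, str]:
    """Split decoded model output into (thinking, answer).

    Assumes reasoning ends with a literal "</think>" tag. If no such tag
    is present, the whole text is treated as the answer.
    """
    marker = "</think>"
    if marker not in decoded:
        return "", decoded.strip()
    thinking, _, answer = decoded.partition(marker)
    # The opening tag may be missing if the chat template placed it in the prompt.
    thinking = thinking.replace("<think>", "", 1).strip()
    return thinking, answer.strip()


thinking, answer = split_thinking("<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4.")
print(answer)
```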

Limitations

This model is distilled from Claude and may inherit biases from the training data
The distillation dataset is relatively small (~14.9K samples), which may limit generalization
Outputs should not be relied on for medical, legal, or financial advice without independent verification
The model's reasoning capabilities are constrained by the quality and diversity of the distillation data

Citation

@misc{qwen3.5-122b-a10b-claude-distill,
    author = {Kassadin88},
    title = {Qwen3.5-122B-A10B Claude-Distill: A Claude-Distilled Fine-Tuned Model},
    year = {2026},
    publisher = {HuggingFace},
    url = {https://huggingface.co/Kassadin88/Qwen3.5-122B-A10B-Claude-distill}
}

Acknowledgments

Base Model: Qwen Team for Qwen3.5
Training Data: Various Claude Opus reasoning datasets on Hugging Face
Training Framework: DeepSpeed

Note: This model is intended for research and educational purposes. Please use responsibly.
