🚀 Qwen3.5-2B-ReMix (Reasoning Mix)

This repository contains a fully merged, native Float16 (F16) fine-tune of Qwen/Qwen3.5-2B 🤖. The goal of this model is to improve performance on complex reasoning tasks, specifically advanced mathematics 🧮, logical deduction, and structured coding problems 💻.

By training on open-source distillation data, it aims for "frontier-style" reasoning while staying compact enough to run on everyday consumer hardware 🏠, with no external adapters to load.


🌟 Model Highlights

  • 🏗️ Base Architecture: Qwen/Qwen3.5-2B (Dense, Hybrid Gated DeltaNet)
  • 💾 Precision format: Native Float16 (F16) Merged Weights — No adapter required!
  • 🎯 Main Goal: Advanced mathematical reasoning and complex code generation/debugging.
  • 🛡️ Data Origin: 100% open-source distilled reasoning datasets natively hosted on Hugging Face. No proprietary data or closed APIs (OpenAI, Anthropic, Google) were used or involved in the collection or training process.
  • ⚡ Target Environment: Local, high-efficiency edge execution with minimal hardware requirements.

🎛️ Recommended Generation Parameters

To get the most consistent reasoning and keep the model from drifting into off-topic, overly creative output, override the default sampler settings with the following values during local inference:

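| Parameter | Value | Note |
|---|---|---|
| 🌡️ Temperature (temp) | 0.4 | Keeps logical thoughts focused and mathematically stable. |
| 🎯 Top P (top_p) | 30.0 | Expands token exploration for rich code structures. Samplers that define top_p as a cumulative probability only accept values in (0, 1], so in those backends 30 may need to be entered as Top K (top_k) instead. |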
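
To make the two parameters concrete, here is a minimal, library-independent sketch of temperature scaling and nucleus (top-p) filtering. The function names are illustrative and do not come from any particular inference backend, and real samplers differ in detail. It treats top_p as a cumulative probability, which is why values outside (0, 1] are rejected.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p):
    """Return indices of the smallest set of tokens whose probability mass reaches top_p."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


# Temperature 0.4 concentrates more mass on the top token than temperature 1.0.
sharp = softmax_with_temperature([2.0, 1.0, 0.0], 0.4)
flat = softmax_with_temperature([2.0, 1.0, 0.0], 1.0)
print(sharp[0] > flat[0])
```

A lower temperature makes the top-ranked token more dominant, which is the "focused" behavior the table describes.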

💡 Prompting Note: Because this model is built on the Qwen3.5 Small architecture, make sure your UI or inference wrapper is configured to separate the model's internal chain-of-thought steps from its final answer and to render them correctly.
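
As a rough illustration of what isolating the chain of thought can look like on the client side, the sketch below splits a decoded response into reasoning and final answer. It assumes the reasoning is wrapped in `<think>...</think>` tags, as in earlier Qwen3 releases; the tag names and the helper `split_reasoning` are assumptions, not something this model card specifies.

```python
import re

# ASSUMPTION: reasoning is delimited by <think>...</think>; adjust if the chat template differs.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) extracted from a decoded model response."""
    match = THINK_PATTERN.search(text)
    if match is None:
        # No reasoning block found: treat the whole response as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>\nThe answer is 4.")
print(reasoning)
print(answer)
```

If a response contains no such tags, the helper returns an empty reasoning string and the full text as the answer.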


📊 Training & Merge Details

The model was adapted using Parameter-Efficient Fine-Tuning (PEFT) with LoRA, and the adapter was then merged back into the base weights via Unsloth to produce a single, unified set of F16 weights.

  • 🔄 Training Steps: 175
  • 📉 Loss Profile: Lowest loss reached ~0.58; stabilized around 0.85
  • 📈 Learning Rate: 4e-5
  • 📐 LoRA Rank ($R$) during training: 16
  • ⚖️ LoRA Alpha ($\alpha$) during training: 32
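
For intuition about what the merge step does, the sketch below folds a LoRA update into a base weight matrix using the standard scaling $\alpha / r$ (here 32 / 16 = 2). It is a toy, pure-Python illustration with made-up matrices, not the Unsloth implementation, and it assumes plain LoRA scaling rather than a variant such as rank-stabilized LoRA.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: rank 2 with alpha 4 gives the same scale (2.0) as rank 16 with alpha 32.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]        # base weight, shape (2, 3)
lora_a = [[0.5, 0.0, 0.0], [0.0, 0.0, 0.25]]  # A, shape (r, 3)
lora_b = [[1.0, 0.0], [0.0, 1.0]]             # B, shape (2, r)
merged = merge_lora(w, lora_a, lora_b, alpha=4)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the published weights load without a PEFT wrapper.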

⚠️ Limitations & Risks

While this fine-tune aggressively pushes the boundaries of what a 2B parameter model can achieve locally, users should carefully account for the following behaviors before deployment:

  • 🔮 Hallucinations: Like all compact language models, this model can confidently present incorrect calculations or logically flawed code as fact. Always verify its output.
  • 🎭 Inconsistent Styles: Because the training data aggregates several distinct open-source distilled reasoning sets, the model may occasionally shift its output structure, style, or pacing from one prompt to the next.
  • 🛑 Logic Mismatches: For highly advanced mathematical proofs or niche programming languages, the model may occasionally produce broken syntax or contradict its own earlier assertions.
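
One lightweight way to act on the advice to verify outputs is to compare generated code against a simple, trusted reference on many inputs. The sketch below does this for a matrix-exponentiation Fibonacci function; `fib_matrix` is a hand-written stand-in for model output and `fib_reference` is a hypothetical reference implementation, neither taken from this model.

```python
def mat_mul(a, b):
    """Multiply two 2x2 integer matrices."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def fib_matrix(n):
    """Candidate under test: F(n) via fast exponentiation of [[1, 1], [1, 0]]."""
    result = [[1, 0], [0, 1]]  # identity matrix
    base = [[1, 1], [1, 0]]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]


def fib_reference(n):
    """Slow but obviously correct iterative reference."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Collect every input on which the candidate disagrees with the reference.
mismatches = [n for n in range(200) if fib_matrix(n) != fib_reference(n)]
print("mismatches:", mismatches)
```

An empty mismatch list does not prove correctness, but a non-empty one is a quick signal that the generated code should not be trusted.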

📦 How to Use Natively

🐍 Using Hugging Face Transformers

Because this is a standalone model with the weights baked in, you load it directly without any PEFT wrapper boilerplate:

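from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "YOUR_USERNAME/YOUR_REPO_NAME"

# Load the tokenizer and the merged model weights directly
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # Native F16 weight format
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python script to calculate the exact nth Fibonacci number using matrix exponentiation."}
]
# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,     # ASSUMPTION: illustrative limit, not specified in this card
    do_sample=True,          # required for temperature / top_k to take effect
    temperature=0.4,         # recommended value from the parameter table
    top_k=30,                # ASSUMPTION: the card lists top_p=30.0, but Transformers requires top_p in (0, 1], so 30 is read as top_k
    repetition_penalty=1.2,  # Transformers' name for this setting (llama.cpp calls it repeat_penalty)
)

# Decode only the newly generated tokens, skipping the prompt
new_tokens = generated_ids[0][model_inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))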

# Uploaded finetuned model

- **Developed by:** ertghiu256
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen3.5-2B

This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)