Instructions for using ertghiu256/Qwen3.5-2b-ReMix with libraries and local apps.

## Libraries

### Transformers

How to use ertghiu256/Qwen3.5-2b-ReMix with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ertghiu256/Qwen3.5-2b-ReMix")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ertghiu256/Qwen3.5-2b-ReMix")
model = AutoModelForImageTextToText.from_pretrained("ertghiu256/Qwen3.5-2b-ReMix")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
## Local Apps

### vLLM

How to use ertghiu256/Qwen3.5-2b-ReMix with vLLM:

Install from pip and serve model:

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ertghiu256/Qwen3.5-2b-ReMix"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ertghiu256/Qwen3.5-2b-ReMix",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```

Use Docker:

```bash
docker model run hf.co/ertghiu256/Qwen3.5-2b-ReMix
```
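The curl call above relies on the server's default sampling settings. The following is a minimal standard-library sketch of a text-only request that passes the card's recommended settings (see Recommended Generation Parameters) instead. It assumes vLLM accepts its extra sampling fields `top_k` and `repetition_penalty` in the chat-completions body. It also treats the recommended "Top P" value of 30 as a `top_k` value, because `top_p` must lie in (0, 1]. `build_chat_payload` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request

# Hypothetical helpers: field names follow the OpenAI chat-completions schema
# plus vLLM's extra sampling parameters (top_k, repetition_penalty).
MODEL_ID = "ertghiu256/Qwen3.5-2b-ReMix"


def build_chat_payload(prompt: str) -> dict:
    """Build a text-only chat request using the card's recommended sampling settings."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.4,         # recommended temperature
        "top_k": 30,                # assumption: the card's "30.0" is a top-k value
        "repetition_penalty": 1.2,  # value used in the card's Transformers example
        "max_tokens": 2048,         # example budget for reasoning plus answer
    }


def send_chat_request(payload: dict, url: str = "http://localhost:8000/v1/chat/completions") -> str:
    """POST the payload to a running server and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With `vllm serve` running, `print(send_chat_request(build_chat_payload("Prove that the square root of 2 is irrational.")))` sends one request and prints the reply.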
### SGLang

How to use ertghiu256/Qwen3.5-2b-ReMix with SGLang:

Install from pip and serve model:

```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ertghiu256/Qwen3.5-2b-ReMix" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ertghiu256/Qwen3.5-2b-ReMix",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```

Use Docker images:

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ertghiu256/Qwen3.5-2b-ReMix" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ertghiu256/Qwen3.5-2b-ReMix",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
                ]
            }
        ]
    }'
```

### Unsloth Studio
How to use ertghiu256/Qwen3.5-2b-ReMix with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL):

```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ertghiu256/Qwen3.5-2b-ReMix to start chatting
```

Install Unsloth Studio (Windows):

```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ertghiu256/Qwen3.5-2b-ReMix to start chatting
```

Using Hugging Face Spaces for Unsloth:

No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for ertghiu256/Qwen3.5-2b-ReMix to start chatting.

Load model with FastModel:

```bash
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="ertghiu256/Qwen3.5-2b-ReMix",
    max_seq_length=2048,
)
```

### Docker Model Runner
How to use ertghiu256/Qwen3.5-2b-ReMix with Docker Model Runner:
```bash
docker model run hf.co/ertghiu256/Qwen3.5-2b-ReMix
```
# 🚀 Qwen3.5-2B-ReMix (Reasoning Mix)
This repository contains a fully merged, native Float16 (F16) fine-tune of Qwen/Qwen3.5-2B 🤖. Its primary goal is to substantially improve performance on complex reasoning tasks, specifically advanced mathematics 🧮, logical deduction, and structured coding problems 💻.

Trained on open-source distillation data, it aims for high-tier, "frontier-style" reasoning while staying compact enough to run at full speed on everyday consumer hardware 🏠, with no external adapters to load.
## 🌟 Model Highlights
- 🏗️ Base Architecture: Qwen/Qwen3.5-2B (Dense, Hybrid Gated DeltaNet)
- 💾 Precision format: Native Float16 (F16) Merged Weights — No adapter required!
- 🎯 Main Goal: Advanced mathematical reasoning and complex code generation/debugging.
- 🛡️ Data Origin: 100% open-source distilled reasoning datasets publicly hosted on Hugging Face. No proprietary data or closed APIs (OpenAI, Anthropic, Google) were used in data collection or training.
- ⚡ Target Environment: Local, high-efficiency edge execution with minimal hardware requirements.
## 🎛️ Recommended Generation Parameters
For the most reliable reasoning, and to keep the model from drifting into unfocused creative output, override the default sampler settings with the following values during local inference:
| Parameter | Value | Note |
|---|---|---|
| 🌡️ Temperature (temp) | 0.4 | Keeps logical thoughts focused and mathematically stable. |
| 🎯 Top P (top_p) | 30.0 | Expands token exploration for rich code structures. `top_p` only accepts values in (0, 1], so 30 is most likely meant as a top-k (`top_k`) value. |
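The following minimal pure-Python sketch shows how temperature, top-k, and top-p (nucleus) filtering reshape next-token probabilities. It loosely follows the order Transformers applies them in: temperature first, then top-k, then top-p. It also shows why 30.0 cannot be a `top_p` value: top-p is a cumulative-probability cutoff, so it must lie in (0, 1]. `sampling_distribution` is a hypothetical helper, not a library API, and its default `top_k=30` assumes the table's 30 was meant as a top-k value.

```python
import math


def sampling_distribution(logits, temperature=0.4, top_k=30, top_p=1.0):
    """Turn raw logits into the distribution a sampler would draw from.

    Applies temperature scaling, keeps the top_k most likely tokens, then keeps the
    smallest prefix of those whose renormalized cumulative probability reaches top_p.
    Returns a dict mapping token index -> renormalized probability.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p is a cumulative probability and must be in (0, 1]")

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k highest-probability token indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: walk the survivors in order until their cumulative share reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalize the surviving probabilities so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

Lowering the temperature from 1.0 to 0.4 concentrates probability on the highest-scoring token, which is the "focused" behavior the table describes. A `top_p` of 30.0 is rejected outright.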
💡 Prompting Note: Because this model is built on the Qwen3.5 Small architecture, make sure your UI or inference wrapper is configured to separate the model's internal chain-of-thought (reasoning) output from its final answer and render each accordingly.
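If your wrapper does not separate the reasoning automatically, you can do it yourself on the decoded text. The following is a minimal sketch that assumes the model follows the Qwen convention of wrapping its chain of thought in `<think>...</think>` tags; check the repository's chat template to confirm. The decoded text must still contain the closing tag. `split_reasoning` is a hypothetical helper.

```python
def split_reasoning(generated_text: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split raw model output into (reasoning, answer).

    Assumes Qwen-style tags: reasoning is wrapped in <think>...</think>. Some chat
    templates put the opening tag in the prompt, so only the closing tag is required.
    If no closing tag is found, the whole text is treated as the answer.
    """
    head, sep, tail = generated_text.rpartition(close_tag)
    if not sep:
        return "", generated_text.strip()
    reasoning = head.strip()
    if reasoning.startswith(open_tag):
        reasoning = reasoning[len(open_tag):].strip()
    return reasoning, tail.strip()
```

Splitting on the last closing tag keeps any tag-like text inside the reasoning with the reasoning rather than the answer.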
## 📊 Training & Merge Details
The model was fine-tuned with LoRA, a Parameter-Efficient Fine-Tuning (PEFT) method. The trained adapter was then merged back into the base weights with Unsloth, producing a single, unified set of F16 weights.
- 🔄 Training Steps: 175
- 📉 Loss Profile: reached a floor of ~0.58 and stabilized around 0.85
- 📈 Learning Rate: 4e-5
- 📐 LoRA Rank ($R$) during training: 16
- ⚖️ LoRA Alpha ($\alpha$) during training: 32
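The merge step folds the low-rank update into each adapted weight matrix: $W' = W + \frac{\alpha}{R} BA$. The following minimal pure-Python sketch assumes standard LoRA scaling $\alpha/R$ rather than rsLoRA's $\alpha/\sqrt{R}$, so this card's $R = 16$ and $\alpha = 32$ give a scale of 2. The card does not describe the target modules or Unsloth's internal merge code, and `merge_lora` is a hypothetical helper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, lora_a, lora_b, alpha=32, rank=16):
    """Fold a LoRA update into a dense weight: W' = W + (alpha / rank) * B @ A.

    base: d_out x d_in, lora_a: rank x d_in, lora_b: d_out x rank.
    Assumes standard LoRA scaling (alpha / rank), not rsLoRA's alpha / sqrt(rank).
    """
    scale = alpha / rank  # 32 / 16 = 2.0 with this card's settings
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(base, delta)]
```

After merging, $W'x = Wx + \frac{\alpha}{R} B(Ax)$. The merged model therefore produces the same outputs as base-plus-adapter, but no longer needs the adapter at load time.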
## ⚠️ Limitations & Risks
While this fine-tune pushes what a 2B-parameter model can do locally, account for the following behaviors before deployment:
- 🔮 Hallucinations: Like all compact language models, the model can confidently present incorrect calculations or logically flawed code as fact. Always verify its output.
- 🎭 Inconsistent Styles: Because the underlying training data aggregates multiple distinct open-source distilled reasoning sets, the model may occasionally exhibit shifting output structures, stylistic variations, or unpredictable pacing across sequential prompts.
- 🛑 Logic Mismatches: For highly advanced mathematical proofs or very niche programming languages, the model may occasionally produce broken syntax or contradict its own earlier assertions.
## 📦 How to Use Natively

### 🐍 Using Hugging Face Transformers
Because the fine-tuned weights are already merged in, you can load the model directly, with no PEFT wrapper:
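```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "ertghiu256/Qwen3.5-2b-ReMix"

# Load the aligned tokenizer and model weights directly
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # Native F16 weight format
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python script to calculate the exact nth Fibonacci number using matrix exponentiation."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,     # Example budget for the reasoning trace plus the answer
    do_sample=True,          # Required for temperature/top_k to take effect
    temperature=0.4,         # Recommended value from the table above
    top_k=30,                # top_p only accepts (0, 1]; the listed 30 is assumed to be top_k
    repetition_penalty=1.2,  # Transformers' name for llama.cpp's repeat_penalty
)

# Decode only the newly generated tokens, skipping the prompt
output_ids = generated_ids[0][model_inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```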
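Because model output should always be verified, the following small, independent reference for the example prompt gives you something to check the model's script against. It is a minimal sketch of Fibonacci by matrix exponentiation; `fib_matrix` is a hypothetical name and is not part of the model or any library.

```python
def fib_matrix(n: int) -> int:
    """Return the exact nth Fibonacci number (F(0) = 0, F(1) = 1) via fast matrix powers.

    Uses [[1, 1], [1, 0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]] with exponentiation by
    squaring, so it needs O(log n) 2x2 multiplications on Python's exact integers.
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    def mul(x, y):
        # Product of two 2x2 matrices stored as nested tuples.
        return (
            (x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]),
            (x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]),
        )

    result = ((1, 0), (0, 1))  # identity matrix
    base = ((1, 1), (1, 0))
    while n:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return result[0][1]
```

Comparing the model's function against `fib_matrix` for a range of `n` (including 0, 1, and large values) catches off-by-one indexing and overflow bugs, two common failure modes in generated code.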
# Uploaded finetuned model
- **Developed by:** ertghiu256
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen3.5-2B

This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)