DecoR: Qwen3-4B Fine-tuned on DSE-Cleaned LIMO

This model is Qwen3-4B fine-tuned on the Dead Step Elimination (DSE)-cleaned version of the LIMO dataset.

DSE removes structurally dead reasoning steps: steps from which the conclusion is unreachable in the trace's dependency DAG. This yields cleaner training data and, in turn, more efficient reasoning.
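A minimal sketch of the reachability check described above (the step-id/edge representation here is hypothetical; the actual DecoR pipeline may represent traces differently):

```python
def dead_steps(deps, conclusion):
    """Return steps the conclusion does not (transitively) depend on.

    `deps` maps each step id to the ids of steps it directly uses.
    Any step outside the conclusion's backward-reachable set is dead.
    """
    live, stack = set(), [conclusion]
    while stack:
        step = stack.pop()
        if step not in live:
            live.add(step)
            stack.extend(deps.get(step, []))
    return set(deps) - live

# Step 3 is never used on any path to the conclusion (step 4):
deps = {1: [], 2: [1], 3: [1], 4: [2]}
print(dead_steps(deps, conclusion=4))  # {3}
```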

Key Results

| Benchmark    | Base Qwen3-4B | Original LIMO SFT | DSE LIMO SFT |
|--------------|---------------|-------------------|--------------|
| MATH-500     | 56.6%         | 69.6%             | 72.8%        |
| AIME 2025    | 13.3%         | 40.0%             | 43.3%        |
| AIME 2026    | 36.7%         | 46.7%             | 53.3%        |
| GPQA Diamond | 43.4%         | 55.6%             | 49.0%        |
  • +3.2 points on MATH-500 over the original LIMO SFT
  • 28% shorter training time (6h26m vs 8h56m)
  • 11.5% lower final training loss (0.240 vs 0.271)
  • 18.5% shorter output traces (more efficient reasoning)

Training Details

  • Base model: Qwen/Qwen3-4B
  • Training data: LIMO-DSE (817 samples, DSE-cleaned)
  • Framework: LLaMA-Factory + DeepSpeed ZeRO-2
  • Hardware: 2× NVIDIA A100 80GB (CSD3 HPC)
  • Hyperparameters:
    • Learning rate: 5e-6 (cosine schedule, 10% warmup)
    • Batch size: 8 (effective)
    • Epochs: 15
    • Max sequence length: 16384
    • Full fine-tuning (no LoRA)

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ciaranshu/decor-qwen3-4b-dse")
tokenizer = AutoTokenizer.from_pretrained("Ciaranshu/decor-qwen3-4b-dse")

# Generate a response with the chat template
prompt = tokenizer.apply_chat_template([{"role": "user", "content": "Solve 2x + 3 = 11."}], add_generation_prompt=True, return_tensors="pt")
output = model.generate(prompt, max_new_tokens=1024)
print(tokenizer.decode(output[0][prompt.shape[-1]:], skip_special_tokens=True))
```

Citation

```bibtex
@article{decor2026,
  title={The Structural Reward Gap: How Outcome-Based RL Creates Superstitious Reasoning},
  author={Shu, Chang},
  year={2026}
}
```

Framework versions

  • Transformers 4.52.4
  • PyTorch 2.5.1+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.1