LIMO: Less is More for Reasoning
Paper: arXiv:2502.03387
This model is Qwen3-4B fine-tuned on the DSE-cleaned (Dead Step Elimination) version of the LIMO dataset.
DSE removes structurally dead reasoning steps: steps that the final conclusion does not depend on, identified by reachability analysis over the step dependency DAG. Pruning these steps yields cleaner training data and more efficient reasoning.
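The pruning idea can be sketched as a reverse reachability pass over the dependency DAG: starting from the conclusion, walk dependency edges backwards and keep only the steps visited. This is a minimal illustrative sketch, not the paper's actual implementation; the function name and data representation are hypothetical.

```python
def eliminate_dead_steps(steps, deps, conclusion):
    """Hypothetical sketch of Dead Step Elimination (DSE).

    steps:      list of step ids in their original order
    deps:       dict mapping a step id to the ids of steps it uses
    conclusion: id of the final answer step

    Returns the steps the conclusion transitively depends on
    (its ancestors in the dependency DAG), in original order.
    """
    live = set()
    stack = [conclusion]
    while stack:
        step = stack.pop()
        if step in live:
            continue
        live.add(step)
        # Follow dependency edges backwards toward earlier steps.
        stack.extend(deps.get(step, []))
    return [s for s in steps if s in live]
```

For example, with steps `[1, 2, 3, 4]`, dependencies `{4: [2], 2: [1]}`, and conclusion `4`, step 3 is structurally dead (nothing on the path to the conclusion uses it), so the function returns `[1, 2, 4]`.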
| Benchmark | Base Qwen3-4B | Original LIMO SFT | DSE LIMO SFT |
|---|---|---|---|
| MATH-500 | 56.6% | 69.6% | 72.8% |
| AIME 2025 | 13.3% | 40.0% | 43.3% |
| AIME 2026 | 36.7% | 46.7% | 53.3% |
| GPQA Diamond | 43.4% | 55.6% | 49.0% |
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Ciaranshu/decor-qwen3-4b-dse")
tokenizer = AutoTokenizer.from_pretrained("Ciaranshu/decor-qwen3-4b-dse")
```
If you use this model, please cite:

```bibtex
@article{decor2026,
  title={The Structural Reward Gap: How Outcome-Based RL Creates Superstitious Reasoning},
  author={Shu, Chang},
  year={2026}
}
```