Jackrong committed on
Commit 99c8ff6 · verified · Parent: f815da9

Update README.md

Files changed (1):
  1. README.md (+14, −11)
README.md CHANGED
@@ -24,31 +24,34 @@ new_version: Jackrong/Qwen3.5-4B-Neo
 
 # 🌟 Qwen3.5-4B-Neo
 
- ## 💡 Model Introduction
- **Jackrong/Qwen3.5-4B-Neo** is a reasoning-focused fine-tune of the Qwen3.5-4B model. Its primary objective is to drastically improve the *efficiency* of chain-of-thought generation, unlocking substantial gains in reasoning speed and token-cost reduction, while also increasing absolute accuracy.
 
- The goal of this Neo model is not simply to make the model "think more," but to help it **think more economically**: compressing unnecessarily long internal chains, avoiding verbose over-analysis, and greatly improving the reasoning-cost-to-quality ratio. Based on MMLU-Pro benchmark testing, the model achieves a significant **+11.43% pass@1 improvement** over the baseline, with the most pronounced gains in domains that demand rigorous multi-step quantitative reasoning, such as Physics, Mathematics, and Computer Science. All of this is accomplished alongside a **+123.8% improvement in reasoning efficiency**, cutting the median think-chain length by 57.6%.
-
- ### MMLU-Pro Benchmark Analysis 🪐
 
- ![Screenshot 2026-03-22 at 5.34.27 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/hzzA7YU4WsIp2706rpxmR.png)
 
- ![Screenshot 2026-03-22 at 5.31.32 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/R7OWCiUxYHGP-oXvEU5u7.png)
 
- ![Screenshot 2026-03-22 at 5.32.03 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/kjwJaG-qOfFvmd_AdXzmD.png)
 
- ![Screenshot 2026-03-22 at 5.32.21 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/8vxz20C5dp87I3aw38K5l.png)
 
- ![Screenshot 2026-03-22 at 5.32.38 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/8JOvpRWEaHh2GUSWkV3US.png)
 
- ![Screenshot 2026-03-22 at 5.32.50 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/62bxgjv0CaWo4AJBL5w4L.png)
 
 ## 🗺️ Training Pipeline Overview
 
 
 # 🌟 Qwen3.5-4B-Neo
 
+ ## Model Introduction
+ **Qwen3.5-4B-Neo** is a reasoning-focused fine-tune of Qwen3.5-4B. It is designed to make the model’s reasoning process more concise and efficient, while keeping overall accuracy competitive.
 
+ > On a 250-question MMLU-Pro subset covering five categories, Qwen3.5-4B-Neo achieved **82.00% pass@1 (205/250)**, compared with **80.40% (201/250)** for the base Qwen3.5-4B. The gain is modest, but Neo also shows a much shorter reasoning process overall.
 
+ > On non-truncated outputs, the average think-chain length was reduced from **6,962 to 3,955 characters**, and the median length dropped from **4,600 to 1,951 characters**. In efficiency terms, this corresponds to **2.31 correct solutions per 10k think characters**, compared with 1.03 for the base model.
 
+ Across the five categories, Neo performed better in biology, computer science, mathematics, and other sciences, while trailing in physics. Overall, the results suggest that Qwen3.5-4B-Neo offers slightly better accuracy than the base model, with a substantially more efficient reasoning style.
 
+ > ⚠️ Note: The evaluation results shown here are based on a sampled subset of MMLU-Pro rather than the full benchmark. While the subset was kept balanced across five categories, the reported numbers are intended mainly for relative comparison under this specific setting and may not fully represent the model’s performance on the complete benchmark.
 
+ ### MMLU-Pro Benchmark Analysis 🪐
 
+ ![Screenshot 2026-03-23 at 6.06.07 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/TlcPXA7tRb2cC_qC2_so_.png)
 
+ ![Screenshot 2026-03-23 at 6.07.39 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/KG76kNeIJ88F4iYhOxcHI.png)
 
+ ![Screenshot 2026-03-23 at 6.07.59 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/JeN8ulnILxwqXDb6TgW6w.png)
 
+ ![Screenshot 2026-03-23 at 6.08.18 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/6acL4CyZzLuNX0cOB7jil.png)
 
+ ![Screenshot 2026-03-23 at 6.08.38 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/mHEjeqz8jdm76b1gFfosV.png)
 
+ ![Screenshot 2026-03-23 at 6.08.53 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/rUwyqwr24ZCRKeFqDcjUq.png)
 
+ ![Screenshot 2026-03-23 at 6.09.10 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/yDmQ-FncNkxlHu0D0XbCa.png)
 
 ## 🗺️ Training Pipeline Overview
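
The updated README quotes two metrics: pass@1 and "correct solutions per 10k think characters". A minimal sketch of how such metrics could be computed from per-question evaluation records — the function names and the toy data below are illustrative assumptions, not taken from the commit's evaluation harness:

```python
# Hedged sketch (not from the commit): computing pass@1 and a
# "correct solutions per 10k think characters" efficiency metric.
# Each record is (is_correct, think_chain_length_in_characters).

def pass_at_1(records):
    """Fraction of questions answered correctly on the first attempt."""
    return sum(1 for correct, _ in records if correct) / len(records)

def efficiency_per_10k(records):
    """Correct solutions per 10,000 characters of think-chain text."""
    total_chars = sum(length for _, length in records)
    correct = sum(1 for correct, _ in records if correct)
    return correct / (total_chars / 10_000)

# Toy illustration with made-up numbers (4 questions, 10,000 chars total):
records = [(True, 2000), (True, 1500), (False, 5000), (True, 1500)]
print(f"pass@1 = {pass_at_1(records):.2%}")  # 3/4 -> 75.00%
print(f"efficiency = {efficiency_per_10k(records):.2f} correct per 10k chars")  # 3.00
```

Under this definition, a model that keeps accuracy roughly constant while halving its think-chain length roughly doubles its efficiency score, which matches the direction of the 1.03 → 2.31 change reported above.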