---
quantized_by: moxin-org
base_model:
- deepseek-ai/DeepSeek-R1-0528
base_model_relation: quantized
license: mit
tags:
- deepseek_r1
- deepseek
- transformers
- GGUF
pipeline_tag: text-generation
---
## llama.cpp Mixed Precision Quant of DeepSeek-R1-0528
All quants made based on [moxin-org/CC-MoE](https://github.com/moxin-org/CC-MoE).
We hold higher expectations for the reasoning modelsโ performance; therefore, we have currently opted not to compress them into smaller sizes as we did for the V3 versions.
```
- Q2_K_L : 220.55 GiB (2.82 BPW)
- IQ2_XXS : 186.23 GiB (2.38 BPW)
```
## Benchmark Comparison
| Benchmark (Metric) | Qwen3
llamacpp Q8 (233G) | R1
Ours (220G) |
|:------------------:|:---------------------------:|:-----------------:|
| **Activated Params:**
**Total Params:** | 22B
235B | 37B
671B |
| **aime24** | 86.67 | **86.67** |
| **gpqa_diamond_cot_n_shot** | 68.18 | **74.24** |
| **gsm8k** | 84.99 | **96.51** |
> **Note:** Both models use MoE architecture.
> **Bold** values mark the best performance per benchmark.
### Download
Download available for huggingface_hub, huggingface-cli, snapshot_download, xet
๐ Download Guide
```bash
# !pip install huggingface_hub hf_transfer
import os
# os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
repo_id = "moxin-org/DeepSeek-R1-0528-Moxin-GGUF",
local_dir = "DeepSeek-R1-0528-Moxin-GGUF",
allow_patterns = ["*Q2_K_L*"], # IQ2_XXS
)
```
### Usage
Example of runing gguf with local build of llama.cpp. (llama-cli/llama-server)
๐ Build llama.cpp locally
```
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
cmake --build build --config Release -j --clean-first
```
```
build/bin/llama-cli -m DeepSeek-R1-0528-Moxin-GGUF/R1-Q2_K_L/DeepSeek-R1-0528-Moxin-Q2_K_L-00001-of-00007.gguf \
-ngl 99 \
--temp 0.6 \
--top-p 0.95 \
--min-p 0.01 \
--ctx-size 16384
```
---
### Citation
If this work is helpful, please kindly cite as:
```bibtex
@article{chen2025collaborative,
title={Collaborative Compression for Large-Scale MoE Deployment on Edge},
author={Chen, Yixiao and Xie, Yanyue and Yang, Ruining and Jiang, Wei and Wang, Wei and He, Yong and Chen, Yue and Zhao, Pu and Wang, Yanzhi},
journal={arXiv preprint arXiv:2509.25689},
year={2025}
}
```
## Acknowledgements
This repository builds upon the outstanding work of the following open-source authors and projects:
- [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1).
- [ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp), [unsloth.ai](https://unsloth.ai/), [bartowski](https://github.com/bartowski1182).
- [ikawrakow/ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp), [ikawrakow](https://github.com/ikawrakow), [ubergarm](https://github.com/ubergarm).
- [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
We sincerely thank them for their excellent contributions to the open-source community.