AI & ML interests
Model Compression
The collection of quantization models of Qwen3-VL
SpecExit: Accelerating Large Reasoning Model via Speculative Exit
Paper • 2509.24248 • Published • 2
Tequila: Trapping-free Ternary Quantization for Large Language Models
Paper • 2509.23809 • Published • 3
Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification
Paper • 2601.07892 • Published • 4
The collection of quantization models of Qwen2 and Qwen2.5
The collection of Eagle3 series models for Qwen3 and Hunyuan
The collection of quantization models of Qwen3
The collection of quantization models of DeepSeek and DeepSeek-R1-Distill