Qwen3-TTS


  🤗 Hugging Face   |   🤖 ModelScope   |   📑 Blog   |   📑 Paper   |   💻 GitHub

We release Qwen3-TTS, a series of powerful speech generation models developed by Qwen, offering comprehensive support for voice cloning, voice design, ultra-high-quality human-like speech generation, and natural language-based voice control.

Overview

Qwen3-TTS covers 10 major languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian) as well as multiple dialectal voice profiles. Key features:

  • Powerful Speech Representation: Powered by the self-developed Qwen3-TTS-Tokenizer-12Hz, it achieves efficient acoustic compression and high-dimensional semantic modeling.
  • Universal End-to-End Architecture: Uses a discrete multi-codebook LM architecture to bypass traditional information bottlenecks.
  • Extremely Low-Latency Streaming Generation: Supports streaming generation with end-to-end synthesis latency as low as 97 ms.
  • Intelligent Voice Control: Supports speech generation driven by natural language instructions for flexible control over timbre, emotion, and prosody.
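
To give a feel for what a low-frame-rate, multi-codebook tokenizer implies for sequence length, here is a minimal arithmetic sketch. It assumes that the "12Hz" in the tokenizer name denotes a nominal rate of 12 frames per second, which this card does not state explicitly. The function names and the `num_codebooks` parameter are hypothetical and not part of the qwen-tts package; the actual frame rate and codebook count should be taken from the technical report.

```python
import math


def frame_duration_ms(frame_rate_hz: float = 12.0) -> float:
    """Milliseconds of audio covered by one tokenizer frame (assumed nominal rate)."""
    return 1000.0 / frame_rate_hz


def frame_count(duration_s: float, frame_rate_hz: float = 12.0) -> int:
    """Number of frames needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s * frame_rate_hz)


def total_tokens(duration_s: float, frame_rate_hz: float = 12.0, num_codebooks: int = 1) -> int:
    """Discrete tokens for a clip when every frame carries one token per codebook.

    `num_codebooks` is a placeholder; the real value is not given in this card.
    """
    return frame_count(duration_s, frame_rate_hz) * num_codebooks


if __name__ == "__main__":
    print(frame_duration_ms())        # audio span of one frame, in ms
    print(frame_count(10.0))          # frames for a 10-second clip
    print(total_tokens(10.0, 12.0, 4))  # tokens with a hypothetical 4 codebooks
```

Under these assumptions, a lower frame rate shortens the sequence the language model has to generate, while additional codebooks add tokens per frame.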

Quickstart

Environment Setup

Install the qwen-tts Python package from PyPI:

pip install -U qwen-tts

Python Package Usage

import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

# Load the model
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

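# Custom Voice Generation
wavs, sr = model.generate_custom_voice(
    # English: "Actually, I've really noticed that I'm someone who is especially good at reading other people's emotions."
    text="其实我真的有发现，我是一个特别善于观察别人情绪的人。",
    language="Chinese",
    speaker="Vivian",
    # English: "Say it in a very angry tone."
    instruct="用特别愤怒的语气说",
)
# Save the first generated waveform at the returned sample rate
sf.write("output.wav", wavs[0], sr)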
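
The loading call above hard-codes a CUDA device, bfloat16, and FlashAttention 2, which fails on machines without a GPU or without the flash-attn package. The sketch below shows one way to pick fallback options. `choose_load_options` and `module_available` are hypothetical helpers, not part of qwen-tts. It is an assumption that `Qwen3TTSModel.from_pretrained` accepts `attn_implementation="sdpa"` and a CPU `device_map` the way Hugging Face Transformers models generally do; neither is confirmed by this card.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def choose_load_options(cuda_available: bool, flash_attn_available: bool) -> dict:
    """Pick loading options, falling back when CUDA or flash-attn is missing.

    The dtype is returned by name so this helper does not need torch itself.
    """
    if cuda_available:
        return {
            "device_map": "cuda:0",
            "dtype_name": "bfloat16",
            # Assumption: "sdpa" is accepted as a fallback attention backend.
            "attn_implementation": "flash_attention_2" if flash_attn_available else "sdpa",
        }
    # Assumption: the model can run on CPU in float32 (likely slow).
    return {"device_map": "cpu", "dtype_name": "float32", "attn_implementation": "sdpa"}


if __name__ == "__main__":
    opts = choose_load_options(
        cuda_available=False,
        flash_attn_available=module_available("flash_attn"),
    )
    print(opts)
```

With PyTorch installed, `cuda_available` could come from `torch.cuda.is_available()`, and the result could be passed on as `device_map=opts["device_map"]`, `dtype=getattr(torch, opts["dtype_name"])`, and `attn_implementation=opts["attn_implementation"]`.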

Evaluation

Zero-shot speech generation on the Seed-TTS test set, measured by Word Error Rate (WER, ↓; lower is better):

Model                       test-zh   test-en
Qwen3-TTS-12Hz-1.7B-Base    0.77      1.24
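
For readers unfamiliar with the metric, WER is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a minimum edit-distance alignment between a reference transcript and a recognized transcript. The sketch below is a generic, whitespace-tokenized illustration of that formula. It is not the Seed-TTS evaluation script, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words; we roll the table one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Whitespace splitting does not suit unsegmented Chinese text, where splitting into characters is the usual choice; the tokenization would need to be adapted accordingly.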

Citation

If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing our work 📝:

@article{Qwen3-TTS,
  title={Qwen3-TTS Technical Report},
  author={Hangrui Hu and Xinfa Zhu and Ting He and Dake Guo and Bin Zhang and Xiong Wang and Zhifang Guo and Ziyue Jiang and Hongkun Hao and Zishan Guo and Xinyu Zhang and Pei Zhang and Baosong Yang and Jin Xu and Jingren Zhou and Junyang Lin},
  journal={arXiv preprint arXiv:2601.15621},
  year={2026}
}