Qwen3-ASR-1.7B GGUF (CrispASR)

GGUF conversions of Qwen/Qwen3-ASR-1.7B, the larger of the two Qwen3-ASR speech LLMs (a Whisper-style audio encoder feeding a 1.7B Qwen3 LLM head). These files run through CrispASR's --backend qwen3 path with no architecture changes: the same C++ qwen3 backend that already supported the 0.6B variant loads the 1.7B directly, because both the converter and the runtime read model sizes from GGUF metadata rather than hardcoding them.
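The sizes-from-metadata point is visible at the container level. Below is a minimal sketch of the fixed GGUF header (magic, version, tensor count, metadata KV count) following the public GGUF spec; the synthetic KV count is illustrative, and this is not CrispASR's actual loader code:

```python
import struct

def read_gguf_header(buf: bytes) -> dict:
    # GGUF header: 4-byte magic, u32 version, u64 tensor count, u64 metadata KV count.
    # Everything model-specific (layer counts, hidden sizes, vocab size) lives in
    # the metadata KV section that follows, which is why one loader can handle
    # both the 0.6B and 1.7B checkpoints without hardcoded dimensions.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}

# Synthetic header for illustration (708 tensors, as in these files;
# the KV count of 40 is made up).
hdr = struct.pack("<4sIQQ", b"GGUF", 3, 708, 40)
info = read_gguf_header(hdr)
```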

What's in the box

File                      Size     Quantization  Notes
qwen3-asr-1.7b-f16.gguf   4.71 GB  F16           Reference precision; matches PyTorch bfloat16 within float-noise tolerance
qwen3-asr-1.7b-q8_0.gguf  2.51 GB  Q8_0          Effectively lossless; same transcript on jfk.wav as F16
qwen3-asr-1.7b-q4_k.gguf  1.33 GB  Q4_K          3.5× smaller than F16; minor punctuation differences on tested clips

All three contain:

  • The full audio encoder (24 layers, d_model 1024, 16 heads, 4096 FFN)
  • The Qwen3 1.7B LLM (28 layers, d_model 2048, 16 heads / 8 KV heads, 6144 FFN, 152K vocab, RoPE θ = 1e6)
  • The full GPT-2-style BPE vocab and merges, the audio mel filterbank, and the Hann window: everything the C++ side needs to run inference end-to-end without re-fetching the original safetensors.
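For reference, the baked-in analysis window is easy to reproduce. The sketch below assumes the usual Whisper front-end parameters (400-sample periodic Hann window, 16 kHz audio, HTK mel scale); these are assumptions about the pipeline family, not values read out of these files:

```python
import numpy as np

def hann_window(n: int) -> np.ndarray:
    # Periodic Hann window, the shape Whisper-style STFT front-ends store.
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(n) / n))

def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale, the usual basis for building a mel filterbank.
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

w = hann_window(400)  # 400 samples = 25 ms at 16 kHz (assumed)
```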

Use with CrispASR (no Python at runtime)

# Build crispasr (one-time). No Python is needed at runtime; only the
# converter is Python.
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target whisper-cli

# Auto-download the recommended quant on first use:
./build/bin/crispasr --backend qwen3 -m auto -f my_audio.wav

# Or point at a local file:
./build/bin/crispasr --backend qwen3 \
    -m qwen3-asr-1.7b-q4_k.gguf \
    -f my_audio.wav

# Auto language detection (whisper-tiny LID pre-step):
./build/bin/crispasr --backend qwen3 \
    -m qwen3-asr-1.7b-q4_k.gguf \
    -f my_audio.wav -l auto

# Speech translation to German via the system-prompt instruction path:
./build/bin/crispasr --backend qwen3 \
    -m qwen3-asr-1.7b-q4_k.gguf \
    -f my_audio.wav --translate -tl de

# Word-level timestamps via the canary-ctc-aligner second pass:
./build/bin/crispasr --backend qwen3 \
    -m qwen3-asr-1.7b-q4_k.gguf \
    -f my_audio.wav -am canary-ctc-aligner-q5_0.gguf -osrt -ml 1

Pure C++ runtime: no Python, no PyTorch, no ONNX Runtime. The only build dependency beyond a C++17 compiler and CMake is libcurl (or wget) for the auto-download path.

Languages

Same as upstream Qwen3-ASR: 30 languages plus 22 Chinese dialects, including English, Chinese, Cantonese, German, French, Spanish, Japanese, Korean, Russian, Arabic, Hindi, and most major European languages. See the base model card for the full matrix and accuracy figures.

How it was made

# 1. Download the base model from HF
hf download Qwen/Qwen3-ASR-1.7B --local-dir ./Qwen3-ASR-1.7B

# 2. Convert to F16 GGUF
python models/convert-qwen3-asr-to-gguf.py \
    --input ./Qwen3-ASR-1.7B \
    --output qwen3-asr-1.7b-f16.gguf

# 3. Quantize
./build/bin/crispasr-quantize qwen3-asr-1.7b-f16.gguf qwen3-asr-1.7b-q8_0.gguf q8_0
./build/bin/crispasr-quantize qwen3-asr-1.7b-f16.gguf qwen3-asr-1.7b-q4_k.gguf q4_k
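The Q8_0 scheme itself is simple enough to sketch: 32-value blocks, each stored as one scale plus 32 signed bytes. The NumPy round-trip below mirrors the ggml Q8_0 layout in spirit (the actual quantizer is the C++ tool above, and storage details such as f16 scales are omitted):

```python
import numpy as np

def quantize_q8_0(x: np.ndarray):
    # Q8_0: split into blocks of 32 values, one scale per block, int8 weights.
    blocks = x.reshape(-1, 32)
    scales = np.abs(blocks).max(axis=1) / 127.0
    safe = np.where(scales == 0.0, 1.0, scales)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / safe[:, None]), -127, 127).astype(np.int8)
    return scales, q

def dequantize_q8_0(scales: np.ndarray, q: np.ndarray) -> np.ndarray:
    return (scales[:, None] * q.astype(np.float32)).reshape(-1)

x = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
err = np.abs(dequantize_q8_0(*quantize_q8_0(x)) - x).max()
```

The per-block rounding error is bounded by half a scale step, which is why Q8_0 is effectively lossless in the transcript check below the table.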

The converter is shared with the 0.6B variant: it reads all sizes (audio encoder layers, LLM hidden size, KV head count, vocab size, head dim) from config.json rather than hardcoding them, so the same script handles both checkpoints. Each file holds 708 tensors (348 F16, 360 F32 in the F16 reference).
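The read-from-config approach can be sketched as follows. The key names below are the common Hugging Face config fields and the values are illustrative; this is not a copy of the real conversion script:

```python
import json
import os
import tempfile

# Illustrative config fragment; the real Qwen3-ASR config.json has more fields.
cfg_text = json.dumps({
    "num_hidden_layers": 28,
    "hidden_size": 2048,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "vocab_size": 151936,
})

def llm_dims(path: str) -> dict:
    # Read every size from config.json instead of hardcoding it, so one
    # script can handle both the 0.6B and 1.7B checkpoints.
    with open(path) as f:
        cfg = json.load(f)
    return {
        "layers": cfg["num_hidden_layers"],
        "d_model": cfg["hidden_size"],
        "heads": cfg["num_attention_heads"],
        "kv_heads": cfg["num_key_value_heads"],
        "vocab": cfg["vocab_size"],
    }

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "config.json")
    with open(p, "w") as f:
        f.write(cfg_text)
    dims = llm_dims(p)
```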

Regression check

F16 and Q8_0 produce identical transcripts on samples/jfk.wav under CrispASR's qwen3 backend; Q4_K differs only in punctuation:

qwen3-asr-1.7b-f16  → "And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country."
qwen3-asr-1.7b-q8_0 → "And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country."
qwen3-asr-1.7b-q4_k → "And so, my fellow Americans, ask not what your country can do for you; ask what you can do for your country."
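A quick normalized comparison confirms the Q4_K divergence is punctuation-only. This is a throwaway check, not part of CrispASR:

```python
import re

def normalize(t: str) -> list:
    # Lowercase and keep only word characters so punctuation is ignored.
    return re.findall(r"[a-z']+", t.lower())

f16 = ("And so, my fellow Americans, ask not what your country can do for you. "
       "Ask what you can do for your country.")
q4k = ("And so, my fellow Americans, ask not what your country can do for you; "
       "ask what you can do for your country.")

same_words = normalize(f16) == normalize(q4k)  # True: only punctuation differs
```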

License

Apache-2.0, same as the upstream Qwen3-ASR-1.7B model.

Citation

@misc{qwen3asr,
    title  = {Qwen3-ASR},
    author = {Qwen Team},
    year   = {2026},
    url    = {https://huggingface.co/Qwen/Qwen3-ASR-1.7B}
}