ONNX Conversion

#10

by shuttie - opened Jun 10, 2025

base: refs/heads/main ← from: refs/pr/10


shuttie, Jun 10, 2025 (edited Jun 10, 2025)

A copy of https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/discussions/18 but for the 4B model.

This is an ONNX conversion of the model, produced with Sentence Transformers (SBERT).

Code used:

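from sentence_transformers import (
    SentenceTransformer,
    export_dynamic_quantized_onnx_model,
)

# Loading with backend="onnx" converts the model to ONNX via Optimum.
model = SentenceTransformer("Qwen/Qwen3-Embedding-4B", backend="onnx")
# Save the raw (unquantized) ONNX model to the local "export" directory.
model.save_pretrained("export")
# Export one dynamically quantized variant per target instruction set.
for quantization_config in ["arm64", "avx2", "avx512", "avx512_vnni"]:
    export_dynamic_quantized_onnx_model(model, quantization_config, "export")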
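To try one of the exported variants, something along these lines should work. This is a minimal sketch, not part of the PR: the helper names `quantized_file_name` and `load_quantized` are hypothetical, and the file names assume Sentence Transformers' default suffix convention (`model_qint8_<config>.onnx`, with `quint8` for `avx2`), so check the actual contents of the `onnx/` directory before relying on it.

```python
# Assumed mapping from quantization config to the weight dtype used in the
# default file suffix. Verify against the files actually present in onnx/.
QUANT_WEIGHT_DTYPES = {
    "arm64": "qint8",
    "avx2": "quint8",
    "avx512": "qint8",
    "avx512_vnni": "qint8",
}


def quantized_file_name(config: str) -> str:
    """Return the assumed relative path of the quantized ONNX file for a config."""
    if config not in QUANT_WEIGHT_DTYPES:
        raise ValueError(f"unknown quantization config: {config!r}")
    return f"onnx/model_{QUANT_WEIGHT_DTYPES[config]}_{config}.onnx"


def load_quantized(export_dir: str = "export", config: str = "avx512_vnni"):
    """Load one quantized variant from a local export directory (hypothetical helper)."""
    # Imported lazily so the file-name helper stays usable without the library.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(
        export_dir,
        backend="onnx",
        model_kwargs={"file_name": quantized_file_name(config)},
    )
```

Calling `load_quantized("export", "avx2").encode(["some text"])` would then return the embeddings from the avx2 variant, provided the export above completed and the file names match.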

Note that the latest stable Optimum release as of today (1.25.3) does not yet support ONNX conversion of Qwen3-based models, but support is available on the development branch on GitHub. So you need the following requirements.txt:

sentence-transformers
optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
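Before running the export, it can help to confirm that the environment really picked up the newer Optimum build. The sketch below is an illustration only: the function names are hypothetical, and it assumes that any build numbered above 1.25.3 (for example a development build installed from GitHub) includes the Qwen3 export support, which the version number alone does not guarantee.

```python
import re
from importlib import metadata

# Last stable release reported above as lacking Qwen3 ONNX export support.
LAST_UNSUPPORTED = (1, 25, 3)


def release_tuple(version: str) -> tuple:
    """Parse the leading 'X.Y.Z' of a version string, ignoring suffixes like '.dev0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_newer_than_last_unsupported(version: str) -> bool:
    """True if the version is numbered above 1.25.3 (an assumption, not a guarantee)."""
    return release_tuple(version) > LAST_UNSUPPORTED


def installed_optimum_version():
    """Return the installed Optimum version string, or None if it is not installed."""
    try:
        return metadata.version("optimum")
    except metadata.PackageNotFoundError:
        return None
```

For example, `is_newer_than_last_unsupported(installed_optimum_version())` can be checked at the top of the export script, after handling the case where `installed_optimum_version()` returns `None`.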

Also, a side note: exporting an optimized model does not work, because Optimum lacks ONNX optimization support for Qwen3.

add model in ONNX format (both raw and qint8/quint8 quantized) 1e989783

shuttie changed pull request status to open Jun 10, 2025


Ready to merge

This branch is ready to get merged automatically.
