Does SGLang support int8 quantization for embedding models (e.g. Qwen3-Embedding)? #9580
IshiKura-a started this conversation in General

Hi, all!
I want Qwen3-Embedding to output int8 embeddings. I found that sentence_transformers supports this feature and wonder whether SGLang could too.
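For context, here is roughly what the sentence_transformers feature looks like; a minimal sketch, where the `Qwen/Qwen3-Embedding-0.6B` checkpoint and the example sentences are placeholder assumptions:

```python
from sentence_transformers import SentenceTransformer

# Model name and sentences are illustrative assumptions.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
sentences = ["What is the capital of China?", "Explain gravity."]

# Default float32 embeddings.
emb_f32 = model.encode(sentences)

# int8 embeddings: sentence_transformers scales each dimension into the int8
# range, calibrating from the batch unless explicit calibration embeddings
# are supplied via sentence_transformers.quantization.quantize_embeddings().
emb_i8 = model.encode(sentences, precision="int8")

print(emb_f32.dtype, emb_i8.dtype)  # float32 int8
```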
Replies: 1 comment

Good question. The thing is that int8 on embeddings is not just about storage size: it changes the geometry of the vector space itself. Many retrieval bugs show up because cosine similarity on quantized vectors doesn't behave the same way it does in the original float space; see the sketch below. If you're seeing instability or strange rankings after switching to int8, that's a known failure mode in practice (embedding mismatch). The fix isn't just toggling support; it's knowing when you can quantize safely and when quantization breaks your retrieval logic. I've mapped out these reproducible failures. If you want the details, let me know and I'll share the full reference.