Does SGLang support int8 quantization for embedding models (e.g. Qwen3-Embedding)? #9580
IshiKura-a started this conversation in General

Hi, all!
I want Qwen3-Embedding to output int8 embeddings. I found that sentence_transformers supports this feature and wonder whether SGLang could too.
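For context, here is roughly what the sentence_transformers feature looks like; a minimal sketch, where the `Qwen/Qwen3-Embedding-0.6B` checkpoint and the example sentences are placeholder assumptions:

```python
from sentence_transformers import SentenceTransformer

# Model name and sentences are illustrative assumptions.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
sentences = ["What is the capital of China?", "Explain gravity."]

# Default float32 embeddings.
emb_f32 = model.encode(sentences)

# int8 embeddings: sentence_transformers scales each dimension into the int8
# range, calibrating from the batch unless explicit calibration embeddings
# are supplied via sentence_transformers.quantization.quantize_embeddings().
emb_i8 = model.encode(sentences, precision="int8")

print(emb_f32.dtype, emb_i8.dtype)  # float32 int8
```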
Replies: 1 comment

Good question. The thing is that int8 on embeddings is not just about storage size: it changes the geometry of the vector space itself. Many retrieval bugs show up because cosine similarity on quantized vectors doesn't behave the same way it does in the original float space; see the sketch below. If you're seeing instability or strange rankings after switching to int8, that's a known failure mode in practice (embedding mismatch). The fix isn't just toggling support; it's knowing when you can quantize safely and when quantization breaks your retrieval logic. I've mapped out these reproducible failures. If you want the details, let me know and I'll share the full reference.