Does sglang support awq quantization for MoE models (such as deepseek-r1-awq) on CPU? #5323

spaceater · 2025-04-12T10:39:43Z

I tried to run deepseek-r1-awq on CPU with vLLM, but it failed. vLLM converts awq to awq_marlin when the model is a MoE model, and the Marlin kernel is not supported on CPU. I also can't run the model directly with plain awq quantization, because awq.py doesn't handle the fused MoE case, which results in unexpected errors.
I want to ask whether sglang supports awq quantization for MoE models on CPU. I browsed /python/sglang/srt/layers/quantization/awq.py and noticed that it doesn't handle the fused MoE case either. Does this mean awq quantization for MoE models is not supported? If so, I suggest that support be added :)
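
To make the failure mode concrete, here is a rough sketch of the selection logic described above. Every name in it (`LayerInfo`, `pick_awq_method`, the returned strings) is hypothetical and made up for illustration; it is not the actual vLLM or sglang code, only a model of the two dead ends a fused MoE layer hits on CPU.

```python
from dataclasses import dataclass


@dataclass
class LayerInfo:
    """Hypothetical description of a layer that needs a quantization path."""
    is_fused_moe: bool
    device: str  # "cpu" or "cuda"


def pick_awq_method(layer: LayerInfo, prefer_marlin: bool) -> str:
    """Return the name of an (illustrative) AWQ path for the given layer."""
    if not layer.is_fused_moe:
        # Ordinary linear layers are covered by the plain AWQ path.
        return "awq_linear"
    if prefer_marlin:
        # Dead end 1: the MoE model is routed to the Marlin path,
        # which, per the report above, has no CPU kernel.
        if layer.device == "cpu":
            raise NotImplementedError("Marlin kernel is not supported on CPU")
        return "awq_marlin_moe"
    # Dead end 2: plain AWQ has no fused-MoE branch at all.
    raise NotImplementedError("plain AWQ does not handle the fused MoE case")
```

Under these assumptions, a fused MoE layer on CPU raises on both branches, which is why a dedicated fused-MoE path in the plain AWQ code would be needed.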

Replies: 0 comments
