How can I run sglang in disaggregation mode? #7492

lbh2001 · 2025-06-24T05:28:43Z


I have four H800 nodes, each with 8 GPUs, and I want to run sglang in disaggregation mode with a 2P2D setup (two prefill nodes and two decode nodes). Following the document docs/backend/pd_disaggregation.md, I launched:

```bash
# prefill 0
$ python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 --disaggregation-ib-device ${device_name} --disaggregation-mode prefill --host ${local_ip} --port 30000 --trust-remote-code --dist-init-addr ${prefill_master_ip}:5000 --nnodes 2 --node-rank 0 --tp-size 16 --dp-size 8 --enable-dp-attention --enable-deepep-moe --deepep-mode normal --mem-fraction-static 0.8
# prefill 1
$ python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 --disaggregation-ib-device ${device_name} --disaggregation-mode prefill --host ${local_ip} --port 30000 --trust-remote-code --dist-init-addr ${prefill_master_ip}:5000 --nnodes 2 --node-rank 1 --tp-size 16 --dp-size 8 --enable-dp-attention --enable-deepep-moe --deepep-mode normal --mem-fraction-static 0.8
# decode 0
$ python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 --disaggregation-ib-device ${device_name} --disaggregation-mode decode --host ${local_ip} --port 30001 --trust-remote-code --dist-init-addr ${decode_master_ip}:5000 --nnodes 2 --node-rank 0 --tp-size 16 --dp-size 8 --enable-dp-attention --enable-deepep-moe --deepep-mode low_latency --mem-fraction-static 0.8 --max-running-requests 128
# decode 1
$ python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 --disaggregation-ib-device ${device_name} --disaggregation-mode decode --host ${local_ip} --port 30001 --trust-remote-code --dist-init-addr ${decode_master_ip}:5000 --nnodes 2 --node-rank 1 --tp-size 16 --dp-size 8 --enable-dp-attention --enable-deepep-moe --deepep-mode low_latency --mem-fraction-static 0.8 --max-running-requests 128
```
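
The four commands differ only in the role (prefill or decode), the node rank, the port, the master address, and a few role-specific flags, so a small dry-run helper can make it easier to check that each node gets the right combination. This is a hypothetical bash sketch, not part of sglang: `pd_launch_cmd` only prints a command assembled from the flags in the docs, and `device_name`, `local_ip`, `prefill_master_ip` and `decode_master_ip` are assumed to be set as shell variables on each node. Each instance spans 2 nodes × 8 GPUs = 16 GPUs, which is what `--nnodes 2` together with `--tp-size 16` expresses.

```bash
# Hypothetical helper (not part of sglang): print, but do not run, the launch
# command for one node of the 2P2D layout. All flags are copied from the docs
# commands; device_name, local_ip, prefill_master_ip and decode_master_ip are
# assumed to be set as shell variables on the node before calling it.
pd_launch_cmd() {
  local role="$1" rank="$2" port master extra
  case "$rank" in
    0|1) ;;  # --nnodes 2, so the only valid node ranks are 0 and 1
    *) echo "node rank must be 0 or 1" >&2; return 1 ;;
  esac
  case "$role" in
    prefill)
      port=30000
      master="${prefill_master_ip:?set prefill_master_ip}"
      extra="--deepep-mode normal --mem-fraction-static 0.8"
      ;;
    decode)
      port=30001
      master="${decode_master_ip:?set decode_master_ip}"
      extra="--deepep-mode low_latency --mem-fraction-static 0.8 --max-running-requests 128"
      ;;
    *) echo "role must be prefill or decode" >&2; return 1 ;;
  esac
  # Both nodes of one instance share the same --dist-init-addr (that
  # instance's master node) and differ only in --node-rank.
  # 2 nodes x 8 GPUs = 16 GPUs, hence --tp-size 16.
  echo "python -m sglang.launch_server" \
    "--model-path deepseek-ai/DeepSeek-V3-0324" \
    "--disaggregation-ib-device ${device_name:?set device_name}" \
    "--disaggregation-mode ${role}" \
    "--host ${local_ip:?set local_ip} --port ${port}" \
    "--trust-remote-code --dist-init-addr ${master}:5000" \
    "--nnodes 2 --node-rank ${rank}" \
    "--tp-size 16 --dp-size 8 --enable-dp-attention --enable-deepep-moe" \
    "${extra}"
}
```

For example, on the second prefill node you would set the four variables, run `pd_launch_cmd prefill 1`, compare the printed line with the docs, and then execute it. Printing instead of launching keeps the helper safe to try on a machine without GPUs.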

My prefill_master_ip and decode_master_ip are different. Both master nodes finished warmup successfully.

Then I ran mini_lb on the prefill master node (prefill 0):

```bash
python3 -m sglang.srt.disaggregation.mini_lb --port 8000 --prefill http://${prefill_master_ip}:${prefill_master_port} --prefill-bootstrap-ports 8998 --decode http://${decode_master_ip}:${decode_master_port} > lb_server.log &
```
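
The `${prefill_master_port}` and `${decode_master_port}` placeholders in the mini_lb command have to match the `--port` values the servers were started with: 30000 for prefill and 30001 for decode. The bash sketch below fills them in and adds an optional smoke test. Treat it as a set of assumptions rather than a reference: 8998 is presumably the prefill servers' default bootstrap port, the smoke test assumes the load balancer forwards sglang's native `/generate` endpoint, and `lb_cmd` / `lb_smoke_test` are hypothetical helper names.

```bash
# Ports taken from the launch commands: prefill --port 30000, decode --port 30001.
# 8998 is assumed to be the prefill servers' default bootstrap port; verify it
# against your sglang version if you changed any disaggregation flags.
prefill_master_port=30000
decode_master_port=30001
prefill_bootstrap_port=8998

# Hypothetical helper: print the mini_lb command with the ports filled in.
lb_cmd() {
  echo "python3 -m sglang.srt.disaggregation.mini_lb --port 8000" \
    "--prefill http://${prefill_master_ip:?set prefill_master_ip}:${prefill_master_port}" \
    "--prefill-bootstrap-ports ${prefill_bootstrap_port}" \
    "--decode http://${decode_master_ip:?set decode_master_ip}:${decode_master_port}"
}

# Hypothetical smoke test once the LB is running: send one request through it.
# Assumes the LB exposes sglang's native /generate endpoint, which takes a
# JSON body with "text" and "sampling_params".
lb_smoke_test() {
  local lb_host="${1:-127.0.0.1}"
  curl -s "http://${lb_host}:8000/generate" \
    -H 'Content-Type: application/json' \
    -d '{"text": "The capital of France is", "sampling_params": {"max_new_tokens": 16, "temperature": 0}}'
}
```

If `lb_smoke_test` returns generated text, the request presumably went through both the prefill and decode instances; if it hangs, `lb_server.log` and the server logs on both master nodes are the first places to look.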

I'm not sure whether this is the right way to do it. Is my setup correct?
