fix: dummy health check server not accessible on non-zero rank nodes #12297

ishandhanani · 2025-10-28T22:20:35Z

Problem

When running multi-node inference with --nnodes > 1 and --node-rank >= 1, the dummy health check server logs that it has been scheduled, but it never becomes reachable:

# Launch command
python3 -m sglang.launch_server \
  --model-path /model/ \
  --host 0.0.0.0 \
  --nnodes 2 \
  --node-rank 1 \
  --enable-metrics \
  --dist-init-addr 10.30.1.187:29500 \
  --tp 8

# Log output
2025-10-28T21:32:44.763434Z  INFO common.launch_dummy_health_check_server: Dummy health check server scheduled on existing loop at 0.0.0.0:30000

# But curl fails
curl http://localhost:30000/health
# Connection refused or timeout

Inference works correctly across nodes, but the health check and metrics endpoints are unreachable on non-zero rank nodes.

Root Cause

The issue occurs in launch_dummy_health_check_server() when called from an async context (e.g., custom distributed runtime wrappers or when an event loop policy is set):

1. asyncio.get_running_loop() succeeds and finds the parent event loop.
2. loop.create_task(server.serve()) schedules the server as a task on that loop.
3. The function returns immediately.
4. The main thread then blocks on proc.join(), waiting for the scheduler processes.
5. The scheduled task never executes: proc.join() is a synchronous call, so control never returns to the event loop that owns the task.

# Old buggy code
try:
    loop = asyncio.get_running_loop()
    loop.create_task(server.serve())  # ← Scheduled but never runs
except RuntimeError:
    server.run()
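The failure mode can be reproduced without SGLang. The sketch below is a minimal illustration under stated assumptions: fake_server is a hypothetical stand-in for server.serve(), and time.sleep() stands in for the blocking proc.join() call. Neither is taken from the actual code base.

```python
import asyncio
import time


async def fake_server(started: list) -> None:
    # Hypothetical stand-in for server.serve(); records that it began running.
    started.append(True)
    await asyncio.sleep(3600)


def launch_on_running_loop(started: list) -> None:
    # Mirrors the old pattern: schedule on the running loop if there is one.
    try:
        loop = asyncio.get_running_loop()
        loop.create_task(fake_server(started))
    except RuntimeError:
        asyncio.run(fake_server(started))


async def main() -> list:
    started: list = []
    launch_on_running_loop(started)
    # Synchronous blocking call (stand-in for proc.join()). The loop never
    # regains control here, so the scheduled task cannot take its first step.
    time.sleep(0.2)
    return list(started)


observed = asyncio.run(main())
print("server started while main thread was blocked:", bool(observed))
```

Because main() never awaits after scheduling the task, the task body does not run before the blocking call finishes, which matches the behaviour described in the steps above.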

Solution

Run the health check server in a dedicated daemon thread with its own event loop:

def run_server():
    asyncio.run(server.serve())

thread = threading.Thread(target=run_server, daemon=True, name="health-check-server")
thread.start()

Benefits:

Works in both async and sync contexts
Doesn't block the main thread
Daemon thread ensures automatic cleanup on process exit
Creates isolated event loop independent of parent context
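As a rough check of the claims above, the following sketch starts a stand-in server in a daemon thread from inside an async context and then blocks the calling thread. fake_serve and the threading.Event are illustrative assumptions, not part of the patch; only the thread setup mirrors the snippet in this PR.

```python
import asyncio
import threading

started = threading.Event()


async def fake_serve() -> None:
    # Hypothetical stand-in for server.serve(); a real server would run
    # until shutdown rather than for half a second.
    started.set()
    await asyncio.sleep(0.5)


def run_server() -> None:
    # asyncio.run() creates a fresh event loop owned by this thread.
    asyncio.run(fake_serve())


async def main() -> bool:
    thread = threading.Thread(
        target=run_server, daemon=True, name="health-check-server"
    )
    thread.start()
    # Blocking wait on the caller's thread (stand-in for proc.join()).
    # The server still starts because it runs on its own loop.
    return started.wait(timeout=5.0)


reached = asyncio.run(main())
print("server started while main thread was blocked:", reached)
```

The caller's event loop is blocked for the whole wait, yet the stand-in server starts, because it no longer depends on that loop being serviced.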

Testing

Multi-node setup:

# Node 0
python3 -m sglang.launch_server --model-path /model/ --nnodes 2 --node-rank 0 --enable-metrics --tp 8

# Node 1  
python3 -m sglang.launch_server --model-path /model/ --nnodes 2 --node-rank 1 --enable-metrics --tp 8

# Verify health checks work on both nodes
curl http://node0:30000/health  # ✓ Works
curl http://node1:30000/health  # ✓ Now works (previously failed)
curl http://node1:30000/metrics # ✓ Metrics accessible
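The same verification could be scripted instead of run by hand. This is only a sketch: is_healthy and check_nodes are hypothetical helper names, the hostnames are the placeholders used in the curl commands above, and treating HTTP 200 as "healthy" is an assumption about the endpoint.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with HTTP 200 within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


def check_nodes(hosts: list, port: int = 30000) -> dict:
    """Probe /health on every host and map host name to reachability."""
    return {host: is_healthy(f"http://{host}:{port}/health") for host in hosts}
```

Calling check_nodes(["node0", "node1"]) would be expected to report both hosts as reachable once the fix is in place, and node1 as unreachable before it.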

Custom async runtime:

async def init():
    engine = sgl.Engine(...)  # Called from async context
    # Health check server now accessible even when Engine() is called in async context

Files Changed

python/sglang/srt/utils/common.py: Fixed launch_dummy_health_check_server() to use background thread

gemini-code-assist · 2025-10-28T22:20:47Z

Summary of Changes

Hello @ishandhanani, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug where the dummy health check server failed to become accessible on non-zero rank nodes in multi-node inference setups. By refactoring the server's launch mechanism to utilize a separate daemon thread with its own event loop, the health check and metrics endpoints are now reliably available across all nodes, improving the robustness and observability of distributed SGLang deployments.

Highlights

Health Check Accessibility Fix: Resolved an issue where the dummy health check server was inaccessible on non-zero rank nodes during multi-node inference, preventing health checks and metrics from functioning correctly.
Root Cause Identification: Identified the root cause as the health check server task being scheduled on the main event loop, which then blocked, preventing the server from ever starting.
Concurrency Model Change: Implemented a solution to run the health check server in a dedicated daemon thread with its own asyncio event loop, ensuring it operates independently and doesn't block the main process.

gemini-code-assist

Code Review

This pull request effectively resolves the issue of the dummy health check server being inaccessible on non-zero rank nodes. The root cause analysis is spot-on, and the solution of running the server in a dedicated daemon thread with its own event loop is robust and well-implemented. The change is clear and correct. I have one suggestion to further improve the robustness by adding exception handling in the new thread.

gemini-code-assist · 2025-10-28T22:22:06Z

python/sglang/srt/utils/common.py

+    def run_server():
+        asyncio.run(server.serve())


The new implementation correctly runs the server in a background thread. However, if an exception occurs within server.serve() (e.g., the port is already in use), it will be raised in the background thread and terminate it silently. This could make debugging difficult as the main process would be unaware of the failure.

It's good practice to wrap the call in a try...except block to log any exceptions that occur in the thread. This will provide visibility into failures of the health check server.

Suggested change

-    def run_server():
-        asyncio.run(server.serve())
+    def run_server():
+        try:
+            asyncio.run(server.serve())
+        except Exception:
+            logger.exception("Dummy health check server thread failed unexpectedly.")
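To see what the suggested wrapper buys, here is a small self-contained sketch. failing_serve is a hypothetical coroutine that raises immediately, and ListHandler is a throwaway logging handler used only to capture the message; the real server may report startup failures differently.

```python
import asyncio
import logging
import threading

logger = logging.getLogger("health_check_demo")


class ListHandler(logging.Handler):
    """Collects formatted log messages so they can be inspected afterwards."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo's output out of the root logger


async def failing_serve() -> None:
    # Hypothetical stand-in for a server that fails during startup.
    raise OSError("address already in use")


def run_server() -> None:
    try:
        asyncio.run(failing_serve())
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("Dummy health check server thread failed unexpectedly.")


thread = threading.Thread(target=run_server, daemon=True, name="health-check-server")
thread.start()
thread.join(timeout=5.0)
print(handler.messages)
```

The failure is routed through the application's logger with a traceback attached, rather than surfacing only through the interpreter's default handling of uncaught thread exceptions. Note that except Exception does not catch SystemExit or KeyboardInterrupt.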

fix(srt): run health check server in background thread to avoid blocking main thread

3cee510

sglang-bot added the run-ci label Oct 28, 2025

gemini-code-assist bot reviewed Oct 28, 2025

ishandhanani added 5 commits October 28, 2025 15:32

bump

ad2b0d2

bump

e15ee19

Merge branch 'main' into ishan/fix-server

23a5a0a

bump

7ec0d40

bump

c83df26

ishandhanani requested review from Ying1123, hnyls2002, merrymercy and zhyncs as code owners October 30, 2025 21:32
