Port in annotation is not picked up by gateway pod #1684

@Jeffwan

Description

🚀 Feature Description and Motivation

              labels:
                model.aibrix.ai/name: qwen3-8B
                model.aibrix.ai/port: "30000"
                model.aibrix.ai/engine: sglang
            spec:
              nodeSelector:
                kubernetes.io/hostname: 192.168.0.6
              containers:
                - name: decode
                  image: kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/sglang:v0.4.9.post3-cu126-nixl-v0.4.1
                  command: ["sh", "-c"]
                  args:
                    - |
                      python3 -m sglang.launch_server \
                        --model-path /models/Qwen3-8B \
                        --served-model-name qwen3-8B \
                        --host 0.0.0.0 \
                        --port 30000 \
                        --disaggregation-mode decode \
                        --disaggregation-transfer-backend=mooncake \
                        --trust-remote-code \
                        --mem-fraction-static 0.8 \
                        --log-level debug
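
The behavior this report expects can be sketched as follows: the router reads the `model.aibrix.ai/port` value declared on the pod and falls back to a default only when that value is missing or invalid. This is a minimal Python sketch for illustration, not AIBrix's actual gateway code. The function name `resolve_pod_port` and the fallback `DEFAULT_PORT = 8000` are assumptions that do not come from the issue.

```python
from typing import Mapping

# Label key taken from the pod template in this issue.
PORT_LABEL = "model.aibrix.ai/port"
# Assumed fallback port, chosen only for illustration.
DEFAULT_PORT = 8000


def resolve_pod_port(labels: Mapping[str, str], default: int = DEFAULT_PORT) -> int:
    """Return the serving port declared in the pod metadata, or the default."""
    raw = labels.get(PORT_LABEL)
    if raw is None:
        return default
    try:
        port = int(raw)
    except ValueError:
        # Non-numeric value: ignore it and use the default.
        return default
    # Reject values outside the valid TCP port range.
    return port if 0 < port < 65536 else default


labels = {
    "model.aibrix.ai/name": "qwen3-8B",
    "model.aibrix.ai/port": "30000",
    "model.aibrix.ai/engine": "sglang",
}
print(resolve_pod_port(labels))
```

With the labels from the pod template, the sketch resolves port 30000, which matches the `--port 30000` passed to `sglang.launch_server`.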

Request:

curl -v http://${ENDPOINT}/v1/chat/completions \
-H "routing-strategy: pd" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-8B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "help me write a random generator in python"}
    ],
    "temperature": 0.7
}'

Response:

* Host localhost:8888 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8888...
* Connected to localhost (::1) port 8888
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8888
> User-Agent: curl/8.7.1
> Accept: */*
> routing-strategy: pd
> Content-Type: application/json
> Content-Length: 232
>
* upload completely sent off: 232 bytes
< HTTP/1.1 400 Bad Request
< x-error-no-model-backends: qwen3-8Bxxx
< content-type:
< content-length: 67
< date: Mon, 20 Oct 2025 05:38:27 GMT
< connection: close
<
* Closing connection
{"error":{"code":400,"message":"model qwen3-8B does not exist"}}
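
To reproduce the request above from a script and surface the relevant error header, something like the following could be used. It is a sketch built on the Python standard library only. The helper names `build_pd_request` and `describe_failure` are hypothetical, and reading `x-error-no-model-backends` as "the gateway found no backends for this model" is an interpretation of the header name, not documented behavior.

```python
import json
import urllib.request


def build_pd_request(endpoint: str, model: str) -> urllib.request.Request:
    """Build the same chat-completions request as the curl command in this issue."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "help me write a random generator in python"},
        ],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"http://{endpoint}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"routing-strategy": "pd", "Content-Type": "application/json"},
        method="POST",
    )


def describe_failure(status: int, headers: dict, body: str) -> str:
    """Summarize a failed response, highlighting the no-backends header if present."""
    lowered = {key.lower(): value for key, value in headers.items()}
    missing = lowered.get("x-error-no-model-backends")
    if status == 400 and missing is not None:
        return f"gateway found no backends for model {missing!r}"
    return f"HTTP {status}: {body}"


# Values copied from the response shown in this issue.
print(describe_failure(
    400,
    {"x-error-no-model-backends": "qwen3-8Bxxx"},
    '{"error":{"code":400,"message":"model qwen3-8B does not exist"}}',
))
```

To send the request for real, pass the result of `build_pd_request("localhost:8888", "qwen3-8B")` to `urllib.request.urlopen`. A 400 response raises `urllib.error.HTTPError`, whose `code`, `headers`, and `read()` can be fed to `describe_failure`.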

Use Case

routing

Proposed Solution

No response
