

@ash-sigh ash-sigh commented Oct 29, 2025

Motivation

Support the xgrammar grammar backend on Ascend NPU.

Modifications

Add vocab mask support for Ascend NPU, so that xgrammar-constrained decoding can run with `--device npu`.
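For readers unfamiliar with vocab masks, the idea is that at each decoding step the grammar backend reports which token ids are legal, and every other token's logit is forced to -inf before sampling. The pure-Python sketch below illustrates that idea only; it is not the code in this PR. It assumes a packed bitmask of 32-bit words in which bit `t % 32` of word `t // 32` set to 1 means token `t` is allowed (the layout xgrammar's token bitmask uses). The helper names `unpack_bitmask`, `apply_vocab_mask`, and `greedy_pick` are hypothetical.

```python
NEG_INF = float("-inf")


def unpack_bitmask(words, vocab_size):
    """Expand packed 32-bit words into one bool per token id (True = allowed).

    Assumed layout: bit (t % 32) of words[t // 32] is 1 when token t is allowed.
    Python's >> on negative ints is arithmetic, so signed int32 words such as -1
    (all bits set) still yield the correct per-bit values.
    """
    if len(words) * 32 < vocab_size:
        raise ValueError("bitmask is too short for vocab_size")
    return [bool((words[t // 32] >> (t % 32)) & 1) for t in range(vocab_size)]


def apply_vocab_mask(logits, words):
    """Return a copy of logits with every disallowed token set to -inf."""
    allowed = unpack_bitmask(words, len(logits))
    return [x if ok else NEG_INF for x, ok in zip(logits, allowed)]


def greedy_pick(logits, words):
    """Greedy (temperature=0) choice restricted to grammar-allowed tokens."""
    masked = apply_vocab_mask(logits, words)
    return max(range(len(masked)), key=masked.__getitem__)


if __name__ == "__main__":
    # 0b1001 allows only tokens 0 and 3 out of a 5-token vocabulary.
    print(apply_vocab_mask([1.0, 2.0, 3.0, 4.0, 5.0], [0b1001]))
    print(greedy_pick([1.0, 2.0, 3.0, 4.0, 5.0], [0b1001]))
```

With the mask `0b1001`, token 4 has the highest raw logit but is disallowed, so the greedy choice falls to token 3. On a real device this masking would be done in place on the logits tensor rather than with Python lists.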

Accuracy Tests

test1:

export PYTHONPATH=/home/w30027501/sglang_code/sglang/python:$PYTHONPATH
export HCCL_OP_EXPANSION_MODE="AIV"
export OMP_PROC_BIND=false
export STREAMS_PER_DEVICE=32

python -m sglang.launch_server --model-path /data/weights/Qwen3-VL-4B-Instruct --host 127.0.0.1 --port 8022 --device npu --attention-backend ascend --tp 1 --grammar-backend xgrammar --base-gpu-id 7 --mm-attention-backend ascend_attn --trust-remote-code --enable-multimodal --cuda-graph-bs 8
# benchmark
python3 -m sglang.test.few_shot_gsm8k --num-questions 200 --port 8022

result:
Accuracy: 0.930
Invalid: 0.000
Latency: 364.722 s
Output throughput: 72.389 token/s

test2:

import json

import openai
from sglang.utils import print_highlight

# Client for the server launched in test1 (port 8022).
# test1 sets no --api-key, so a placeholder key is passed.
client = openai.Client(
    base_url="http://127.0.0.1:8022/v1",
    api_key="None"
)

json_schema = json.dumps(
    {
        "type": "object",
        "properties": {
            "name": {"type": "string", "pattern": "^[\\w]+$"},
            "population": {"type": "integer"},
        },
        "required": ["name", "population"],
    }
)

response = client.chat.completions.create(
    model="/data/weights/Qwen3-VL-4B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Give me the information of the capital of France in the JSON format.",
        },
    ],
    temperature=0,
    max_tokens=128,
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "foo", "schema": json.loads(json_schema)},
    },
)

print_highlight(response.choices[0].message.content)

result:
{
"name": "Paris",
"population": 2161000
}
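To check results like the one above automatically rather than by eye, one option is a small hand-written validator for the test2 schema. The sketch below is an assumption-laden illustration, not part of this PR: `check_city_output` is a hypothetical helper that only covers the three constraints in that schema (required keys, string `name` matching `^[\w]+$`, integer `population`). A full JSON Schema validator such as the `jsonschema` package's `validate` function would be the more general alternative, if it is installed.

```python
import json
import re

# Same pattern as the "name" property in the test2 json_schema.
NAME_PATTERN = re.compile(r"^[\w]+$")


def check_city_output(text):
    """Return True if text is JSON matching the test2 schema (hand-written check)."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    # "required": ["name", "population"]
    if "name" not in obj or "population" not in obj:
        return False
    name, population = obj["name"], obj["population"]
    if not isinstance(name, str) or not NAME_PATTERN.match(name):
        return False
    # bool is a subclass of int in Python, but JSON Schema "integer" excludes booleans.
    if isinstance(population, bool) or not isinstance(population, int):
        return False
    return True


if __name__ == "__main__":
    print(check_city_output('{"name": "Paris", "population": 2161000}'))
```

For example, `check_city_output(response.choices[0].message.content)` could be asserted at the end of test2.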

Benchmarking and Profiling

Checklist

@ping1jing2 ping1jing2 changed the title support xgrammar backend for ascend npu [Ascend]support xgrammar backend for ascend npu Oct 29, 2025