
Conversation


@w630497878 w630497878 commented Nov 11, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for a new hardware target, designated as 'A5', across various components of the vLLM Ascend backend. While the changes are extensive, there are numerous critical issues, including typos, undefined variables, incorrect API usage, and logical errors that will prevent the code from running. These issues are present in almost every modified file and need to be addressed to ensure correctness and functionality on the new hardware path. My review provides specific comments and suggestions to fix these critical problems.

and device_filter(d.get("device_id", ""))
]
if len(device_list) <= self.pcp_rank * self.tp_size + self.tp_rank:
retunr None

critical

There is a typo in the return statement. retunr should be return.

Suggested change
retunr None
return None

agent_metadata = LLMDataDistCMgrAgentMetadataA5(
server_id=server_id_,
device_id=device_id_,
device_ip=device_ip_,

critical

The variable device_ip_ is not defined in this scope, which will cause a NameError. It is defined in the else block but not in the if is_A5() block. You need to extract device_ip from device_info.

Suggested change
device_ip=device_ip_,
device_ip=device_info["device_ip"],
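
Equivalently (a sketch only, assuming device_info carries a "device_ip" key as the suggestion above implies), device_ip_ could instead be populated inside the A5 branch so the metadata construction stays unchanged:

if is_A5():
    device_ip_ = device_info["device_ip"]  # assumed key, mirroring the suggested fix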

Comment on lines 712 to 811
if is_A5():
batch_size = attn_metadata.query_lens.shape[0]
hidden_szie = self.num_heads * self.head_size
query = query[:batch_szie]
query = query.view(batch_size, 1, hidden_size)
block_size = self.key_cache.shape[1]
key = self.key_cache.flatten(2, 3).contiguous()
ori_output = output
output, _ = torch_nup.npu_fused_infer_attention_score_v2(
query=query,
key=key,
value=value,
actual_seq_kvlen=attn_metadata.seq_len,
num_query_heads=self.num_heads,
num_key_value_heads=self.num_kv_heads,
block_table=attn_metadata.block_tables[:batch_szie],
block_size=block_size,
softmax_scale=self.scale,
inpt_layout="BSH"
)

critical

This code block contains several critical errors that will cause it to fail at runtime:

  • Line 714: Typo hidden_szie should be hidden_size.
  • Line 715 & 727: Typo batch_szie should be batch_size.
  • Line 716: hidden_size is used but was defined with a typo as hidden_szie.
  • Line 720: Typo torch_nup should be torch_npu.
  • Line 723: The value variable is not defined in this scope. It should likely be derived from self.value_cache similar to how key is derived from self.key_cache.
  • Line 724: attn_metadata does not have an attribute seq_len. It should probably be seq_lens.
  • Line 730: Typo inpt_layout should be input_layout.
                if is_A5(): 
                    batch_size = attn_metadata.query_lens.shape[0]
                    hidden_size = self.num_heads * self.head_size
                    query = query[:batch_size]
                    query = query.view(batch_size, 1, hidden_size)
                    block_size = self.key_cache.shape[1]
                    key = self.key_cache.flatten(2, 3).contiguous()
                    value = self.value_cache.flatten(2, 3).contiguous()
                    ori_output = output
                    output, _ = torch_npu.npu_fused_infer_attention_score_v2(
                        query=query,
                        key=key,
                        value=value,
                        actual_seq_kvlen=attn_metadata.seq_lens,
                        num_query_heads=self.num_heads,
                        num_key_value_heads=self.num_kv_heads,
                        block_table=attn_metadata.block_tables[:batch_size],
                        block_size=block_size,
                        softmax_scale=self.scale,
                        input_layout="BSH"
                    )

Comment on lines 794 to 888
if is_A5():
output, _ = torch_npu.npu_fused_infer_attention_score_v2(
query=query,
key=self.key_cache.flatten(2,3).contiguous(),
value=self.value_cache.flatten(2,3).contiguous(),
atten_mask=attn_metadata.attn_mask,
actual_seq_qlen=attn_metadata.actual_seq_lengths_q,
actual_seq_kvlen=attn_metadata.seq_lens_list,
num_query_heads=self.num_heads,
num_key_value_heads=self.num_kv_heads,
block_table=attn_metadata.block_tables[:attn_metadata.query_lens.shape[0]],
block_size=self.key_cache.shape[1],
softmax_scale=self.scale,
imput_layout="TND"
)

critical

There are a couple of issues in this block:

  1. On line 805, there is an indentation error for block_size, which will cause a SyntaxError.
  2. On line 807, imput_layout is a typo and should be input_layout.
        if is_A5():
            output, _ = torch_npu.npu_fused_infer_attention_score_v2(
                query=query,
                key=self.key_cache.flatten(2,3).contiguous(),
                value=self.value_cache.flatten(2,3).contiguous(),
                atten_mask=attn_metadata.attn_mask,
                actual_seq_qlen=attn_metadata.actual_seq_lengths_q,
                actual_seq_kvlen=attn_metadata.seq_lens_list,
                num_query_heads=self.num_heads,
                num_key_value_heads=self.num_kv_heads,
                block_table=attn_metadata.block_tables[:attn_metadata.query_lens.shape[0]],
                block_size=self.key_cache.shape[1],
                softmax_scale=self.scale,
                input_layout="TND"
            )

Comment on lines 423 to 424
if is_A5(): # A5 does not support the npu_mrope operator; a small-operator replacement is needed here
return

critical

There are two critical issues here:

  1. The is_A5 function is not imported, which will lead to a NameError.
  2. The function forward_oot is expected to return a tuple (query, key). However, this if block has an early return without a value, which implicitly returns None. This will cause a TypeError when the caller tries to unpack the result. The comment indicates this is a placeholder, but it should at least return the original query and key to avoid crashing.
Suggested change
if is_A5(): # A5 does not support the npu_mrope operator; a small-operator replacement is needed here
return
if is_A5(): # A5 does not support the npu_mrope operator; a small-operator replacement is needed here
return query, key
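
If an actual fallback is wanted rather than the pass-through above, one purely illustrative option is a rotate-half rotary embedding built from elementwise ops, the kind of "small operator" replacement the code comment hints at. This is not the PR's implementation; the tensor layouts below are assumptions, and the actual mrope sectioning would still need to be handled:

import torch

# Illustrative rotate-half RoPE built from small ops (not the PR's code).
# Assumed shapes: x [num_tokens, num_heads, head_size], cos/sin [num_tokens, 1, head_size // 2].
def rope_small_ops(x: torch.Tensor, cos: torch.Tensor,
                   sin: torch.Tensor) -> torch.Tensor:
    x1, x2 = torch.chunk(x, 2, dim=-1)       # split the head dimension in half
    rotated = torch.cat((-x2, x1), dim=-1)   # rotate-half
    cos_full = torch.cat((cos, cos), dim=-1)
    sin_full = torch.cat((sin, sin), dim=-1)
    return x * cos_full + rotated * sin_full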

) -> torch.Tensor:
# npu_top_k_top_p uses the operator aclnnApplyTopKTopP, but aclnnApplyTopKTopP currently does not support 310P
if not is_310p() and p is not None and k is not None and 1 <= int(
if not is_310p() and not is_A5() and p is not None and k is not None and 1 <= int(

critical

The function is_A5 is used here but it is not imported in this file. This will cause a NameError at runtime.
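
A minimal sketch of the missing import, assuming is_A5 is defined alongside is_310p in vllm_ascend.utils (the module path is an assumption and should be checked against where this PR actually adds the helper):

from vllm_ascend.utils import is_A5  # assumed location, next to is_310p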

Comment on lines 996 to 1027
if is_A5():
mas_seq_len = max(seq_lens, default=0)
max_seq_len = (max_seq_len + self.block_szie - 1) // self.block_size * self.block_size
new_element = torch.tensor([max_seq_len])
seq_lens = torch.cat([seq_lens, new_element], dim =0)
return self.attn_mask_builder.get_attn_mask(max_seq_len, self.dtype, self.device).to(torch.bool)

critical

This block has several critical issues:

  1. The is_A5 function is not imported, which will cause a NameError.
  2. On line 997, there is a typo mas_seq_len which should be max_seq_len.
  3. On line 998, there is a typo self.block_szie which should be self.block_size.
  4. On line 1000, seq_lens is reassigned but not used afterwards in this block, making the operation ineffective.
Suggested change
if is_A5():
mas_seq_len = max(seq_lens, default=0)
max_seq_len = (max_seq_len + self.block_szie - 1) // self.block_size * self.block_size
new_element = torch.tensor([max_seq_len])
seq_lens = torch.cat([seq_lens, new_element], dim =0)
return self.attn_mask_builder.get_attn_mask(max_seq_len, self.dtype, self.device).to(torch.bool)
if is_A5():
max_seq_len = max(seq_lens.tolist(), default=0)
max_seq_len = (max_seq_len + self.block_size - 1) // self.block_size * self.block_size
return self.attn_mask_builder.get_attn_mask(max_seq_len, self.dtype, self.device).to(torch.bool)

if num_tokens <= self.mc2_tokens_capacity else
MoECommType.ALLTOALL)
elif soc_version in {AscendSocVersion.A5}:
if (num_tokens <= self.mc2_tokens_capacity and self.parallel_config.world_size_cross_dp >= 8 and is_mc2_models(model_type)) :

critical

The variable model_type is used here but it is not defined in the scope of the _select_moe_comm_method function. It needs to be passed as an argument to the function.
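
A hedged sketch of that fix: add a model_type parameter to _select_moe_comm_method and have the caller supply it. Apart from the added parameter, the names below are taken from the quoted snippet or are assumptions:

# Hypothetical signature; the existing parameters and return type are assumptions.
def _select_moe_comm_method(self, num_tokens: int,
                            model_type: str) -> MoECommType:
    ...

# Illustrative call site: the runner passes the model type it already knows, e.g.
#   self._select_moe_comm_method(num_tokens, self.model_config.hf_config.model_type)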

@github-actions

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@w630497878 w630497878 force-pushed the br_a5_dev_1110 branch 6 times, most recently from abc05bb to 9d054ef on November 13, 2025 at 08:57
num_query_heads=self.num_heads,
num_key_value_heads=self.num_kv_heads,
softmax_scale=self.scale,
spare_mode=2, #spare_mode=2时,代表leftupCausal模式的mask (spare_mode=2 selects the leftupCausal-style mask)
Collaborator


Please remove the Chinese characters; same below.

@github-actions

This pull request has conflicts; please resolve them before we can evaluate the pull request.
