
Conversation


@w630497878 w630497878 commented Nov 11, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for a new hardware target, designated as 'A5', across various components of the vLLM Ascend backend. While the changes are extensive, there are numerous critical issues, including typos, undefined variables, incorrect API usage, and logical errors that will prevent the code from running. These issues are present in almost every modified file and need to be addressed to ensure correctness and functionality on the new hardware path. My review provides specific comments and suggestions to fix these critical problems.

and device_filter(d.get("device_id", ""))
]
if len(device_list) <= self.pcp_rank * self.tp_size + self.tp_rank:
retunr None

critical

There is a typo in the return statement. retunr should be return.

Suggested change
retunr None
return None

agent_metadata = LLMDataDistCMgrAgentMetadataA5(
server_id=server_id_,
device_id=device_id_,
device_ip=device_ip_,

critical

The variable device_ip_ is not defined in this scope, which will cause a NameError. It is defined in the else block but not in the if is_A5() block. You need to extract device_ip from device_info.

Suggested change
device_ip=device_ip_,
device_ip=device_info["device_ip"],
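
Equivalently (a sketch only, assuming device_info carries a "device_ip" key as the suggestion above implies), device_ip_ could instead be populated inside the A5 branch so the metadata construction stays unchanged:

if is_A5():
    device_ip_ = device_info["device_ip"]  # assumed key, mirroring the suggested fix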

Comment on lines 712 to 811
if is_A5():
batch_size = attn_metadata.query_lens.shape[0]
hidden_szie = self.num_heads * self.head_size
query = query[:batch_szie]
query = query.view(batch_size, 1, hidden_size)
block_size = self.key_cache.shape[1]
key = self.key_cache.flatten(2, 3).contiguous()
ori_output = output
output, _ = torch_nup.npu_fused_infer_attention_score_v2(
query=query,
key=key,
value=value,
actual_seq_kvlen=attn_metadata.seq_len,
num_query_heads=self.num_heads,
num_key_value_heads=self.num_kv_heads,
block_table=attn_metadata.block_tables[:batch_szie],
block_size=block_size,
softmax_scale=self.scale,
inpt_layout="BSH"
)

critical

This code block contains several critical errors that will cause it to fail at runtime:

  • Line 714: Typo hidden_szie should be hidden_size.
  • Line 715 & 727: Typo batch_szie should be batch_size.
  • Line 716: hidden_size is used but was defined with a typo as hidden_szie.
  • Line 720: Typo torch_nup should be torch_npu.
  • Line 723: The value variable is not defined in this scope. It should likely be derived from self.value_cache similar to how key is derived from self.key_cache.
  • Line 724: attn_metadata does not have an attribute seq_len. It should probably be seq_lens.
  • Line 730: Typo inpt_layout should be input_layout.
                if is_A5(): 
                    batch_size = attn_metadata.query_lens.shape[0]
                    hidden_size = self.num_heads * self.head_size
                    query = query[:batch_size]
                    query = query.view(batch_size, 1, hidden_size)
                    block_size = self.key_cache.shape[1]
                    key = self.key_cache.flatten(2, 3).contiguous()
                    value = self.value_cache.flatten(2, 3).contiguous()
                    ori_output = output
                    output, _ = torch_npu.npu_fused_infer_attention_score_v2(
                        query=query,
                        key=key,
                        value=value,
                        actual_seq_kvlen=attn_metadata.seq_lens,
                        num_query_heads=self.num_heads,
                        num_key_value_heads=self.num_kv_heads,
                        block_table=attn_metadata.block_tables[:batch_size],
                        block_size=block_size,
                        softmax_scale=self.scale,
                        input_layout="BSH"
                    )

Comment on lines 794 to 888
if is_A5():
output, _ = torch_npu.npu_fused_infer_attention_score_v2(
query=query,
key=self.key_cache.flatten(2,3).contiguous(),
value=self.value_cache.flatten(2,3).contiguous(),
atten_mask=attn_metadata.attn_mask,
actual_seq_qlen=attn_metadata.actual_seq_lengths_q,
actual_seq_kvlen=attn_metadata.seq_lens_list,
num_query_heads=self.num_heads,
num_key_value_heads=self.num_kv_heads,
block_table=attn_metadata.block_tables[:attn_metadata.query_lens.shape[0]],
block_size=self.key_cache.shape[1],
softmax_scale=self.scale,
imput_layout="TND"
)

critical

There are a couple of issues in this block:

  1. On line 805, there is an indentation error for block_size, which will cause a SyntaxError.
  2. On line 807, imput_layout is a typo and should be input_layout.
        if is_A5():
            output, _ = torch_npu.npu_fused_infer_attention_score_v2(
                query=query,
                key=self.key_cache.flatten(2,3).contiguous(),
                value=self.value_cache.flatten(2,3).contiguous(),
                atten_mask=attn_metadata.attn_mask,
                actual_seq_qlen=attn_metadata.actual_seq_lengths_q,
                actual_seq_kvlen=attn_metadata.seq_lens_list,
                num_query_heads=self.num_heads,
                num_key_value_heads=self.num_kv_heads,
                block_table=attn_metadata.block_tables[:attn_metadata.query_lens.shape[0]],
                block_size=self.key_cache.shape[1],
                softmax_scale=self.scale,
                input_layout="TND"
            )

Comment on lines 423 to 424
if is_A5(): # A5 does not support the npu_mrope operator; a small-operator replacement is needed here
return

critical

There are two critical issues here:

  1. The is_A5 function is not imported, which will lead to a NameError.
  2. The function forward_oot is expected to return a tuple (query, key). However, this if block has an early return without a value, which implicitly returns None. This will cause a TypeError when the caller tries to unpack the result. The comment indicates this is a placeholder, but it should at least return the original query and key to avoid crashing.
Suggested change
if is_A5(): # A5 does not support the npu_mrope operator; a small-operator replacement is needed here
return
if is_A5(): # A5 does not support the npu_mrope operator; a small-operator replacement is needed here
return query, key
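
If an actual fallback is wanted rather than the pass-through above, one purely illustrative option is a rotate-half rotary embedding built from elementwise ops, the kind of "small operator" replacement the code comment hints at. This is not the PR's implementation; the tensor layouts below are assumptions, and the actual mrope sectioning would still need to be handled:

import torch

# Illustrative rotate-half RoPE built from small ops (not the PR's code).
# Assumed shapes: x [num_tokens, num_heads, head_size], cos/sin [num_tokens, 1, head_size // 2].
def rope_small_ops(x: torch.Tensor, cos: torch.Tensor,
                   sin: torch.Tensor) -> torch.Tensor:
    x1, x2 = torch.chunk(x, 2, dim=-1)       # split the head dimension in half
    rotated = torch.cat((-x2, x1), dim=-1)   # rotate-half
    cos_full = torch.cat((cos, cos), dim=-1)
    sin_full = torch.cat((sin, sin), dim=-1)
    return x * cos_full + rotated * sin_full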

) -> torch.Tensor:
# npu_top_k_top_p uses the operator aclnnApplyTopKTopP, but aclnnApplyTopKTopP currently does not support 310P
if not is_310p() and p is not None and k is not None and 1 <= int(
if not is_310p() and not is_A5() and p is not None and k is not None and 1 <= int(

critical

The function is_A5 is used here but it is not imported in this file. This will cause a NameError at runtime.
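
A minimal sketch of the missing import, assuming is_A5 is defined alongside is_310p in vllm_ascend.utils (the module path is an assumption and should be checked against where this PR actually adds the helper):

from vllm_ascend.utils import is_A5  # assumed location, next to is_310p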

Comment on lines 996 to 1027
if is_A5():
mas_seq_len = max(seq_lens, default=0)
max_seq_len = (max_seq_len + self.block_szie - 1) // self.block_size * self.block_size
new_element = torch.tensor([max_seq_len])
seq_lens = torch.cat([seq_lens, new_element], dim =0)
return self.attn_mask_builder.get_attn_mask(max_seq_len, self.dtype, self.device).to(torch.bool)

critical

This block has several critical issues:

  1. The is_A5 function is not imported, which will cause a NameError.
  2. On line 997, there is a typo mas_seq_len which should be max_seq_len.
  3. On line 998, there is a typo self.block_szie which should be self.block_size.
  4. On line 1000, seq_lens is reassigned but not used afterwards in this block, making the operation ineffective.
Suggested change
if is_A5():
mas_seq_len = max(seq_lens, default=0)
max_seq_len = (max_seq_len + self.block_szie - 1) // self.block_size * self.block_size
new_element = torch.tensor([max_seq_len])
seq_lens = torch.cat([seq_lens, new_element], dim =0)
return self.attn_mask_builder.get_attn_mask(max_seq_len, self.dtype, self.device).to(torch.bool)
if is_A5():
max_seq_len = max(seq_lens.tolist(), default=0)
max_seq_len = (max_seq_len + self.block_size - 1) // self.block_size * self.block_size
return self.attn_mask_builder.get_attn_mask(max_seq_len, self.dtype, self.device).to(torch.bool)

if num_tokens <= self.mc2_tokens_capacity else
MoECommType.ALLTOALL)
elif soc_version in {AscendSocVersion.A5}:
if (num_tokens <= self.mc2_tokens_capacity and self.parallel_config.world_size_cross_dp >= 8 and is_mc2_models(model_type)) :

critical

The variable model_type is used here but it is not defined in the scope of the _select_moe_comm_method function. It needs to be passed as an argument to the function.
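
A hedged sketch of that fix: add a model_type parameter to _select_moe_comm_method and have the caller supply it. Apart from the added parameter, the names below are taken from the quoted snippet or are assumptions:

# Hypothetical signature; the existing parameters and return type are assumptions.
def _select_moe_comm_method(self, num_tokens: int,
                            model_type: str) -> MoECommType:
    ...

# Illustrative call site: the runner passes the model type it already knows, e.g.
#   self._select_moe_comm_method(num_tokens, self.model_config.hf_config.model_type)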

@github-actions

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@w630497878 w630497878 force-pushed the br_a5_dev_1110 branch 6 times, most recently from abc05bb to 9d054ef on November 13, 2025 at 08:57
num_query_heads=self.num_heads,
num_key_value_heads=self.num_kv_heads,
softmax_scale=self.scale,
spare_mode=2, #spare_mode=2时,代表leftupCausal模式的mask (spare_mode=2 selects the leftupCausal-style mask)
Collaborator


Please remove the Chinese characters; same below.

@github-actions

This pull request has conflicts; please resolve them before we can evaluate the pull request.
