Br a5 dev 1110 #4114
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request introduces support for a new hardware target, designated 'A5', across various components of the vLLM Ascend backend. The changes are extensive, but they contain numerous critical issues, including typos, undefined variables, incorrect API usage, and logical errors, that will prevent the code from running. These issues appear in almost every modified file and must be addressed to ensure correctness and functionality on the new hardware path. My review provides specific comments and suggestions to fix these critical problems.
```python
            and device_filter(d.get("device_id", ""))
        ]
        if len(device_list) <= self.pcp_rank * self.tp_size + self.tp_rank:
            retunr None
```

Typo: `retunr` on the last line should be `return`; as written, this line is a `SyntaxError`.
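A corrected version of the guard, as a sketch assuming only the device-selection context visible in the hunk:

```python
if len(device_list) <= self.pcp_rank * self.tp_size + self.tp_rank:
    return None
```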
```python
agent_metadata = LLMDataDistCMgrAgentMetadataA5(
    server_id=server_id_,
    device_id=device_id_,
    device_ip=device_ip_,
```
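For context, a minimal sketch of what the metadata container implied by this call site could look like; only the class name and field names come from the hunk, while the dataclass form and the string types are assumptions:

```python
from dataclasses import dataclass

# Hypothetical reconstruction from the call site above; the real class may
# define more fields and different types.
@dataclass
class LLMDataDistCMgrAgentMetadataA5:
    server_id: str
    device_id: str
    device_ip: str
```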
```python
if is_A5():
    batch_size = attn_metadata.query_lens.shape[0]
    hidden_szie = self.num_heads * self.head_size
    query = query[:batch_szie]
    query = query.view(batch_size, 1, hidden_size)
    block_size = self.key_cache.shape[1]
    key = self.key_cache.flatten(2, 3).contiguous()
    ori_output = output
    output, _ = torch_nup.npu_fused_infer_attention_score_v2(
        query=query,
        key=key,
        value=value,
        actual_seq_kvlen=attn_metadata.seq_len,
        num_query_heads=self.num_heads,
        num_key_value_heads=self.num_kv_heads,
        block_table=attn_metadata.block_tables[:batch_szie],
        block_size=block_size,
        softmax_scale=self.scale,
        inpt_layout="BSH"
    )
```
This code block contains several critical errors that will cause it to fail at runtime:

- Line 714: Typo: `hidden_szie` should be `hidden_size`.
- Lines 715 & 727: Typo: `batch_szie` should be `batch_size`.
- Line 716: `hidden_size` is used but was defined with a typo as `hidden_szie`.
- Line 720: Typo: `torch_nup` should be `torch_npu`.
- Line 723: The `value` variable is not defined in this scope. It should likely be derived from `self.value_cache`, similar to how `key` is derived from `self.key_cache`.
- Line 724: `attn_metadata` does not have an attribute `seq_len`. It should probably be `seq_lens`.
- Line 730: Typo: `inpt_layout` should be `input_layout`.
Suggested fix:

```python
if is_A5():
    batch_size = attn_metadata.query_lens.shape[0]
    hidden_size = self.num_heads * self.head_size
    query = query[:batch_size]
    query = query.view(batch_size, 1, hidden_size)
    block_size = self.key_cache.shape[1]
    key = self.key_cache.flatten(2, 3).contiguous()
    value = self.value_cache.flatten(2, 3).contiguous()
    ori_output = output
    output, _ = torch_npu.npu_fused_infer_attention_score_v2(
        query=query,
        key=key,
        value=value,
        actual_seq_kvlen=attn_metadata.seq_lens,
        num_query_heads=self.num_heads,
        num_key_value_heads=self.num_kv_heads,
        block_table=attn_metadata.block_tables[:batch_size],
        block_size=block_size,
        softmax_scale=self.scale,
        input_layout="BSH"
    )
```

```python
if is_A5():
    output, _ = torch_npu.npu_fused_infer_attention_score_v2(
        query=query,
        key=self.key_cache.flatten(2,3).contiguous(),
        value=self.value_cache.flatten(2,3).contiguous(),
        atten_mask=attn_metadata.attn_mask,
        actual_seq_qlen=attn_metadata.actual_seq_lengths_q,
        actual_seq_kvlen=attn_metadata.seq_lens_list,
        num_query_heads=self.num_heads,
        num_key_value_heads=self.num_kv_heads,
        block_table=attn_metadata.block_tables[:attn_metadata.query_lens.shape[0]],
    block_size=self.key_cache.shape[1],
        softmax_scale=self.scale,
        imput_layout="TND"
    )
```
There are a couple of issues in this block:

- On line 805, there is an indentation error for `block_size`, which will cause a `SyntaxError`.
- On line 807, `imput_layout` is a typo and should be `input_layout`.
Suggested fix:

```python
if is_A5():
    output, _ = torch_npu.npu_fused_infer_attention_score_v2(
        query=query,
        key=self.key_cache.flatten(2,3).contiguous(),
        value=self.value_cache.flatten(2,3).contiguous(),
        atten_mask=attn_metadata.attn_mask,
        actual_seq_qlen=attn_metadata.actual_seq_lengths_q,
        actual_seq_kvlen=attn_metadata.seq_lens_list,
        num_query_heads=self.num_heads,
        num_key_value_heads=self.num_kv_heads,
        block_table=attn_metadata.block_tables[:attn_metadata.query_lens.shape[0]],
        block_size=self.key_cache.shape[1],
        softmax_scale=self.scale,
        input_layout="TND"
    )
```
vllm_ascend/ops/rotary_embedding.py (outdated)
```python
if is_A5():  # A5 does not support the npu_mrope operator; it needs to be replaced with smaller ops here
    return
```
There are two critical issues here:

- The `is_A5` function is not imported, which will lead to a `NameError`.
- The function `forward_oot` is expected to return a tuple `(query, key)`. However, this `if` block has an early `return` without a value, which implicitly returns `None`. This will cause a `TypeError` when the caller tries to unpack the result. The comment indicates this is a placeholder, but it should at least return the original `query` and `key` to avoid crashing.
Suggested change:

```diff
 if is_A5():  # A5 does not support the npu_mrope operator; it needs to be replaced with smaller ops here
-    return
+    return query, key
```
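Since the comment says npu_mrope will eventually need to be replaced with smaller operators on A5, here is a sketch of what such a fallback could look like in plain PyTorch, using the standard rotate-half RoPE formulation. The helper name, the non-interleaved layout assumption, and the tensor shapes are illustrative, not the project's actual API:

```python
import torch

def rope_small_ops(query: torch.Tensor,
                   key: torch.Tensor,
                   cos: torch.Tensor,
                   sin: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Rotary position embedding built from elementwise ops only.

    Assumes query/key are [..., head_dim] with cos/sin broadcastable to
    them, and the non-interleaved (rotate-half) layout.
    """
    def rotate_half(x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    q_out = query * cos + rotate_half(query) * sin
    k_out = key * cos + rotate_half(key) * sin
    return q_out, k_out
```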
```diff
 ) -> torch.Tensor:
     # npu_top_k_top_p uses the operator aclnnApplyTopKTopP, but aclnnApplyTopKTopP currently does not support 310P
-    if not is_310p() and p is not None and k is not None and 1 <= int(
+    if not is_310p() and not is_A5() and p is not None and k is not None and 1 <= int(
```
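For reference, a sketch of the kind of pure-PyTorch top-k/top-p filtering that such a guard typically falls back to when the fused aclnnApplyTopKTopP path is unavailable. The function and variable names are illustrative, not the project's actual fallback:

```python
import torch

def topk_topp_filter(logits: torch.Tensor, k: int, p: float) -> torch.Tensor:
    """Mask logits outside top-k, then outside top-p (nucleus), per row."""
    # Top-k: keep only the k largest logits per row.
    kth = torch.topk(logits, k, dim=-1).values[..., -1, None]
    logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p: drop the low-probability tail once cumulative mass exceeds p,
    # always keeping at least the highest-probability token.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    probs = torch.softmax(sorted_logits, dim=-1)
    cum_probs = probs.cumsum(dim=-1)
    sorted_mask = cum_probs - probs > p
    mask = sorted_mask.scatter(-1, sorted_idx, sorted_mask)
    return logits.masked_fill(mask, float("-inf"))
```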
```python
if is_A5():
    mas_seq_len = max(seq_lens, default=0)
    max_seq_len = (max_seq_len + self.block_szie - 1) // self.block_size * self.block_size
    new_element = torch.tensor([max_seq_len])
    seq_lens = torch.cat([seq_lens, new_element], dim =0)
    return self.attn_mask_builder.get_attn_mask(max_seq_len, self.dtype, self.device).to(torch.bool)
```
This block has several critical issues:

- The `is_A5` function is not imported, which will cause a `NameError`.
- On line 997, there is a typo `mas_seq_len` which should be `max_seq_len`.
- On line 998, there is a typo `self.block_szie` which should be `self.block_size`.
- On line 1000, `seq_lens` is reassigned but not used afterwards in this block, making the operation ineffective.
Suggested change:

```diff
 if is_A5():
-    mas_seq_len = max(seq_lens, default=0)
-    max_seq_len = (max_seq_len + self.block_szie - 1) // self.block_size * self.block_size
-    new_element = torch.tensor([max_seq_len])
-    seq_lens = torch.cat([seq_lens, new_element], dim =0)
+    max_seq_len = max(seq_lens.tolist(), default=0)
+    max_seq_len = (max_seq_len + self.block_size - 1) // self.block_size * self.block_size
     return self.attn_mask_builder.get_attn_mask(max_seq_len, self.dtype, self.device).to(torch.bool)
```
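The rounding line pads `max_seq_len` up to the next multiple of `self.block_size`. A quick worked example, with 128 as an illustrative block size:

```python
block_size = 128   # illustrative; the real value comes from self.block_size
max_seq_len = 300
padded = (max_seq_len + block_size - 1) // block_size * block_size
assert padded == 384   # 300 rounds up to 3 * 128
```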
```python
               if num_tokens <= self.mc2_tokens_capacity else
               MoECommType.ALLTOALL)
elif soc_version in {AscendSocVersion.A5}:
    if (num_tokens <= self.mc2_tokens_capacity and self.parallel_config.world_size_cross_dp >= 8 and is_mc2_models(model_type)) :
```
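Pieced together from this hunk, the selection prefers the MC2 communication path when the token count fits its capacity and, on A5, enough devices participate, falling back to all-to-all otherwise. A self-contained sketch of that shape; the enum values, threshold, and helper names are assumptions based on the hunk, not the project's actual definitions:

```python
from enum import Enum

class MoECommType(Enum):
    MC2 = "mc2"
    ALLTOALL = "alltoall"

def select_moe_comm_type(num_tokens: int,
                         mc2_tokens_capacity: int,
                         world_size_cross_dp: int,
                         is_mc2_model: bool) -> MoECommType:
    # MC2 needs the batch to fit its capacity; on A5 the hunk additionally
    # requires at least 8 devices across DP and an MC2-capable model.
    if (num_tokens <= mc2_tokens_capacity
            and world_size_cross_dp >= 8
            and is_mc2_model):
        return MoECommType.MC2
    return MoECommType.ALLTOALL
```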
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Branch force-pushed from 092c445 to 739002f.
Branch force-pushed from abc05bb to 9d054ef.
```python
        num_query_heads=self.num_heads,
        num_key_value_heads=self.num_kv_heads,
        softmax_scale=self.scale,
        spare_mode=2,  # spare_mode=2 selects the leftUpCausal mask mode
```
Please remove the Chinese characters here, and likewise below.
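For reference, a minimal sketch of the top-left-aligned causal mask that the leftUpCausal mode describes, in plain PyTorch with True marking masked positions; the shape and convention here are illustrative assumptions:

```python
import torch

seq_len = 4
# Upper-triangular above the diagonal: position i may not attend to j > i,
# with the triangle anchored at the top-left corner of the matrix.
left_up_causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```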
Branch force-pushed from 9d054ef to 76c6cce.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Branch force-pushed from 76c6cce to 38beae1.
…scend into br_a5_dev_1110
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?