[cherry-pick][v0.11.0-dev][bugfix] Change seq_lens in dummy attn_metadata to max_query_len #4099

Angazenn · 2025-11-10T11:46:11Z

What this PR does / why we need it?

This is a cherry-pick of #4097.
Currently, we set seq_lens in the dummy attn_metadata to max_model_len to get the maximum attention workspace during capturing.
However, always setting it to max_model_len causes dummy_run to execute a long attention during actual inference. For example, if there is a single request with seq_lens of [8] but max_model_len is 131072, the whole process is slowed down by dummy_run, because it executes a fake long-sequence attention. Therefore, we set it to max_query_len instead, which is also consistent with the vLLM GPU implementation.
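
To make the cost difference concrete, here is a minimal sketch, not the code from this PR. The helper names `dummy_seq_lens` and `attention_work` are hypothetical, and the cost model (query tokens times key tokens per request) is only a rough proxy for real attention kernel time.

```python
from typing import List


def dummy_seq_lens(num_reqs: int, seq_len: int) -> List[int]:
    """Hypothetical helper: give every dummy request the same seq_len."""
    return [seq_len] * num_reqs


def attention_work(query_len: int, seq_lens: List[int]) -> int:
    """Rough proxy for attention cost: query tokens x key tokens, summed over requests."""
    return sum(query_len * s for s in seq_lens)


# Values taken from the example in the description above.
max_model_len = 131072
max_query_len = 8

# Old behaviour: dummy seq_lens pinned to max_model_len.
old_work = attention_work(max_query_len, dummy_seq_lens(1, max_model_len))
# New behaviour: dummy seq_lens set to max_query_len.
new_work = attention_work(max_query_len, dummy_seq_lens(1, max_query_len))

print(old_work // new_work)
```

Under this simplified model, the dummy attention for a single request with 8 query tokens does 16384 times less work when seq_lens follows max_query_len rather than max_model_len.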

Does this PR introduce any user-facing change?

No.

How was this patch tested?

gemini-code-assist

Code Review

This pull request aims to optimize dummy attention runs by reducing seq_lens to 1 when not capturing a graph. The change correctly uses the force_attention flag to distinguish between capturing and other dummy runs. However, the accompanying comment is confusing and contradicts the code's logic, which could lead to future maintenance issues. I've suggested an updated comment to accurately reflect the implementation.

vllm_ascend/worker/model_runner_v1.py

github-actions · 2025-11-10T11:51:40Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Signed-off-by: Angazenn <[email protected]>

Angazenn force-pushed the dummy_dev branch from 9597c14 to bb7a5a5 November 10, 2025 11:47

gemini-code-assist bot reviewed Nov 10, 2025

Angazenn changed the title ~~[cherry-pick][v0.11.0-dev][bugfix] Reduce seq_lens in dummy attn_metadata to be 1 when not capturing~~ [cherry-pick][v0.11.0-dev][bugfix] Change seq_lens in dummy attn_metadata to max_query_len Nov 11, 2025

yiz-liu approved these changes Nov 12, 2025

Angazenn added 2 commits November 12, 2025 17:57

bugfix

976b585

Signed-off-by: Angazenn <[email protected]>

change to max_query_len

ba9e61e

Signed-off-by: Angazenn <[email protected]>

Angazenn force-pushed the dummy_dev branch from ded2dde to ba9e61e November 12, 2025 09:57

yiz-liu merged commit 28a1529 into vllm-project:v0.11.0-dev Nov 12, 2025
16 checks passed
