Conversation

@QilaiZhang QilaiZhang commented Nov 11, 2025

What this PR does / why we need it?

Support triton causal_conv1d_fn ops.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with new added/existing test.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message to fulfill the PR description, helping reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a Triton-based implementation for causal_conv1d_fn to optimize its performance on Ascend hardware. The changes include a new Triton kernel, an update to the function wrapper, and the addition of a comprehensive test suite with a PyTorch reference implementation for validation. The filename casual_conv1d.py has also been corrected. The overall implementation is good, but I've identified a performance issue in the Triton kernel related to a redundant memory load.
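For context on what such a PyTorch-style reference implementation validates, here is a minimal NumPy sketch of a depthwise causal 1-D convolution with optional bias and SiLU activation. This is an illustrative assumption, not the PR's actual reference code: the function name, argument layout (`dim, seqlen` input, `dim, width` weights), and activation handling are all hypothetical.

```python
import numpy as np

def causal_conv1d_ref(x, weight, bias=None, activation=None):
    """Reference depthwise causal 1-D convolution (illustrative sketch).

    x:      (dim, seqlen) input
    weight: (dim, width) per-channel filter taps
    Output: y[d, t] = sum_k weight[d, k] * x[d, t - width + 1 + k],
    zero-padded on the left so no future tokens are read.
    """
    dim, seqlen = x.shape
    _, width = weight.shape
    # Left-pad with width-1 zeros to make the convolution causal.
    x_pad = np.concatenate([np.zeros((dim, width - 1), dtype=x.dtype), x], axis=1)
    y = np.zeros_like(x)
    for k in range(width):
        # Tap k multiplies the input shifted so the last tap sees the current token.
        y += weight[:, k:k + 1] * x_pad[:, k:k + seqlen]
    if bias is not None:
        y += bias[:, None]
    if activation == "silu":
        y = y / (1.0 + np.exp(-y))  # SiLU / swish
    return y

# E.g. x = [1, 2, 3] with taps [1, 1] yields [1, 3, 5]: each output is
# the current token plus its left neighbor (zero at the boundary).
print(causal_conv1d_ref(np.array([[1.0, 2.0, 3.0]]), np.array([[1.0, 1.0]])))
```

A test suite like the one in this PR would compare the Triton kernel's output against such a reference over random shapes and dtypes.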

@weijinqian0
Collaborator

Some performance tests need to be conducted.

    1)[mask].to(torch.int32)

    BLOCK_N = 256
    grid = (total_seq_blocks, triton.cdiv(dim, BLOCK_N))

This parameter may need to be tuned to achieve better performance.
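To make the tuning knob concrete: the second grid axis is `triton.cdiv(dim, BLOCK_N)`, so BLOCK_N trades per-program work against the number of launched programs. A pure-Python sketch (`cdiv` mirrors `triton.cdiv`, i.e. ceiling division; the numbers below are illustrative, not measured on Ascend):

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division, equivalent to triton.cdiv."""
    return -(a // -b)

def launch_grid(total_seq_blocks: int, dim: int, block_n: int):
    """Grid shape from the quoted snippet:
    one program per (sequence block, channel block)."""
    return (total_seq_blocks, cdiv(dim, block_n))

# For dim=2048: BLOCK_N=256 yields 8 channel blocks per sequence block,
# while BLOCK_N=512 halves the program count but doubles per-program work.
print(launch_grid(4, 2048, 256))  # (4, 8)
print(launch_grid(4, 2048, 512))  # (4, 4)
```

Smaller BLOCK_N means more, lighter programs; larger BLOCK_N means fewer, heavier ones, so the sweet spot depends on the hardware's occupancy and memory bandwidth.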

@QilaiZhang QilaiZhang force-pushed the causal_conv1d_fwd branch 3 times, most recently from 01a550f to 9380e82 on November 13, 2025 at 02:40

QilaiZhang commented Nov 13, 2025

@weijinqian0 @wxsIcey Regarding parameter tuning: Since the batch_ptr and token_chunk_offset_ptr of this operator need to be calculated based on BLOCK_M and BLOCK_N, it is not suitable for triton.autotune. I manually tested the performance under different inputs and parameters using the msprof op tool, and the results are shown in the figure below.
[figure: msprof op performance results across BLOCK_M/BLOCK_N combinations]

For an input length of 64, the optimal combination is (32, 256); for a length of 256, the optimal combination is (64, 256); and for a length of 8192, the optimal combination is (128, 512). However, when BLOCK_M is greater than or equal to 128, the compilation time is long and the kernel often hangs or crashes during testing, so this option is not considered.

Taking both performance and stability into account, I decided not to use branch statements to dynamically select parameters; instead, the parameters are hardcoded to (64, 256). This rationale is consistent with the implementation of this operator on NVIDIA GPUs, which likewise hardcodes its parameters to (8, 256). We can revisit further optimization once Triton Ascend is more stable.
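Why `batch_ptr` and `token_chunk_offset_ptr` rule out `triton.autotune` can be sketched with a hypothetical host-side helper (names and layout are assumptions for illustration, not the PR's actual code): both arrays are built by splitting each sequence into BLOCK_M-token chunks, so their length and contents change whenever BLOCK_M does, and autotune varies block sizes per launch without re-running this host-side preparation.

```python
def build_chunk_metadata(seq_lens, block_m):
    """Hypothetical host-side prep: split each sequence into block_m-token
    chunks and record, per chunk, its batch index and starting token offset.
    Both arrays depend on block_m, so they cannot be shared across
    autotune configs that vary the block size."""
    batch_ptr, token_chunk_offset_ptr = [], []
    for batch_idx, seq_len in enumerate(seq_lens):
        for offset in range(0, seq_len, block_m):
            batch_ptr.append(batch_idx)
            token_chunk_offset_ptr.append(offset)
    return batch_ptr, token_chunk_offset_ptr

# Two sequences of lengths 100 and 70 with BLOCK_M=64 give four chunks:
bp, off = build_chunk_metadata([100, 70], 64)
print(bp)   # [0, 0, 1, 1]
print(off)  # [0, 64, 0, 64]
```

Changing BLOCK_M to 32 would produce seven chunks with different offsets, which is exactly the coupling that makes manual tuning (via msprof) the practical route here.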


QilaiZhang commented Nov 13, 2025

Regarding performance testing: Since the operator executes during the prefill stage, I set the concurrency to 40 with an input size of 64 and output size of 10 to better demonstrate the performance gains. The test script used is as follows.

python3 benchmarks/benchmark_serving.py --backend openai-chat --base-url ${BASE_URL} \
--endpoint=/v1/chat/completions --model /model/Qwen3-next-80B-A3B-Thinking/ --served-model-name Qwen3-next-80B-A3B-Thinking \
--dataset-name sonnet --dataset-path benchmarks/sonnet.txt --max-concurrency 40 \
--sonnet-input-len 64 --sonnet-output-len 10 --sonnet-prefix-len 0 --num-prompts 400 \
--ignore-eos --percentile-metrics "ttft,tpot,itl,e2el"

The results are summarized below:

  • PyTorch implementation
    [screenshot: benchmark results]

  • Triton kernel (before tuning) with (8, 256)
    [screenshot: benchmark results]

  • Triton kernel (after tuning) with (64, 256)
    [screenshot: benchmark results]

@QilaiZhang

@wxsIcey The current tests have passed. We are ready for the next test when convenient.

@wxsIcey wxsIcey added the ready (read for review) and ready-for-test (start test by label for PR) labels Nov 14, 2025
@QilaiZhang QilaiZhang closed this Nov 17, 2025

Labels

module:ops, module:tests, ready (read for review), ready-for-test (start test by label for PR)


3 participants