-
Notifications
You must be signed in to change notification settings - Fork 34
Enable prefetching of SW kernel instructions after the first SW task #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
@Kepontry hello! Thanks for your PR! Could you please ensure your changes include test coverage by adding tests to https://github.com/openvinotoolkit/npu_compiler/blob/develop/tests/lit/NPU/dialect/VPUIP/passes/add_sw_kernel_instruction_prefetch_40XX.mlir and maybe some functional tests? |
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Outdated
Show resolved
Hide resolved
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Outdated
Show resolved
Hide resolved
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Show resolved
Hide resolved
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Outdated
Show resolved
Hide resolved
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Outdated
Show resolved
Hide resolved
@Kepontry could you please look into this comment? Thank you! |
|
Apologies for the delay; I missed the email notification for this thread. I am currently working on adding the test. Could you provide some guidance or documentation on how to use the lit test framework within the NPU compiler? |
|
Functional test added. |
|
Hi @Kepontry! Please adhere to the clang-format guidelines. You will find the automatically fixed code style in the job logs: https://github.com/openvinotoolkit/npu_compiler/actions/runs/19637113134/job/56230572067?pr=199 I also noticed that some LIT tests failed. Could you please verify if these failures are due to your changes? |
|
Hi @Maxim-Doronin , the failure of LIT test is caused by the
|
Hello @Kepontry! Thanks for adding the tests and sharing your findings regarding pre-commit failures! |
|
@Kepontry I managed to reproduce the issue: -> Will research it a bit and get back! In the meantime, could you please share with us why you set this particular value? |
|
Hi, @DariaMityagina , thanks for your assistance regarding this issue. Since this PR enables prefetching regardless of the first SHAVE task's start time threshold, it exposed some existing bugs in certain test cases. These bugs were previously hidden because there wasn't enough time slack to trigger the prefetch logic. I adjusted the threshold to simulate a scenario that forces prefetch insertion, confirming that these test cases fail without this PR. |
|
Hi @DariaMityagina , I have fixed the failed tests. The root cause was that I fixed this by manually adding the |
Thanks a lot for the updates! |
|
The failed log indicates an
I suspect this issue is related to the recently inserted MLIR code. I am currently uncertain whether the root cause involves the specific values (1326182 or 1473536) or the reserved memory allocation. Since the code runs on both NPU37XX and NPU40XX, I made modifications according to the implementation found in feasible_allocation.mlir. npu_compiler/tests/lit/NPU/dialect/VPUIP/passes/feasible_allocation.mlir Lines 123 to 129 in 9f8730d
I hope this resolves the issue.Alternatively, could you provide the scripts necessary to reproduce this experiment in the CI environment? |
|
Fixed the failing tests on NPU37XX by adjusting the offset of the reserved memory. Similar changes were applied to the NPU40XX tests. Note: I'm currently uncertain why the |
|
Hi @DariaMityagina , the prefetching for the |
|
Hi @DariaMityagina , since |
Hello @Kepontry! Great! Thank you! |
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Outdated
Show resolved
Hide resolved
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Outdated
Show resolved
Hide resolved
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Outdated
Show resolved
Hide resolved
src/vpux_compiler/src/dialect/VPUIP/transforms/passes/add_sw_kernel_instruction_prefetch.cpp
Outdated
Show resolved
Hide resolved
tests/lit/NPU/dialect/VPUIP/passes/add_sw_kernel_instruction_prefetch_mid_execution_40XX.mlir
Outdated
Show resolved
Hide resolved
|
Hi, @DariaMityagina , I refactored the code as requested. I moved the variables into the class and added the comments. Thanks for pointing these out. |
Hello @Kepontry! Thank you! |
Summary
This PR enhances the AddSwKernelInstructionPrefetchPass to enable prefetching of SHAVE kernel instructions after the first SHAVE task, if the initial slack is insufficient.
Currently, instruction prefetching is skipped if the slack before the first SHAVE task is insufficient. This limitation is suboptimal when initial insertion slots (tiles) are limited or L2 cache capacity is constrained.
Based on the observation that SHAVE utilization is often low, we propose this change to prefetch opportunistically later in the schedule. This approach has demonstrated a ~3% performance gain on models such as Qwen2-1.5b and Qwen3-0.6b.
Target Platform For Release Notes
Classification of this Pull Request
Implementation Details
Additional Fixes & Enhancements
We also noted that the previous 250K-cycle threshold is overly conservative for our platform (Ultra 258V). Our analysis shows that prefetching provides benefits even with a slack as low as 170K cycles.