[rocmlir-tuning-driver] Add rotating buffers and inline instruction cache flush #2188
+233
−60
Motivation
See https://github.com/ROCm/rocMLIR-internal/issues/2149 for motivation.
Technical Details
This PR removes the calls to both flushL2Cache and flushInstructionCache and replaces each with a different approach:
- flushL2Cache: This PR implements rotating buffers. The idea is to allocate multiple buffers and rotate through them on every iteration, so that data cached by one iteration is not reused by the next. A new option, num-rotating-buffers, controls how many rotating buffers are used (the default is 5).
- flushInstructionCache: This PR implements insertInstructionCacheFlush, which inserts the kernel containing s_icache_inv plus nops into the actual kernel that we want to execute, thus avoiding the launch overhead of the separate flushInstructionCache kernel. Kernel compilation is adapted to add an intermediate step at the LLVM IR level, where the assembly is inserted before the IR is lowered to a binary.
Test Plan
No new test was added.
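For reviewers who want to reason about the rotating-buffer behavior from Technical Details without running the driver, the index selection can be sketched in isolation. This is a minimal sketch under stated assumptions, not the code in this PR: the RotatingBuffers class and its forIteration method are hypothetical names, and host-side std::vector storage stands in for the device allocations the tuning driver would actually use. Only the buffer count of 5 (the default of num-rotating-buffers) comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper illustrating the rotation scheme; not the driver's code.
// Host memory stands in for device buffers so the sketch runs anywhere.
class RotatingBuffers {
public:
  RotatingBuffers(std::size_t numBuffers, std::size_t bytesPerBuffer)
      : buffers(numBuffers, std::vector<char>(bytesPerBuffer)) {
    assert(numBuffers > 0 && "need at least one buffer");
  }

  // Returns the buffer to use for a given benchmark iteration. Consecutive
  // iterations get different buffers as long as there is more than one.
  std::vector<char> &forIteration(std::size_t iteration) {
    return buffers[iteration % buffers.size()];
  }

  std::size_t size() const { return buffers.size(); }

private:
  std::vector<std::vector<char>> buffers;
};
```

With 5 buffers, iteration 0 and iteration 5 map to the same buffer, while any two consecutive iterations map to different ones. Whether that is enough to defeat cache reuse presumably depends on how the combined buffer footprint compares with the cache size on the target GPU.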
Test Result
All tests pass.
Submission Checklist