@pabloantoniom
Contributor

Motivation

See https://github.com/ROCm/rocMLIR-internal/issues/2149 for motivation.

Technical Details

This PR removes the calls to both flushL2Cache and flushInstructionCache and replaces each with a different approach:

  • For flushL2Cache: This PR implements rotating buffers. Multiple buffers are allocated and rotated on every iteration, so that data cached during one iteration is not reused by the next. A new option, num-rotating-buffers, controls how many rotating buffers are used (default: 5).
  • For flushInstructionCache: This PR implements insertInstructionCacheFlush, which inserts the code of the instruction-cache flush kernel (s_icache_inv followed by nops) directly into the kernel we actually want to execute. This avoids the launch overhead of a separate flushInstructionCache kernel. Kernel compilation is adapted to add an intermediate step at the LLVM IR level, where the assembly is inserted; the IR is then lowered to a binary.
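The rotating-buffer idea can be sketched as follows. This is a minimal host-side model, not the code in this PR: `RotatingBuffers` and `minRotatingBuffers` are hypothetical names, and `std::vector` stands in for the device allocations a real benchmark would make. Iteration `i` uses buffer `i % N`, so a given buffer is only touched again after the other `N - 1` buffers have been used. The helper encodes one rough heuristic (an assumption, not necessarily how the default of 5 was chosen): pick `N` so that the data touched between two uses of the same buffer is at least the L2 size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of rotating buffers. A real benchmark would
// allocate device memory; std::vector is only a stand-in here.
class RotatingBuffers {
public:
  RotatingBuffers(std::size_t numBuffers, std::size_t bytesPerBuffer)
      : buffers_(numBuffers, std::vector<std::uint8_t>(bytesPerBuffer)) {}

  // Buffer used by a given iteration: cycles 0, 1, ..., N-1, 0, 1, ...
  std::vector<std::uint8_t>& forIteration(std::size_t iter) {
    return buffers_[iter % buffers_.size()];
  }

  std::size_t size() const { return buffers_.size(); }

private:
  std::vector<std::vector<std::uint8_t>> buffers_;
};

// Hypothetical heuristic: the N - 1 buffers touched between two uses of the
// same buffer should cover at least l2Bytes, i.e. N - 1 = ceil(l2 / bytes).
// Precondition: bytesPerBuffer > 0.
std::size_t minRotatingBuffers(std::size_t bytesPerBuffer, std::size_t l2Bytes) {
  return (l2Bytes + bytesPerBuffer - 1) / bytesPerBuffer + 1;
}
```

For example, with 1 MiB per buffer and a 4 MiB L2 this heuristic gives 5 buffers, while 3 MiB buffers would need only 3.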

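To illustrate what the intermediate LLVM IR step could inject, here is a sketch that builds the assembly text and wraps it in an LLVM IR inline-asm call. It is an assumption-laden illustration, not the PR's insertInstructionCacheFlush: the helper names and the use of `s_nop 0` repeated `nopCount` times are hypothetical, and where the call is placed inside the kernel is not shown. Only the inline-asm syntax itself (`asm sideeffect`, newlines escaped as `\0A` in IR strings) follows the LLVM IR format.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: instruction-cache invalidation followed by nopCount
// no-ops. The number and form of the nops are assumptions.
std::string icacheFlushAsm(std::size_t nopCount) {
  std::string text = "s_icache_inv";
  for (std::size_t i = 0; i < nopCount; ++i)
    text += "\ns_nop 0";
  return text;
}

// Hypothetical helper: wraps assembly text in an LLVM IR inline-asm call with
// no operands. `sideeffect` keeps the optimizer from removing it. Only
// newlines are escaped (as \0A); the text above has no quotes or backslashes.
std::string toLlvmInlineAsmCall(const std::string& asmText) {
  std::string escaped;
  for (char c : asmText) {
    if (c == '\n')
      escaped += "\\0A";
    else
      escaped += c;
  }
  return "call void asm sideeffect \"" + escaped + "\", \"\"()";
}
```

For one nop, `toLlvmInlineAsmCall(icacheFlushAsm(1))` yields the IR line `call void asm sideeffect "s_icache_inv\0As_nop 0", ""()`.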
Test Plan

No new tests were added.

Test Result

All tests pass.

Submission Checklist
