@Maratyszcza
Collaborator

TMA requires the innermost dimension to be contiguous and the other dimensions to have strides that are multiples of 16 bytes. Previously, this wasn't enforced, and if the output happened to have odd dimensions, it triggered an assertion when creating the TMA descriptor later on. This change modifies apply_allocation to:

  • Pad the innermost dimension before allocation, then slice the result back to the requested shape, if the output tensor was not provided
  • Verify the strides of the output tensor if it was provided
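
The pad-then-slice and stride-verification steps above can be sketched as follows. This is an illustrative NumPy sketch, not the actual apply_allocation code; the function names and the 16-byte constant are assumptions based on the description:

```python
import numpy as np

TMA_ALIGNMENT = 16  # bytes; assumed TMA stride-alignment requirement


def allocate_with_padded_innermost(shape, dtype):
    """Allocate a buffer whose innermost dimension is padded so that every
    outer stride is a multiple of TMA_ALIGNMENT bytes, then slice the
    result back to the requested shape (returning a strided view)."""
    dtype = np.dtype(dtype)
    align_elems = TMA_ALIGNMENT // dtype.itemsize
    inner = shape[-1]
    padded = -(-inner // align_elems) * align_elems  # round up to a multiple
    buf = np.empty((*shape[:-1], padded), dtype=dtype)
    # Slicing keeps the innermost dimension contiguous while the outer
    # strides remain those of the padded buffer, i.e. 16-byte multiples.
    return buf[..., :inner]


def check_tma_strides(a):
    """Verify a user-provided output tensor meets the TMA constraints."""
    assert a.strides[-1] == a.itemsize, "innermost dimension must be contiguous"
    for s in a.strides[:-1]:
        assert s % TMA_ALIGNMENT == 0, "outer strides must be multiples of 16 bytes"
```

For example, a (4, 7) float16 output is 14 bytes per row, so the innermost dimension is padded to 8 elements (16 bytes) and the returned view has shape (4, 7) with a 16-byte row stride.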

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)
