@Maratyszcza
Collaborator

TMA requires the innermost dimension to be contiguous and the other dimensions to have strides that are multiples of 16 bytes. Previously, this wasn't enforced, and if the output happened to have odd dimensions, it triggered an assertion when creating the TMA descriptor later on. This change modifies apply_allocation to:

  • Pad the innermost dimension before allocation, then slice the result back to the requested shape, if the output tensor was not provided
  • Verify the strides of the output tensor if it was provided
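
The pad-then-slice and stride-verification steps above can be sketched as follows. This is an illustrative NumPy sketch, not the actual apply_allocation code; the function names and the 16-byte constant are assumptions based on the description:

```python
import numpy as np

TMA_ALIGNMENT = 16  # bytes; assumed TMA stride-alignment requirement


def allocate_with_padded_innermost(shape, dtype):
    """Allocate a buffer whose innermost dimension is padded so that every
    outer stride is a multiple of TMA_ALIGNMENT bytes, then slice the
    result back to the requested shape (returning a strided view)."""
    dtype = np.dtype(dtype)
    align_elems = TMA_ALIGNMENT // dtype.itemsize
    inner = shape[-1]
    padded = -(-inner // align_elems) * align_elems  # round up to a multiple
    buf = np.empty((*shape[:-1], padded), dtype=dtype)
    # Slicing keeps the innermost dimension contiguous while the outer
    # strides remain those of the padded buffer, i.e. 16-byte multiples.
    return buf[..., :inner]


def check_tma_strides(a):
    """Verify a user-provided output tensor meets the TMA constraints."""
    assert a.strides[-1] == a.itemsize, "innermost dimension must be contiguous"
    for s in a.strides[:-1]:
        assert s % TMA_ALIGNMENT == 0, "outer strides must be multiples of 16 bytes"
```

For example, a (4, 7) float16 output is 14 bytes per row, so the innermost dimension is padded to 8 elements (16 bytes) and the returned view has shape (4, 7) with a 16-byte row stride.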

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)
