Conversation
@qnie-oai qnie-oai commented Nov 25, 2025

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)
pytest ./python/triton_kernels/tests/test_matmul.py -rs -vv -k "True-False-True-False-None-128-720-576-768-ragged-mxfloat8_e4m3fn-mxfloat4_e2m1-10-1-True-None-False-False-False-True-swiglu_opt"
  • fails with a numerics gap against the reference
pytest ./python/triton_kernels/tests/test_matmul.py -rs -vv -k "True-False-True-False-None-128-768-512-1024-ragged-mxfloat8_e4m3fn-mxfloat4_e2m1-10-1-True-None-False-False-False-True-swiglu_opts"
  • passes

@qnie-oai qnie-oai force-pushed the mxfp8-out-block-128 branch from a37606b to f24fddc on November 25, 2025 05:41
@qnie-oai qnie-oai force-pushed the mxfp8-out-block-128 branch from f24fddc to af96539 on November 25, 2025 07:59
Contributor

@lezcano lezcano left a comment
Making the block larger should increase the shmem usage, so this change on its own does not make sense to me.
Perhaps the right fix is to adjust the shmem computation we do when setting num_stages?
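To illustrate the trade-off the reviewer is pointing at, here is a minimal sketch of how a larger output block eats into the shared-memory budget available for pipelining. The footprint model, the 64 KiB limit, and the function names (`estimate_stage_smem`, `max_num_stages`) are all hypothetical simplifications for illustration; Triton's actual shmem computation when selecting `num_stages` differs.

```python
# Hypothetical sketch (NOT Triton's real heuristic): estimate the
# per-pipeline-stage shared-memory footprint of a matmul tile and cap
# num_stages so the total fits in an assumed shared-memory limit.

def estimate_stage_smem(block_m, block_n, block_k, a_bytes, b_bytes):
    """Bytes of shared memory one pipeline stage needs for the A and B tiles."""
    return block_m * block_k * a_bytes + block_k * block_n * b_bytes

def max_num_stages(block_m, block_n, block_k, a_bytes, b_bytes,
                   smem_limit=64 * 1024, requested=8):
    """Largest stage count <= requested whose tiles fit in smem_limit."""
    per_stage = estimate_stage_smem(block_m, block_n, block_k, a_bytes, b_bytes)
    return max(1, min(requested, smem_limit // per_stage))

# Doubling BLOCK_N from 64 to 128 doubles the B-tile footprint per stage,
# so fewer pipeline stages fit in the same shared-memory budget.
print(max_num_stages(128, 64, 64, 1, 1))   # 1-byte (fp8) A and B elements -> 5
print(max_num_stages(128, 128, 64, 1, 1))  # larger block -> 4
```

Under this toy model, enlarging the block without also revisiting the num_stages shmem computation silently loses a pipeline stage, which is consistent with the reviewer's concern.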
