Conversation
@qnie-oai qnie-oai commented Nov 25, 2025

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)
pytest ./python/triton_kernels/tests/test_matmul.py -rs -vv -k "True-False-True-False-None-128-720-576-768-ragged-mxfloat8_e4m3fn-mxfloat4_e2m1-10-1-True-None-False-False-False-True-swiglu_opt"
  • fails with a numerics gap against the reference
pytest ./python/triton_kernels/tests/test_matmul.py -rs -vv -k "True-False-True-False-None-128-768-512-1024-ragged-mxfloat8_e4m3fn-mxfloat4_e2m1-10-1-True-None-False-False-False-True-swiglu_opts"
  • passes

@qnie-oai qnie-oai force-pushed the mxfp8-out-block-128 branch from a37606b to f24fddc on November 25, 2025 05:41
@qnie-oai qnie-oai force-pushed the mxfp8-out-block-128 branch from f24fddc to af96539 on November 25, 2025 07:59
Contributor

@lezcano lezcano left a comment
Making the block larger should increase the shmem usage, so this change on its own does not make sense to me.
Perhaps the right fix is to adjust the shmem computation we do when setting num_stages?
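To illustrate the trade-off the reviewer is pointing at, here is a minimal sketch of how a larger output block eats into the shared-memory budget available for pipelining. The footprint model, the 64 KiB limit, and the function names (`estimate_stage_smem`, `max_num_stages`) are all hypothetical simplifications for illustration; Triton's actual shmem computation when selecting `num_stages` differs.

```python
# Hypothetical sketch (NOT Triton's real heuristic): estimate the
# per-pipeline-stage shared-memory footprint of a matmul tile and cap
# num_stages so the total fits in an assumed shared-memory limit.

def estimate_stage_smem(block_m, block_n, block_k, a_bytes, b_bytes):
    """Bytes of shared memory one pipeline stage needs for the A and B tiles."""
    return block_m * block_k * a_bytes + block_k * block_n * b_bytes

def max_num_stages(block_m, block_n, block_k, a_bytes, b_bytes,
                   smem_limit=64 * 1024, requested=8):
    """Largest stage count <= requested whose tiles fit in smem_limit."""
    per_stage = estimate_stage_smem(block_m, block_n, block_k, a_bytes, b_bytes)
    return max(1, min(requested, smem_limit // per_stage))

# Doubling BLOCK_N from 64 to 128 doubles the B-tile footprint per stage,
# so fewer pipeline stages fit in the same shared-memory budget.
print(max_num_stages(128, 64, 64, 1, 1))   # 1-byte (fp8) A and B elements -> 5
print(max_num_stages(128, 128, 64, 1, 1))  # larger block -> 4
```

Under this toy model, enlarging the block without also revisiting the num_stages shmem computation silently loses a pipeline stage, which is consistent with the reviewer's concern.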
