
Conversation

@alexsamardzic (Contributor) commented Nov 20, 2025

This PR adds an update_tensor_descriptor operation to simplify writing kernels like grouped MM (such as pytorch/pytorch#166063), in particular to avoid handling special cases like this. It also matches the update_tensormap op in the CuTe DSL, as used here.

The operation reads the existing descriptor from GMEM into SMEM, performs the updates in SMEM, and writes the updated descriptor back to GMEM. The rationale for updating in place instead of creating a new descriptor is to save a GMEM allocation (more precisely, to trade it for a read from GMEM) and to emit tensormap.replace.tile.* PTX instructions only for the descriptor fields that actually change. Otherwise, the implementation closely follows that of make_tensor_descriptor. The end-to-end performance improvement is minor; the main advantage is that the code for cases like the kernel referenced above is cleaner. This PR makes it possible to change the base pointer, shape, and strides fields of the descriptor; support for changing other fields could be added in the future if the need arises.
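For illustration, a minimal sketch of the kind of grouped-MM loop this targets, assuming the tl.update_tensor_descriptor signature proposed in this PR (before the later move to the Gluon tma namespace); the per-group metadata arguments (group_offsets, group_sizes) and the kernel itself are hypothetical, not the actual kernel from pytorch/pytorch#166063:

import triton
import triton.language as tl

@triton.jit
def grouped_loads(a_ptr, group_offsets, group_sizes, N, stride_am, num_groups,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Create a single descriptor inside the kernel and reuse it for every group,
    # instead of allocating a fresh descriptor per group.
    desc = tl.make_tensor_descriptor(
        a_ptr + tl.load(group_offsets),
        shape=[tl.load(group_sizes), N],
        strides=[stride_am, 1],
        block_shape=[BLOCK_M, BLOCK_N],
    )
    for g in range(num_groups):
        a = desc.load([0, 0])  # first tile of the current group
        # ... compute with a ...
        if g + 1 < num_groups:
            # Patch only the fields that differ between groups; per this PR, this
            # emits tensormap.replace.tile.* instructions only for those fields.
            tl.update_tensor_descriptor(
                desc,
                base=a_ptr + tl.load(group_offsets + g + 1),
                shape=[tl.load(group_sizes + g + 1), N],
            )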

//
// Update Tensor Descriptor Op
//
def TT_UpdateTensorDescOp : TT_Op<"update_tensor_descriptor", [
@peterbell10 (Contributor) commented Nov 20, 2025

The in-place update kind of forces the underlying implementation to be memory-backed, which may not be the case for all hardware, including pre-Hopper where we translate tensor descriptors to normal pointer indexing.

@alexsamardzic (Contributor, Author) commented Nov 20, 2025

There are several options to fix this, for example lowering essentially to what make_tensor_descriptor does for older hardware. But overall, I'm actually not sure having this operation in Triton is appropriate, so how about removing the Triton version and moving the Gluon version under the tma or, better, the hopper namespace?

@alexsamardzic (Contributor, Author) commented

Added a commit that removes the Triton version of the operation from the PR and moves the Gluon version under the tma namespace.


a = desc.load([moffset, noffset])

tl.update_tensor_descriptor(desc, base=b_ptr)
@peterbell10 (Contributor) commented

I think this is illegal. We're passing the descriptor in param space, which should be constant.

I also think this will break if your launch grid is larger than the number of SMs: I'd expect the second program to be scheduled on a given SM to see the already-updated descriptor.
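To make the failure mode concrete, a minimal sketch assuming the pre-fix semantics, where desc is created on the host and passed in as a kernel argument (param space); the kernel and argument names are hypothetical, modeled on the snippet above:

import triton
import triton.language as tl

@triton.jit
def kernel(desc, b_ptr, moffset, noffset):
    a = desc.load([moffset, noffset])              # reads a tile through the original base
    tl.update_tensor_descriptor(desc, base=b_ptr)  # mutates the single, shared descriptor in place
    # With a launch grid larger than the number of SMs, a program scheduled onto
    # an SM after this point sees the already-updated descriptor, so its
    # desc.load([...]) would read through b_ptr rather than the original base.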

@alexsamardzic (Contributor, Author) commented

Uhm, indeed. Would it be acceptable to limit the operation to descriptors created from within the kernel, and thus avoid both problems you pointed out?

@alexsamardzic (Contributor, Author) commented

Added another commit implementing the proposed change.

@alexsamardzic alexsamardzic force-pushed the add-update-tensor-descriptor branch 2 times, most recently from 0ff51d0 to f4f5db2 Compare November 23, 2025 16:43
@alexsamardzic alexsamardzic force-pushed the add-update-tensor-descriptor branch from f4f5db2 to 00f04d7 Compare November 24, 2025 13:06