[doc] Add tutorial on TDs to readme #5541
base: main
Conversation
Basic rules:

1. **Use Tensor Descriptors:** For inputs and outputs of matmul operations (`tl.dot`), use Tensor Descriptors. This utilizes the hardware-optimized DPAS operation and asynchronous loading. You can often expect more than a 2x performance improvement compared to the basic tensor-of-pointers approach.
"asynchronous loading" -> 2D block IO HW operations to load the operands of a `tt.dot` operation
Also clarify that tensor descriptors declared in the kernel (device side) should be used (not tensor descriptors declared on the host).
2. **Benchmark:** Experiment with the performance of your kernel. You can use `triton.testing.do_bench` for basic benchmarking, as demonstrated in the [tutorials](../python/tutorials/02-fused-softmax.py).
3. **Type Annotations:** Use proper type annotations for your kernels. Good type annotations allow for better optimization, but be careful to avoid excessive recompilation.
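To make the first rule concrete, here is a minimal sketch of a GEMM tile loop using device-side tensor descriptors, modeled on the public Triton tensor-descriptor tutorials. The kernel name, block sizes, and the exact `tl.make_tensor_descriptor` usage are illustrative assumptions, not code taken from this PR.

```python
import triton
import triton.language as tl

@triton.jit
def matmul_td_kernel(a_ptr, b_ptr, c_ptr,
                     M, N, K,
                     stride_am,  # leading strides: no annotation (see rules below)
                     stride_bk,
                     stride_cm,
                     BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                     BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    # Declare descriptors *inside* the kernel (device side), as the review
    # suggests, so the backend can lower loads to 2D block IO operations.
    # The last stride is the literal 1, i.e. the rows are contiguous.
    a_desc = tl.make_tensor_descriptor(a_ptr, shape=[M, K],
                                       strides=[stride_am, 1],
                                       block_shape=[BLOCK_M, BLOCK_K])
    b_desc = tl.make_tensor_descriptor(b_ptr, shape=[K, N],
                                       strides=[stride_bk, 1],
                                       block_shape=[BLOCK_K, BLOCK_N])
    c_desc = tl.make_tensor_descriptor(c_ptr, shape=[M, N],
                                       strides=[stride_cm, 1],
                                       block_shape=[BLOCK_M, BLOCK_N])
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        a = a_desc.load([pid_m * BLOCK_M, k])
        b = b_desc.load([k, pid_n * BLOCK_N])
        acc = tl.dot(a, b, acc)  # lowered to DPAS on Intel hardware
    c_desc.store([pid_m * BLOCK_M, pid_n * BLOCK_N], acc.to(tl.float16))
```

Both loads and the store go through descriptors, covering the "inputs and outputs" part of the rule; a host-side launch would supply the tensors, strides, and a 2D grid.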
Hmmm, what does this mean? It may be misleading, because adding an explicit type (e.g. `tl.int64`) to the strides used to create the tensor descriptor is going to cause us problems.
---
Tensor Descriptors support shapes up to 5 dimensions, but for performance, it is best to use 2 dimensions whenever possible.
Let's make this statement even more clear; change:
"but for performance, it is best to use 2 dimensions whenever possible."
to:
"however, the Intel XPU backend currently optimizes 2-dimensional tensor descriptors well, so avoid using higher-dimensionality tensor descriptors whenever possible."
Summary:
1. Use Tensor Descriptors to load memory required for `tl.dot` and to save results.
2. Strive to use 2D tensor descriptors for better performance.
3. The last tensor stride should be `tl.constexpr` or have no type annotation. Annotating it with `tl.int64` will result in poor performance.
Not just `tl.int64`; using `tl.int32`, for example, would also limit performance. Let's suggest using `tl.constexpr` whenever possible, and otherwise avoiding an explicit data-type annotation.
## Use proper type annotations
1. Set the `tl.constexpr` type annotation for block sizes and boolean flags to let the compiler optimize. Each combination of arguments with this annotation is compiled separately. Avoid setting it for values that vary widely at runtime (like the number of tokens) to prevent excessive recompilation.
2. **No annotation:** You can leave type annotations empty and let the compiler guess. This is good for parameters that change often (like strides) to avoid recompilation.
3. Avoid writing a `tl.int64` type annotation for the last stride of a tensor. It is often important for the compiler to know that the tensor is contiguous.
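The three rules above can be shown on one kernel signature. This is a hypothetical sketch, not code from the PR; the kernel name and logic are illustrative.

```python
import triton
import triton.language as tl

@triton.jit
def copy_rows_kernel(x_ptr, out_ptr,
                     n_elements,           # rule 2: no annotation, changes every
                                           # call, so leaving it untyped avoids
                                           # per-value recompilation
                     row_stride,           # rule 3: untyped stride, so the
                                           # compiler can still specialize it
                                           # (e.g. detect a contiguous stride 1)
                     BLOCK: tl.constexpr,  # rule 1: each block size compiles once
                     RELU: tl.constexpr):  # rule 1: boolean flag as constexpr
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs * row_stride, mask=mask)
    if RELU:  # resolved at compile time because RELU is constexpr
        x = tl.maximum(x, 0.0)
    tl.store(out_ptr + offs * row_stride, x, mask=mask)
```

Launching with `RELU=True` and `RELU=False` produces two specialized binaries, while varying `n_elements` or `row_stride` reuses the same one.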
Exactly, but generalize to avoid annotating the type explicitly (not just tl.int64)
## Tune kernel configuration

### GRF Mode
Setting it higher can be good for a kernel that uses many registers, but will decrease hardware utilization.
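Assuming the spelling discussed later in this thread (a `grf_mode` launch option taking a numeric value such as 256, with `"large"` deprecated), selecting the GRF mode might look like the fragment below. The kernel and its arguments are hypothetical; only the option name and values come from the review comments.

```python
# Hypothetical launch with a per-launch GRF mode override on the Intel XPU
# backend; 256 gives each thread more registers at the cost of fewer
# concurrent threads per execution unit.
my_kernel[(grid,)](x_ptr, out_ptr, n, BLOCK=128, grf_mode=256)
```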
setting it higher --> using grf_mode = large
grf_mode = large is deprecated, please use grf_mode = 256 instead.
```
### Example 3: GEMM operations
Intel backend for triton requires
Something else was supposed to go here?
@etiotto I'll process all of your comments and fix the issues. In the meantime, maybe you have some other ideas about key things for performance that I missed?
Add tutorial for Tensor Descriptors to the README.md.

Closes #5492