Commit 59a4988

Fix incorrect reduction for small kernels
Ensures num_items is recalculated after updating group_width when num_items is less than max_sg_sz, preventing incorrect parallelism configuration.
1 parent b44682c commit 59a4988

1 file changed: +3 -0 lines changed


src/ATen/native/xpu/sycl/Reduce.h

Lines changed: 3 additions & 0 deletions
@@ -259,7 +259,10 @@ struct ReduceConfig {
     num_items = group_width * group_height;
 
     if (num_items < max_sg_sz)
+    {
       group_width = max_sg_sz;
+      num_items = group_width * group_height;
+    }
   }
 
   int split_input(int parallelism) {
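
The effect of the change can be sketched in isolation. The snippet below is a minimal illustration, not the code in Reduce.h: the `GroupShape` struct and both helper functions are hypothetical names introduced here, and the input values are made up. It assumes only what the diff shows, namely that `num_items` is the product `group_width * group_height` and that `group_width` is raised to `max_sg_sz` when that product is smaller than `max_sg_sz`.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the three ReduceConfig fields
// touched by the diff. Not the real struct.
struct GroupShape {
  int group_width;
  int group_height;
  int num_items;
};

// Behaviour before the commit: group_width is widened to max_sg_sz,
// but num_items keeps the product computed from the old width.
inline GroupShape widen_without_recompute(int group_width, int group_height,
                                          int max_sg_sz) {
  int num_items = group_width * group_height;
  if (num_items < max_sg_sz)
    group_width = max_sg_sz;
  return {group_width, group_height, num_items};
}

// Behaviour after the commit: num_items is recomputed inside the branch,
// so it always equals group_width * group_height.
inline GroupShape widen_with_recompute(int group_width, int group_height,
                                       int max_sg_sz) {
  int num_items = group_width * group_height;
  if (num_items < max_sg_sz) {
    group_width = max_sg_sz;
    num_items = group_width * group_height;
  }
  return {group_width, group_height, num_items};
}
```

With illustrative inputs `group_width = 4`, `group_height = 2`, `max_sg_sz = 32`, the first helper returns a width of 32 alongside a stale `num_items` of 8, while the second returns `num_items = 64`, matching the widened group.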
