Conversation

@dshi7 (Contributor) commented Nov 16, 2025

Upstream the CLC backend changes from the following PRs. They work well with the TLX frontend (see the unit test and tutorial kernel). This PR contains the backend part only.

TTGIR

  • include/triton/Dialect/TritonNvidiaGPU/IR/TritonNvidiaGPUOps.td

PTX lowering

  • third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/BarrierOpToLLVM.cpp

MLIR

  • test/Conversion/tritonnvidiagpu_to_llvm.mlir

Test

  • lit test/Conversion/tritonnvidiagpu_to_llvm.mlir
  • make test-cpp

Otherwise, it hits the following errors:

******************** TEST 'TRITON :: Conversion/tritonnvidiagpu_to_llvm.mlir' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /data/users/daohang/triton/build/cmake.linux-x86_64-cpython-3.13/bin/triton-opt /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir -split-input-file --convert-triton-gpu-to-llvm=compute-capability=90 -reconcile-unrealized-casts | FileCheck /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir
+ /data/users/daohang/triton/build/cmake.linux-x86_64-cpython-3.13/bin/triton-opt /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir -split-input-file --convert-triton-gpu-to-llvm=compute-capability=90 -reconcile-unrealized-casts
+ FileCheck /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir
triton-opt: /data/users/daohang/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/PTXAsmFormat.cpp:157: PTXInstrExecution &mlir::triton::PTXInstrCommon::call(ArrayRef<Operand *>, bool): Assertion `builder->executions.empty() && "builder can only hold a single execution when onlyAttachMIIRArgs " "is true."' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
Stack dump:
0.      Program arguments: /data/users/daohang/triton/build/cmake.linux-x86_64-cpython-3.13/bin/triton-opt /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir -split-input-file --convert-triton-gpu-to-llvm=compute-capability=90 -reconcile-unrealized-casts
 #0 0x0000000007210ea8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /data/users/daohang/triton/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
 #1 0x000000000720ea83 llvm::sys::RunSignalHandlers() /data/users/daohang/triton/llvm-project/llvm/lib/Support/Signals.cpp:105:18
 #2 0x0000000007211c41 SignalHandler(int, siginfo_t*, void*) /data/users/daohang/triton/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
 #3 0x00007fd6b583fc30 __restore_rt (/lib64/libc.so.6+0x3fc30)
 #4 0x00007fd6b588d03c __pthread_kill_implementation (/lib64/libc.so.6+0x8d03c)
 #5 0x00007fd6b583fb86 gsignal (/lib64/libc.so.6+0x3fb86)
 #6 0x00007fd6b5829873 abort (/lib64/libc.so.6+0x29873)
 #7 0x00007fd6b582979b _nl_load_domain.cold (/lib64/libc.so.6+0x2979b)
 #8 0x00007fd6b58388c6 (/lib64/libc.so.6+0x388c6)
 #9 0x00000000031dad20 mlir::triton::PTXInstrCommon::operator()(llvm::ArrayRef<mlir::triton::PTXBuilder::Operand*>, bool) /data/users/daohang/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/PTXAsmFormat.cpp:169:10
/data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir:91:18: error: CHECK-LABEL: expected string not found in input
 // CHECK-LABEL: async_clc_query_cancel
                 ^
<stdin>:56:33: note: scanning from here
 llvm.func @async_clc_try_cancel(%arg0: !llvm.struct<(ptr<3>, i32)>, %arg1: !llvm.struct<(ptr<3>, i32)>, %arg2: !llvm.ptr<1>, %arg3: !llvm.ptr<1>) attributes {nvvm.kernel = 1 : ui1, nvvm.reqntid = array<i32: 128>} {
                                ^
<stdin>:62:384: note: possible intended match here
 %5 = llvm.inline_asm has_side_effects asm_dialect = att operand_attrs = [] "\0A {\0A .reg .u32 first_cta_in_cluster;\0A .reg .pred pred_first_cta_in_cluster;\0A .reg .pred pred_issue;\0A mov.u32 first_cta_in_cluster, %cluster_ctaid.x;\0A setp.u32.eq pred_first_cta_in_cluster, first_cta_in_cluster, 0x0;\0A and.pred pred_issue, $2, pred_first_cta_in_cluster;\0A @pred_issue clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [$0], [$1];\0A }\0A ", "r,r,b" %arg1, %arg0, %4 : (!llvm.struct<(ptr<3>, i32)>, !llvm.struct<(ptr<3>, i32)>, i1) -> !llvm.void
@dshi7 dshi7 requested a review from ptillet as a code owner November 16, 2025 19:13
@dshi7 dshi7 force-pushed the daohang/clc_backend branch from 8b99a8a to cb7bf7f November 17, 2025 00:12
@dshi7 (Contributor, Author) commented Nov 17, 2025

The failing CI job complains about HIP OOM; it should be a false positive.

@ThomasRaoux (Collaborator) commented:

Would it be possible to stack the PRs needed up to the point where we can have a Gluon execution test? That would help us understand the scope of the feature.
I don't have a good sense of how much work that is; otherwise, maybe having a branch somewhere would help.

Comment on lines +287 to +289
mov.u32 first_cta_in_cluster, %cluster_ctaid.x;
setp.u32.eq pred_first_cta_in_cluster, first_cta_in_cluster, 0x0;
and.pred pred_issue, $2, pred_first_cta_in_cluster;
Collaborator:

Separate this out of the inline PTX; this will allow the code sequence to be optimized.
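A hedged sketch of what that separation could look like, in the LLVM-dialect MLIR the lowering emits. This is illustrative only: the op producing `%ctaid` is a placeholder for whatever op or intrinsic this codebase actually uses to read `%cluster_ctaid.x`, and the real builder code in BarrierOpToLLVM.cpp would differ:

```mlir
// Compute the issue predicate with ordinary IR ops so LLVM can fold,
// hoist, and CSE the sequence; only the instruction itself stays in
// inline asm. The %ctaid producer below is a placeholder, not a real op.
%ctaid = "read_cluster_ctaid_x"() : () -> i32   // hypothetical reader op
%zero  = llvm.mlir.constant(0 : i32) : i32
%first = llvm.icmp "eq" %ctaid, %zero : i32
%issue = llvm.and %pred, %first : i1
llvm.inline_asm has_side_effects
    "@$2 clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [$0], [$1];",
    "r,r,b" %result_addr, %mbar_addr, %issue
    : (!llvm.struct<(ptr<3>, i32)>, !llvm.struct<(ptr<3>, i32)>, i1) -> !llvm.void
```

With this shape, the predicate computation is visible to standard LLVM optimizations rather than opaque inside the asm string, which appears to be the point of the review comment.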

Contributor Author:

Sure, happy to do that. Would you elaborate on to what extent I should separate this out? Asking because I was basically following the same style as ArriveBarrierOpConversion. More context or an existing example would be even better.

.reg .b128 clc_result;
.reg .pred p1;
mov.s32 $0, -1;
ld.shared.b128 clc_result, [$1];
Collaborator:

Same here; can we separate this out?
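Similarly hedged, a sketch of pulling the load and the -1 default out of the asm, assuming the 128-bit CLC response can be loaded as an `i128` from shared memory (the real lowering may need a vector type or explicit alignment instead):

```mlir
// Hypothetical: replace "mov.s32 $0, -1" and "ld.shared.b128" with plain
// IR so the load and the -1 default participate in normal optimization.
%minus1 = llvm.mlir.constant(-1 : i32) : i32
%result = llvm.load %clc_smem : !llvm.ptr<3> -> i128
// ...then extract the needed field and select between it and %minus1 on
// the predicate, instead of the predicated mov inside inline asm.
```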
