CLC (cluster launch control) backend #8740
base: main
Conversation
Otherwise, it hits the following errors:

******************** TEST 'TRITON :: Conversion/tritonnvidiagpu_to_llvm.mlir' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /data/users/daohang/triton/build/cmake.linux-x86_64-cpython-3.13/bin/triton-opt /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir -split-input-file --convert-triton-gpu-to-llvm=compute-capability=90 -reconcile-unrealized-casts | FileCheck /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir
+ /data/users/daohang/triton/build/cmake.linux-x86_64-cpython-3.13/bin/triton-opt /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir -split-input-file --convert-triton-gpu-to-llvm=compute-capability=90 -reconcile-unrealized-casts
+ FileCheck /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir
triton-opt: /data/users/daohang/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/PTXAsmFormat.cpp:157: PTXInstrExecution &mlir::triton::PTXInstrCommon::call(ArrayRef<Operand *>, bool): Assertion `builder->executions.empty() && "builder can only hold a single execution when onlyAttachMIIRArgs " "is true."' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
Stack dump:
0. Program arguments: /data/users/daohang/triton/build/cmake.linux-x86_64-cpython-3.13/bin/triton-opt /data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir -split-input-file --convert-triton-gpu-to-llvm=compute-capability=90 -reconcile-unrealized-casts
#0 0x0000000007210ea8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /data/users/daohang/triton/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
#1 0x000000000720ea83 llvm::sys::RunSignalHandlers() /data/users/daohang/triton/llvm-project/llvm/lib/Support/Signals.cpp:105:18
#2 0x0000000007211c41 SignalHandler(int, siginfo_t*, void*) /data/users/daohang/triton/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
#3 0x00007fd6b583fc30 __restore_rt (/lib64/libc.so.6+0x3fc30)
#4 0x00007fd6b588d03c __pthread_kill_implementation (/lib64/libc.so.6+0x8d03c)
#5 0x00007fd6b583fb86 gsignal (/lib64/libc.so.6+0x3fb86)
#6 0x00007fd6b5829873 abort (/lib64/libc.so.6+0x29873)
#7 0x00007fd6b582979b _nl_load_domain.cold (/lib64/libc.so.6+0x2979b)
#8 0x00007fd6b58388c6 (/lib64/libc.so.6+0x388c6)
#9 0x00000000031dad20 mlir::triton::PTXInstrCommon::operator()(llvm::ArrayRef<mlir::triton::PTXBuilder::Operand*>, bool) /data/users/daohang/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/PTXAsmFormat.cpp:169:10
/data/users/daohang/triton/test/Conversion/tritonnvidiagpu_to_llvm.mlir:91:18: error: CHECK-LABEL: expected string not found in input
// CHECK-LABEL: async_clc_query_cancel
^
<stdin>:56:33: note: scanning from here
llvm.func @async_clc_try_cancel(%arg0: !llvm.struct<(ptr<3>, i32)>, %arg1: !llvm.struct<(ptr<3>, i32)>, %arg2: !llvm.ptr<1>, %arg3: !llvm.ptr<1>) attributes {nvvm.kernel = 1 : ui1, nvvm.reqntid = array<i32: 128>} {
^
<stdin>:62:384: note: possible intended match here
%5 = llvm.inline_asm has_side_effects asm_dialect = att operand_attrs = [] "\0A {\0A .reg .u32 first_cta_in_cluster;\0A .reg .pred pred_first_cta_in_cluster;\0A .reg .pred pred_issue;\0A mov.u32 first_cta_in_cluster, %cluster_ctaid.x;\0A setp.u32.eq pred_first_cta_in_cluster, first_cta_in_cluster, 0x0;\0A and.pred pred_issue, $2, pred_first_cta_in_cluster;\0A @pred_issue clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [$0], [$1];\0A }\0A ", "r,r,b" %arg1, %arg0, %4 : (!llvm.struct<(ptr<3>, i32)>, !llvm.struct<(ptr<3>, i32)>, i1) -> !llvm.void
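For context on the assertion: it enforces that a PTXBuilder holds at most one execution when an instruction is invoked with onlyAttachMLIRArgs set. The following is a minimal, non-authoritative sketch of the failing pattern, based only on the signatures visible in the stack trace above; it is not code from this PR, and the operand setup and include path are assumptions.

```cpp
// Illustrative sketch only (not this PR's code). Signatures taken from the
// stack trace: PTXInstrCommon::call(ArrayRef<Operand *>, bool) asserts that
// builder->executions is empty when the bool (onlyAttachMLIRArgs) is true.
#include "PTXAsmFormat.h" // include path assumed

using namespace mlir::triton;

void sketchFailingPattern(PTXBuilder &ptxBuilder,
                          PTXBuilder::Operand *resultOpr,
                          PTXBuilder::Operand *barrierOpr,
                          PTXBuilder::Operand *predOpr) {
  // First instruction: a normal call appends an execution to the builder.
  auto &movCtaId =
      *ptxBuilder.create<>("mov.u32 first_cta_in_cluster, %cluster_ctaid.x");
  movCtaId({}, /*onlyAttachMLIRArgs=*/false);

  // Second instruction: calling with onlyAttachMLIRArgs=true while the builder
  // already holds an execution trips the assertion at PTXAsmFormat.cpp:157.
  auto &clc = *ptxBuilder.create<>(
      "@pred_issue clusterlaunchcontrol.try_cancel.async.shared::cta"
      ".mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [$0], [$1]");
  clc({resultOpr, barrierOpr, predOpr}, /*onlyAttachMLIRArgs=*/true); // asserts
}
```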
Force-pushed from 8b99a8a to cb7bf7f.
Broken CI complains about HIP OOM; this should be a false positive.
Would it be possible to stack the PRs needed until the point where we can have a Gluon execution test? That would help us understand the scope of the feature.
mov.u32 first_cta_in_cluster, %cluster_ctaid.x;
setp.u32.eq pred_first_cta_in_cluster, first_cta_in_cluster, 0x0;
and.pred pred_issue, $2, pred_first_cta_in_cluster;
Separate this out of the inline PTX; that will allow the code sequence to be optimized.
Sure, happy to do that. Would you elaborate on to what extent I should separate this out? I'm asking because I was basically following the same style as ArriveBarrierOpConversion; more context or an existing example would be even better.
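One possible shape for that split, offered here only as a non-authoritative sketch (not this PR's code): compute %cluster_ctaid.x and the issue predicate with ordinary MLIR ops so LLVM can fold and schedule them, and keep only the try_cancel instruction inside the asm. The function signature, variable names, the NVVM special-register op, and the PTXBuilder usage below are assumptions.

```cpp
// Sketch under stated assumptions; not the lowering implemented in this PR.
// Assumes MLIR's LLVM/NVVM dialects and Triton's PTXBuilder are in scope in
// the conversion pattern; the exact NVVM op for %cluster_ctaid.x may differ.
static void sketchTryCancelLowering(ConversionPatternRewriter &rewriter,
                                    Location loc, Value pred,
                                    Value resultSmemAddr,
                                    Value barrierSmemAddr) {
  auto i32Ty = rewriter.getI32Type();
  Value zero = rewriter.create<LLVM::ConstantOp>(loc, i32Ty,
                                                 rewriter.getI32IntegerAttr(0));
  // Read %cluster_ctaid.x via a regular op instead of inline asm
  // (op name is an assumption).
  Value ctaIdInCluster = rewriter.create<NVVM::BlockInClusterIdXOp>(loc, i32Ty);

  // pred_first_cta_in_cluster and pred_issue become plain LLVM ops that the
  // optimizer can see and combine with surrounding code.
  Value isFirstCta = rewriter.create<LLVM::ICmpOp>(
      loc, LLVM::ICmpPredicate::eq, ctaIdInCluster, zero);
  Value issuePred = rewriter.create<LLVM::AndOp>(loc, pred, isFirstCta);

  // Only the try_cancel itself stays as verbatim PTX; $0/$1/$2 map to the
  // result address, barrier address, and predicate operands ("r,r,b"), as in
  // the asm string shown in the log above.
  PTXBuilder ptxBuilder;
  auto *resultOpr = ptxBuilder.newOperand(resultSmemAddr, "r");
  auto *barrierOpr = ptxBuilder.newOperand(barrierSmemAddr, "r");
  auto *predOpr = ptxBuilder.newOperand(issuePred, "b");
  auto &clc = *ptxBuilder.create<>(
      "@$2 clusterlaunchcontrol.try_cancel.async.shared::cta"
      ".mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [$0], [$1];");
  clc({resultOpr, barrierOpr, predOpr}, /*onlyAttachMLIRArgs=*/true);
  ptxBuilder.launch(rewriter, loc,
                    LLVM::LLVMVoidType::get(rewriter.getContext()),
                    /*hasSideEffect=*/true);
}
```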
.reg .b128 clc_result;
.reg .pred p1;
mov.s32 $0, -1;
ld.shared.b128 clc_result, [$1];
Same here; can we separate this out?
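Along the same lines, a minimal sketch of what splitting the query/decode path might look like. Again this is illustrative, not this PR's code: the function name, the vector<2xi64> modeling of the 128-bit result, and the decision to keep only the clusterlaunchcontrol query in asm are assumptions.

```cpp
// Sketch only: pull the ld.shared.b128 and the "-1 default" out of the asm so
// just the clusterlaunchcontrol query remains opaque to the optimizer.
static Value sketchLoadClcResult(ConversionPatternRewriter &rewriter,
                                 Location loc,
                                 Value clcResultSmemPtr /* !llvm.ptr<3> */) {
  // ld.shared.b128 clc_result, [$1] -> an ordinary 16-byte load (modeled here
  // as vector<2xi64>) that LLVM can schedule and reuse.
  auto i64x2Ty = VectorType::get({2}, rewriter.getI64Type());
  Value packed = rewriter.create<LLVM::LoadOp>(loc, i64x2Ty, clcResultSmemPtr,
                                               /*alignment=*/16);
  // The clusterlaunchcontrol.query_cancel instruction itself would stay in a
  // small PTXBuilder-emitted asm block consuming `packed`, and the
  // "mov.s32 $0, -1" default would become an LLVM::SelectOp on the query's
  // is-canceled result, emitted after the asm rather than inside it.
  return packed;
}
```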
Upstream CLC backend changes from the following PRs. This works well with the TLX frontend (see the unit test and tutorial kernel). This PR contains the backend part only.
TTGIR
PTX lowering
MLIR
Test
lit test/Conversion/tritonnvidiagpu_to_llvm.mlir
make test-cpp