Skip to content

Conversation

@bartekxk
Copy link
Contributor

@bartekxk bartekxk commented Nov 7, 2025

Proposed changes

[CK BUILDER] Add grouped conv fwd ck tile traits

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@bartekxk
Copy link
Contributor Author

bartekxk commented Nov 7, 2025

TODO: update more tests

@bartekxk bartekxk requested a review from Copilot November 7, 2025 23:28
@shumway shumway changed the title [CK BUILDER] Add grouped conv fwd ck tile traits [CK_BUILDER] Add grouped conv fwd ck tile traits Nov 7, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the ck_tile namespace to use a distinct constant address space macro, adds pipeline name methods for better introspection, and introduces instance traits support for the tile-based grouped convolution forward kernel.

  • Renamed CK_CONSTANT_ADDRESS_SPACE to CK_TILE_CONSTANT_ADDRESS_SPACE throughout ck_tile namespace to distinguish from the ck namespace
  • Added GetPipelineName() static methods to all GEMM pipeline classes returning simplified pipeline identifiers
  • Implemented InstanceTraits specialization for GroupedConvolutionForwardKernel with corresponding test coverage

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
include/ck_tile/core/arch/arch.hpp Renamed macro to CK_TILE_CONSTANT_ADDRESS_SPACE and updated function signatures
include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp Updated to use CK_TILE_CONSTANT_ADDRESS_SPACE macro
include/ck_tile/ops/gemm_quant/kernel/grouped_gemm_quant_kernel.hpp Updated to use CK_TILE_CONSTANT_ADDRESS_SPACE macro
include/ck_tile/ops/gemm/pipeline/*.hpp Added GetPipelineName() methods returning pipeline identifiers (BASIC_V1, BASIC_V2, COMPUTE_V3-V6, COMPUTE_ASYNC, MEMORY, PRESHUFFLE_V2)
include/ck_tile/ops/grouped_convolution/kernel/grouped_convolution_forward_kernel.hpp Added GetInstanceString() method under CK_EXPERIMENTAL_BUILDER flag
experimental/builder/include/ck_tile/builder/reflect/instance_traits_tile_grouped_convolution_forward.hpp New file implementing InstanceTraits specialization for tile-based grouped convolution forward kernel
experimental/builder/include/ck_tile/builder/reflect/instance_traits_util.hpp Extended utility functions to support ck_tile types (bf16_t, fp8_t, bf8_t, half_t, tuple, GemmPipelineScheduler)
experimental/builder/include/ck_tile/builder/reflect/conv_traits.hpp Added includes for ck_tile ops headers
experimental/builder/include/ck_tile/builder/reflect/instance_traits_device_grouped_conv_bwd_weight_xdl_cshuffle.hpp Added missing include for instance_traits_util.hpp
experimental/builder/test/test_fwd_instance_traits.cpp Added comprehensive test for tile-based grouped convolution instance string generation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 124 to 127
oss << "," << GemmPipeline::GetPipelineName(); // 21. BlkGemmPipelineVer
oss << "," << detail::pipeline_scheduler_name(kPipelineScheduler); // 24. BlkGemmPipeSched
oss << "," << kDoubleSmemBuffer; // 25. NumWaveGroups
oss << "," << kNumWaveGroups; // 26. NumWaveGroups
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comments incorrectly number and describe the parameters. Line 122 shows "22. ADataType" and line 123 shows "23. BDataType", but line 124 shows "21. BlkGemmPipelineVer" which breaks the sequential order. Additionally, line 126 is labeled "25. NumWaveGroups" but outputs kDoubleSmemBuffer, while line 127 is labeled "26. NumWaveGroups" and outputs kNumWaveGroups. The comment on line 126 should be "25. DoubleSmemBuffer" to accurately reflect what's being output.

Suggested change
oss << "," << GemmPipeline::GetPipelineName(); // 21. BlkGemmPipelineVer
oss << "," << detail::pipeline_scheduler_name(kPipelineScheduler); // 24. BlkGemmPipeSched
oss << "," << kDoubleSmemBuffer; // 25. NumWaveGroups
oss << "," << kNumWaveGroups; // 26. NumWaveGroups
oss << "," << GemmPipeline::GetPipelineName(); // 24. BlkGemmPipelineVer
oss << "," << detail::pipeline_scheduler_name(kPipelineScheduler); // 25. BlkGemmPipeSched
oss << "," << kDoubleSmemBuffer; // 26. DoubleSmemBuffer
oss << "," << kNumWaveGroups; // 27. NumWaveGroups

Copilot uses AI. Check for mistakes.
"Specialization of instance_traits not found. Please check that a "
"specialization exists in file "
"ck_tile/builder/reflect/"
"instance_traits_device_grouped_conv_bwd_weight_xdl_cshuffle.hpp "
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message references the wrong file name. It mentions instance_traits_device_grouped_conv_bwd_weight_xdl_cshuffle.hpp but should reference instance_traits_tile_grouped_convolution_forward.hpp instead, which is the actual file included at line 20 and contains the specialization for GroupedConvolutionForwardKernel.

Suggested change
"instance_traits_device_grouped_conv_bwd_weight_xdl_cshuffle.hpp "
"instance_traits_tile_grouped_convolution_forward.hpp "

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,141 @@
// SPDX-License-Identifier: MIT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: New copyright format shouldn't contain the year.

shumway
shumway previously approved these changes Nov 7, 2025
Copy link
Collaborator

@shumway shumway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great to get a CK-Tile kernel in. It's a good check on our design, especially seeing the difference in InstanceTraits. There's duplication in the tuple_name function, but I don't see an way to generalize it to reduce duplication (likely there is some template matching that might work). This looks like a good way to move forward with CK Tile.

@bartekxk
Copy link
Contributor Author

bartekxk commented Nov 8, 2025

Thanks, this is great to get a CK-Tile kernel in. It's a good check on our design, especially seeing the difference in InstanceTraits. There's duplication in the tuple_name function, but I don't see an way to generalize it to reduce duplication (likely there is some template matching that might work). This looks like a good way to move forward with CK Tile.

Yes, I wondered if we should generalize it and print "Tuple" for the both or print "Tuple" for old ck and "tuple" for ck tile.

@shumway shumway merged commit 92c1f49 into develop Nov 11, 2025
49 checks passed
@shumway shumway deleted the barkocot/tile-builder-fwd branch November 11, 2025 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants