@CongMa13 CongMa13 commented Nov 7, 2025

  • Split cpp file to reduce building time
  • Support multiple GemmConfig

Proposed changes

The Gemm quant example faces two problems:

  • Very long compile times.
  • GemmConfig is hard-coded, which means you have to change the source and recompile to test different configurations.

This PR splits the GEMM instances into separate .cpp files to reduce compile time, and makes multiple GemmConfig options selectable via command-line arguments.

  • Move all template functions to run_gemm_quant_example.inc
  • Add new cpp files to create gemm instances
  • Add new command-line options (data_type, quant_mode, preshuffleb, group_size) to select the GEMM instance
  • Update main()
    • Print help
    • Select GPU device

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@CongMa13 CongMa13 requested a review from Copilot November 7, 2025 17:03

Copilot AI left a comment

Pull Request Overview

This PR refactors the block scale GEMM quantization example code to improve maintainability and modularity. The main change is reorganizing the code from a monolithic implementation into a factory pattern with separate compilation units for different quantization configurations.

Key changes:

  • Refactored code to use a lookup table (LUT) with factory functions for different quantization type combinations
  • Moved the gemm_calc_quant function from gemm_quant_basic.cpp to run_gemm_quant_example.inc for reuse
  • Changed run_gemm_example_with_layouts to accept ArgParser directly instead of argc/argv
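
For readers unfamiliar with the pattern, here is a minimal sketch of a lookup table of factory functions keyed by a hash of several option strings. It is an assumption-laden illustration, not the PR's implementation: `combine_hashes` is a hypothetical helper (the PR's `hash_multiple_strings` may have a different signature and algorithm), the factories just return integers, and the key strings such as `"1x1x128"` are made-up formats.

```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <string>
#include <unordered_map>

// Hypothetical helper: folds several strings into one hash value.
// The order of the strings matters.
inline std::size_t combine_hashes(std::initializer_list<std::string> parts)
{
    std::size_t seed = 0;
    for(const auto& p : parts)
    {
        // Boost-style hash_combine step
        seed ^= std::hash<std::string>{}(p) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
}

// A factory here is any callable that runs one configuration and returns a status.
using Factory = std::function<int()>;

// The lookup table is built once, on first use.
inline const std::unordered_map<std::size_t, Factory>& lut()
{
    static const std::unordered_map<std::size_t, Factory> table = {
        {combine_hashes({"fp8", "bquant", "1x1x128"}), [] { return 1; }},
        {combine_hashes({"bf8", "bquant", "1x1x128"}), [] { return 2; }},
    };
    return table;
}

// Returns the selected factory's result, or -1 when no entry is registered.
inline int run_selected(const std::string& data_type,
                        const std::string& quant_mode,
                        const std::string& group_size)
{
    const auto it = lut().find(combine_hashes({data_type, quant_mode, group_size}));
    return it == lut().end() ? -1 : it->second();
}
```

Hashing the strings keeps the key type simple, at the cost of a theoretical collision risk. Keying the map on a `std::tuple` of the strings would avoid that risk if it ever became a concern.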

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| run_gemm_quant_example.inc | Added gemm_calc_quant function and run_gemm_example_prec_type helper; modified run_gemm_example_with_layouts to accept ArgParser |
| gemm_utils.hpp | Added hash_multiple_strings function; removed create_args function |
| gemm_quant.cpp | New main entry point with factory pattern using LUT to dispatch to specific implementations |
| gemm_quant_basic.cpp | Deleted; functionality moved to other files |
| gemm_bquant_quantgrouped_*.cpp | New factory implementation files for different BQuant data type combinations |
| gemm_aquant_quantgrouped.cpp | New factory implementation for AQuant configurations |
| CMakeLists.txt | Updated to compile new modular structure instead of single file |


@CongMa13 CongMa13 requested review from a team and ddembeckAMD as code owners November 7, 2025 17:40
@CongMa13 CongMa13 force-pushed the congma/ck_tile/split_quant_gemm_example branch from f952017 to 9debcc1 Compare November 10, 2025 18:01
@ThomasNing ThomasNing dismissed stale reviews from AviralGoelAMD and spolifroni-amd November 11, 2025 00:37

Review Addressed

ThomasNing previously approved these changes Nov 11, 2025
- Split cpp file to reduce building time
- Support multiple GemmConfig
- Add support for rowcol and tensor GEMM operations
@CongMa13 CongMa13 force-pushed the congma/ck_tile/split_quant_gemm_example branch from 9debcc1 to d0c9a38 Compare November 11, 2025 18:34
spolifroni-amd previously approved these changes Nov 11, 2025
- Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.
@ThomasNing ThomasNing merged commit 6fd8dda into develop Nov 13, 2025
22 checks passed
@ThomasNing ThomasNing deleted the congma/ck_tile/split_quant_gemm_example branch November 13, 2025 07:43
pmaybank pushed a commit that referenced this pull request Nov 13, 2025
* [CK TILE GEMM] Refactor block_scale_gemm examples

- Split cpp file to reduce building time
- Support multiple GemmConfig

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update Readme

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Add support for rowcol and tensor GEMM operations

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update README

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.
