[CK TILE GEMM] Refactor block_scale_gemm examples #3181

CongMa13 · 2025-11-07T16:10:41Z

Split cpp file to reduce building time
Support multiple GemmConfig

Proposed changes

The GEMM quant example has two problems:

- Very long compile times.
- GemmConfig is hard-coded, so testing a different configuration means editing the source and recompiling.

In this PR, we'll split different instances into separate .cpp files to reduce compile time, and support multiple GemmConfig options selectable via command-line arguments.

- Move all template functions to run_gemm_quant_example.inc
- Add new .cpp files that create the GEMM instances
- Add new command-line options data_type, quant_mode, preshuffleb, and group_size to select the GEMM instance
- Update main()
  - Print help
  - Select GPU device
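
As a rough illustration of the selection step described above, the sketch below parses the four new options into a small struct. It is not the code from this PR: `InstanceSelector`, `parse_selector`, the `-key=value` syntax, and every default value are assumptions made for the example, and the real example goes through its own `ArgParser`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical holder for the four selector options named in this PR.
// All default values below are placeholders, not the example's real defaults.
struct InstanceSelector
{
    std::string data_type   = "fp8";
    std::string quant_mode  = "bquant";
    std::string preshuffleb = "false";
    std::string group_size  = "1x1x128";
};

// Parse "-key=value" tokens (assumed syntax). Unknown keys and malformed
// tokens are ignored, so unrelated options can pass through untouched.
inline InstanceSelector parse_selector(const std::vector<std::string>& args)
{
    InstanceSelector sel;
    const std::map<std::string, std::string*> fields = {
        {"data_type", &sel.data_type},
        {"quant_mode", &sel.quant_mode},
        {"preshuffleb", &sel.preshuffleb},
        {"group_size", &sel.group_size},
    };
    for(const std::string& arg : args)
    {
        const std::size_t eq = arg.find('=');
        if(arg.size() < 2 || arg[0] != '-' || eq == std::string::npos)
            continue;
        const auto it = fields.find(arg.substr(1, eq - 1));
        if(it != fields.end())
            *it->second = arg.substr(eq + 1);
    }
    return sel;
}

// Usage sketch: options that are not given keep their placeholder defaults.
inline void selector_demo()
{
    const InstanceSelector sel = parse_selector({"-quant_mode=aquant", "-group_size=1x1x64"});
    assert(sel.quant_mode == "aquant");
    assert(sel.group_size == "1x1x64");
    assert(sel.data_type == "fp8");
}
```

Keeping the selector as plain strings means a new configuration only needs a new table entry at the dispatch site, with no change to the parsing code.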

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

example/ck_tile/38_block_scale_gemm/gemm_utils.hpp

Copilot

Pull Request Overview

This PR refactors the block scale GEMM quantization example code to improve maintainability and modularity. The main change is reorganizing the code from a monolithic implementation into a factory pattern with separate compilation units for different quantization configurations.

Key changes:

Refactored code to use a lookup table (LUT) with factory functions for different quantization type combinations
Moved the gemm_calc_quant function from gemm_quant_basic.cpp to run_gemm_quant_example.inc for reuse
Changed run_gemm_example_with_layouts to accept ArgParser directly instead of argc/argv
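
The lookup-table dispatch mentioned in the key changes can be pictured roughly as follows. This is only a sketch under assumptions: `InstanceKey`, `instance_table`, `dispatch`, the two stand-in functions, and the key values are hypothetical, and the real factories launch instantiated GEMM kernels instead of returning a tag.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>

// Hypothetical key: (data_type, quant_mode, preshuffleb, group_size).
using InstanceKey = std::tuple<std::string, std::string, bool, std::string>;
// A factory entry; here it only returns a tag identifying the instance.
using InstanceFn = std::function<int()>;

// In a split build, each of these would be defined in its own .cpp file so
// the expensive template instantiation is compiled separately per instance.
int run_bquant_instance() { return 1; } // stand-in for a BQuant instance
int run_aquant_instance() { return 2; } // stand-in for an AQuant instance

// The table maps a configuration key to the function that runs it.
const std::map<InstanceKey, InstanceFn>& instance_table()
{
    static const std::map<InstanceKey, InstanceFn> table = {
        {InstanceKey{"fp8", "bquant", false, "1x1x128"}, run_bquant_instance},
        {InstanceKey{"fp8", "aquant", false, "1x1x128"}, run_aquant_instance},
    };
    return table;
}

// Look up the requested configuration; -1 signals an unsupported combination.
int dispatch(const InstanceKey& key)
{
    const auto& table = instance_table();
    const auto it     = table.find(key);
    if(it == table.end())
        return -1;
    return it->second();
}

// Usage sketch: a known key runs its instance, an unknown key is rejected.
inline void dispatch_demo()
{
    const InstanceKey known{"fp8", "aquant", false, "1x1x128"};
    const InstanceKey unknown{"bf16", "tensor", true, "1x1x64"};
    assert(dispatch(known) == 2);
    assert(dispatch(unknown) == -1);
}
```

With this shape, main() only needs declarations of the factory functions, so it never has to see the heavy template code.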

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| run_gemm_quant_example.inc | Added `gemm_calc_quant` function and `run_gemm_example_prec_type` helper; modified `run_gemm_example_with_layouts` to accept `ArgParser` |
| gemm_utils.hpp | Added `hash_multiple_strings` function; removed `create_args` function |
| gemm_quant.cpp | New main entry point with factory pattern using LUT to dispatch to specific implementations |
| gemm_quant_basic.cpp | Deleted; functionality moved to other files |
| gemm_bquant_quantgrouped_*.cpp | New factory implementation files for different BQuant data type combinations |
| gemm_aquant_quantgrouped.cpp | New factory implementation for AQuant configurations |
| CMakeLists.txt | Updated to compile new modular structure instead of single file |
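
For readers unfamiliar with hashing several strings into one lookup key, a minimal sketch follows. It is not the `hash_multiple_strings` implementation from gemm_utils.hpp, whose body is not shown here; `combine_string_hashes` is a hypothetical name, and the mixing step is the widely used Boost-style hash_combine formula, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Fold the hashes of several strings into one value. The result depends on
// both the contents and the order of the parts.
inline std::size_t combine_string_hashes(const std::vector<std::string>& parts)
{
    std::size_t seed = 0;
    for(const std::string& part : parts)
    {
        // Boost-style hash_combine step: mix the next hash into the seed.
        seed ^= std::hash<std::string>{}(part) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
    }
    return seed;
}

// Usage sketch: identical option lists always map to the same key.
inline void hash_demo()
{
    const std::size_t a = combine_string_hashes({"fp8", "bquant", "1x1x128"});
    const std::size_t b = combine_string_hashes({"fp8", "bquant", "1x1x128"});
    assert(a == b);
}
```

A hash key like this can collide in principle, so a table keyed on it is only safe when the set of supported combinations is small and fixed.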

example/ck_tile/38_block_scale_gemm/run_gemm_quant_example.inc

example/ck_tile/38_block_scale_gemm/gemm_quant.cpp

example/ck_tile/38_block_scale_gemm/gemm_bquant_quantgrouped_preshuffleb_prefill.cpp

example/ck_tile/38_block_scale_gemm/run_gemm_quant_example.inc

example/ck_tile/38_block_scale_gemm/CMakeLists.txt

example/ck_tile/38_block_scale_gemm/README.md

Review Addressed

- Split cpp file to reduce building time
- Support multiple GemmConfig

- Update Readme

- Add support for rowcol and tensor GEMM operations

- Update README

- Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.

example/ck_tile/38_block_scale_gemm/gemm_aquant_quantgrouped.cpp

* [CK TILE GEMM] Refactor block_scale_gemm examples
  - Split cpp file to reduce building time
  - Support multiple GemmConfig
* [CK TILE GEMM] Refactor block_scale_gemm examples
  - Update Readme
* [CK TILE GEMM] Refactor block_scale_gemm examples
  - Add support for rowcol and tensor GEMM operations
* [CK TILE GEMM] Refactor block_scale_gemm examples
  - Update README
* [CK TILE GEMM] Refactor block_scale_gemm examples
  - Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.

CongMa13 requested review from ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners November 7, 2025 16:10

AviralGoelAMD previously requested changes Nov 7, 2025

example/ck_tile/38_block_scale_gemm/gemm_utils.hpp (resolved)

CongMa13 requested a review from Copilot November 7, 2025 17:03

Copilot AI reviewed Nov 7, 2025

CongMa13 requested review from a team and ddembeckAMD as code owners November 7, 2025 17:40

spolifroni-amd previously requested changes Nov 7, 2025

example/ck_tile/38_block_scale_gemm/README.md (three review threads, outdated and resolved)

CongMa13 force-pushed the congma/ck_tile/split_quant_gemm_example branch from f952017 to 9debcc1 Compare November 10, 2025 18:01

ThomasNing previously approved these changes Nov 11, 2025

AviralGoelAMD mentioned this pull request Nov 11, 2025

feat(block_scale_gemm): Support RRR & CRR layout for aquant quant mode #3193

Open

7 tasks

CongMa13 added 3 commits November 11, 2025 12:58

[CK TILE GEMM] Refactor block_scale_gemm examples

c9a1777

- Split cpp file to reduce building time - Support multiple GemmConfig

[CK TILE GEMM] Refactor block_scale_gemm examples

71c16cc

- Update Readme

[CK TILE GEMM] Refactor block_scale_gemm examples

3028c4f

- Add support for rowcol and tensor GEMM operations

[CK TILE GEMM] Refactor block_scale_gemm examples

d0c9a38

- Update README

CongMa13 dismissed ThomasNing’s stale review via d0c9a38 November 11, 2025 18:34

CongMa13 force-pushed the congma/ck_tile/split_quant_gemm_example branch from 9debcc1 to d0c9a38 Compare November 11, 2025 18:34

spolifroni-amd previously approved these changes Nov 11, 2025

[CK TILE GEMM] Refactor block_scale_gemm examples

3636d8d

- Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.

CongMa13 dismissed spolifroni-amd’s stale review via 3636d8d November 12, 2025 19:01

AviralGoelAMD approved these changes Nov 12, 2025

example/ck_tile/38_block_scale_gemm/gemm_aquant_quantgrouped.cpp (outdated, resolved)

ThomasNing approved these changes Nov 12, 2025

ThomasNing merged commit 6fd8dda into develop Nov 13, 2025
22 checks passed

ThomasNing deleted the congma/ck_tile/split_quant_gemm_example branch November 13, 2025 07:43

5 participants
