Fix for moe on sm110 #2190

jhalabi-nv · 2025-12-09T02:34:09Z

📌 Description

SM100 configs are generated for SM110 devices, but among the valid MOE kernels for SM110, only TMA epilogue scheduling is supported. The generated configs use EpilogueScheduleType::AUTO, which the getDispatchFunctionForSM100 dispatch function never resolves to a concrete value (either TMA or NO_SMEM), so all configs are rejected.

This PR pins the generated configs to EpilogueScheduleType::TMA for SM110 devices. It also allows SM100 configs to run on SM110 devices.

Future work would include generating SM110 configs specifically for SM110 devices, rather than relying on SM100 configs.
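To illustrate the failure mode described above, here is a minimal sketch, not the FlashInfer code. It assumes the dispatch accepts only a concrete TMA schedule on SM110, so a candidate list built with AUTO is rejected in full while a list pinned to TMA passes. `GemmConfig`, `isDispatchableOnSm110`, `makeCandidates`, and `countDispatchable` are hypothetical names; only `EpilogueScheduleType` and its values come from the description.

```cpp
#include <vector>

// Hypothetical stand-ins for the real types; only the enum values are taken
// from the PR description.
enum class EpilogueScheduleType { AUTO, TMA, NO_SMEM };

struct GemmConfig {
  int sm_version;                          // SM the config was generated for
  EpilogueScheduleType epilogue_schedule;  // schedule requested by the config
};

// Assumption: on SM110 only a concrete TMA schedule maps to a kernel. AUTO is
// never resolved to TMA or NO_SMEM, so it is rejected here as well.
inline bool isDispatchableOnSm110(const GemmConfig& config) {
  return config.epilogue_schedule == EpilogueScheduleType::TMA;
}

// Builds SM100-tagged candidate configs, one per tile shape, all using the
// given epilogue schedule.
inline std::vector<GemmConfig> makeCandidates(EpilogueScheduleType schedule,
                                              int num_tiles) {
  std::vector<GemmConfig> configs;
  for (int i = 0; i < num_tiles; ++i) {
    configs.push_back(GemmConfig{100, schedule});
  }
  return configs;
}

// Counts how many candidates survive the dispatch filter.
inline int countDispatchable(const std::vector<GemmConfig>& configs) {
  int count = 0;
  for (const auto& config : configs) {
    if (isDispatchableOnSm110(config)) {
      ++count;
    }
  }
  return count;
}
```

Under these assumptions, `countDispatchable(makeCandidates(EpilogueScheduleType::AUTO, 4))` yields no usable configs, while the same call with `EpilogueScheduleType::TMA` keeps all four.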

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

Bug Fixes
- Fixed GPU configuration validation to support latest hardware generations.
Refactor
- Optimized kernel execution scheduling for improved performance on newer devices.


coderabbitai · 2025-12-09T02:34:14Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The changes update SM110 CUTLASS kernel dispatch logic by switching epilogue scheduling from AUTO to TMA strategy and relaxing SM version compatibility checks to allow SM100 configurations to execute on SM110 devices.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Epilogue Schedule Type Update `csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp` | Modified `get_candidate_configs_sm110` to use `EpilogueScheduleType::TMA` instead of `AUTO` in FAST_BUILD path and per-tile config construction for CutlassGemmConfig instances. |
| SM Version Compatibility Relaxation `csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h` | Relaxed SM version validation in `MoeGemmRunner::dispatchToArch` to allow SM100 configurations to run on SM110 devices by accepting both exact major/minor matches and a fallback condition. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

- Changes are confined to two specific functions with straightforward parameter updates and conditional logic modifications
- SM architecture compatibility logic is localized and easy to verify
- No complex branching or multi-file dependencies introduced
- Primary concern: ensure TMA scheduling is appropriate for SM110 and SM100/110 compatibility doesn't introduce correctness issues

Possibly related PRs

Feature: Add support for L40 FusedMoE in cutlass path #1973 — Modifies the same kernel dispatch files (cutlass_heuristic.cpp and moe_gemm_template_dispatch.h) to adjust SM-specific tile/config selection and version checks.

Suggested reviewers

djmmoss
cyx-6
wenscarl
bkryu
yzh119

Poem

🐰 TMA schedules now bloom, so bright and keen,
On SM110's stage, a GPU dream,
With backward compat, SM100 hops along,
The kernels dance faster—our code grows strong! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Fix for moe on sm110' directly addresses the main change: fixing MOE kernel execution on SM110 devices by adjusting epilogue scheduling and validation logic. |
| Description check | ✅ Passed | The description includes a clear explanation of the problem, the solution, and notes on future work. It covers the main sections but lacks content in the Related Issues and Reviewer Notes sections. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


gemini-code-assist · 2025-12-09T02:34:24Z

Summary of Changes

Hello @jhalabi-nv, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces critical fixes and optimizations for Mixture of Experts (MoE) General Matrix Multiply (GEMM) operations specifically targeting SM110 architectures. The changes focus on enhancing performance by utilizing Tensor Memory Accelerator (TMA) for epilogue scheduling, improving compatibility by allowing SM100 configurations to run on SM110, and refining the kernel dispatch logic to ensure SM110 devices leverage the most appropriate specialized kernels for efficient execution.

Highlights

Epilogue Scheduling Optimization: Updated the get_candidate_configs_sm110 function to explicitly use EpilogueScheduleType::TMA for all CutlassGemmConfig instances, including the fast build configuration, for SM110 architectures. This change likely leverages Tensor Memory Accelerator (TMA) for improved performance in the epilogue phase of GEMM operations.
SM100/SM110 Compatibility: Modified the moe_gemm_template_dispatch.h file to allow configurations designed for SM100 devices to run on SM110 devices. This ensures broader compatibility and flexibility for MoE GEMM operations across these architectures.
Refined Dispatch Logic for SM110: Adjusted the dispatchMoeGemmFinalDispatchTmaWarpSpecialized function in moe_gemm_template_dispatch_tma_ws.h to correctly categorize SM110 within the compute capability checks. This change ensures that SM110 devices are routed to the appropriate specialized kernel dispatch paths, separating them from SM100 and grouping them with SM90 or SM120+ for certain kernel selections.


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request aims to add support for MoE kernels on the SM110 architecture. The changes involve updating kernel candidate configurations and modifying dispatch logic. Specifically, it allows SM100 kernel configurations to be used on SM110 devices and adjusts compile-time kernel selection for SM110. My review has identified a significant logical inconsistency between the runtime configuration generation and the compile-time kernel specializations for SM110, which results in SM110-specific code paths being unreachable. I've provided comments with high and medium severity to highlight these issues and suggest a more consistent approach for SM110 support.

gemini-code-assist · 2025-12-09T02:37:19Z

csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h

+        TLLM_CHECK_WITH_INFO(
+            (inputs.gemm_config.sm_version / 10 == sm_ / 10) ||
+            // allow sm100 configs to run on sm110 as well
+            (inputs.gemm_config.sm_version / 10 == 10 && sm_ / 10 == 11),
+            "Using SM %d configuration for SM %d device",
+            inputs.gemm_config.sm_version, sm_);


There appears to be a logical inconsistency in how SM110 support is being added. This change allows SM100-kernels to run on SM110 devices, which is necessary because get_candidate_configs_sm110 in cutlass_heuristic.cpp generates configurations with sm_version=100 instead of 110.

However, other changes in moe_gemm_template_dispatch_tma_ws.h introduce compile-time logic for SM110-specific kernels. These kernels will never be dispatched because the runtime configuration will always be for SM100.

This makes the SM110-specific code paths dead code and the overall approach confusing. A more robust solution would be:

Update get_candidate_configs_sm110 to generate configs with sm_version=110.

Add a dispatch case for sm_version=110 in dispatchMoeGemmSelectTileShapeTmaWarpSpecialized.

This would make this explicit check for SM100 on SM110 unnecessary.

If the intention is to use SM100 kernels on SM110 for now, the changes for SM110-specific kernels in moe_gemm_template_dispatch_tma_ws.h should probably be in a separate, future PR to avoid confusion.
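For reference, the relaxed check in the diff above can be restated as a standalone predicate to make the accepted (config, device) pairs explicit. This is a sketch only: `smConfigAllowed` is a hypothetical name, and the body just mirrors the condition passed to `TLLM_CHECK_WITH_INFO`.

```cpp
// Hypothetical helper mirroring the condition in the diff: a config is
// accepted when its major SM version matches the device, or when an SM100
// config is used on an SM110 device.
constexpr bool smConfigAllowed(int config_sm, int device_sm) {
  return (config_sm / 10 == device_sm / 10) ||
         (config_sm / 10 == 10 && device_sm / 10 == 11);
}

// Same major version: accepted.
static_assert(smConfigAllowed(100, 100), "SM100 config on SM100 device");
static_assert(smConfigAllowed(100, 103), "SM100 config on SM103 device");

// The new fallback: SM100 config on an SM110 device.
static_assert(smConfigAllowed(100, 110), "SM100 config on SM110 device");

// The relaxation is one-directional and does not extend to other families.
static_assert(!smConfigAllowed(110, 100), "SM110 config on SM100 device");
static_assert(!smConfigAllowed(90, 110), "SM90 config on SM110 device");
```

Note that the fallback is asymmetric: an SM110-tagged config is still rejected on an SM100 device.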

gemini-code-assist · 2025-12-09T02:37:19Z

...v_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws.h

    }

-    if constexpr (Arch::kMinComputeCapability >= 100 && Arch::kMinComputeCapability < 120) {
+    if constexpr (Arch::kMinComputeCapability >= 100 && Arch::kMinComputeCapability < 110) {


This change, along with the one on line 207, creates a specific compile-time path for SM110. However, as noted in my main comment on moe_gemm_template_dispatch.h, the current runtime logic seems configured to use SM100 kernels on SM110 devices. This makes this new SM110-specific path unreachable. If these changes are preparatory for full SM110 kernel support in a future update, it might be better to introduce them then to avoid having dead code in the repository.

gemini-code-assist · 2025-12-09T02:37:19Z

...v_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws.h

      selected_func(hopper_input, num_experts, multi_processor_count, stream, occupancy,
                    workspace_size, cluster_shape_cute, cluster_shape_cute_fallback);
-    } else if constexpr (Arch::kMinComputeCapability >= 120 || Arch::kMinComputeCapability == 90) {
+    } else if constexpr (Arch::kMinComputeCapability >= 120 || Arch::kMinComputeCapability == 90 || Arch::kMinComputeCapability == 110) {


Similar to the change on line 179, this adds SM110 to this kernel dispatch path. As mentioned in my other comments, this path appears to be unreachable with the current configuration generation logic, which defaults to using SM100 kernels on SM110 hardware. This might be dead code until the configuration generation and dispatch logic are updated to handle sm_version=110.
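The effect of the two quoted range changes on compile-time branch selection can be sketched as follows. This is a simplification under stated assumptions: it collapses both conditions into a single `if constexpr` chain, which may not match the real control flow in moe_gemm_template_dispatch_tma_ws.h, and `DispatchPath`, `pathBefore`, and `pathAfter` are hypothetical names. The timeline also lists a later commit titled "Revert changes in dispatch function", so the sketch covers only the diff as quoted in these comments.

```cpp
// Hypothetical labels for the branches selected at compile time.
enum class DispatchPath { Sm100Family, Sm90OrSm120Plus, Other };

// Conditions as on the removed ("-") lines of the quoted diff.
template <int kMinComputeCapability>
constexpr DispatchPath pathBefore() {
  if constexpr (kMinComputeCapability >= 100 && kMinComputeCapability < 120) {
    return DispatchPath::Sm100Family;
  } else if constexpr (kMinComputeCapability >= 120 ||
                       kMinComputeCapability == 90) {
    return DispatchPath::Sm90OrSm120Plus;
  } else {
    return DispatchPath::Other;
  }
}

// Conditions as on the added ("+") lines of the quoted diff.
template <int kMinComputeCapability>
constexpr DispatchPath pathAfter() {
  if constexpr (kMinComputeCapability >= 100 && kMinComputeCapability < 110) {
    return DispatchPath::Sm100Family;
  } else if constexpr (kMinComputeCapability >= 120 ||
                       kMinComputeCapability == 90 ||
                       kMinComputeCapability == 110) {
    return DispatchPath::Sm90OrSm120Plus;
  } else {
    return DispatchPath::Other;
  }
}

// SM110 moves from the SM100 branch to the SM90 / SM120+ branch.
static_assert(pathBefore<110>() == DispatchPath::Sm100Family, "before: SM110");
static_assert(pathAfter<110>() == DispatchPath::Sm90OrSm120Plus, "after: SM110");

// SM100 is unaffected by the change.
static_assert(pathBefore<100>() == pathAfter<100>(), "SM100 unchanged");
```

Since `if constexpr` is evaluated per template instantiation, the SM110 branch only matters if something instantiates the template with an SM110 `Arch`, which is the reachability concern raised in the comments above.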

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h (1)
797-803: Consider improving clarity of the SM version compatibility check.

The added condition correctly allows SM100 configs to run on SM110 devices as an interim solution. However, the magic numbers and division logic could be more explicit.

Consider extracting the major version logic into named constants or a helper function:

// At file or namespace scope
constexpr int SM_MAJOR_VERSION_100 = 10;
constexpr int SM_MAJOR_VERSION_110 = 11;

inline int getSmMajorVersion(int sm_version) {
  return sm_version / 10;
}

// In the check:
TLLM_CHECK_WITH_INFO(
    (getSmMajorVersion(inputs.gemm_config.sm_version) == getSmMajorVersion(sm_)) ||
    // Allow SM100 configs to run on SM110 as interim solution
    (getSmMajorVersion(inputs.gemm_config.sm_version) == SM_MAJOR_VERSION_100 &&
     getSmMajorVersion(sm_) == SM_MAJOR_VERSION_110),
    "Using SM %d configuration for SM %d device",
    inputs.gemm_config.sm_version, sm_);

Also, verify that the TMA epilogue schedule pinning in cutlass_heuristic.cpp properly coordinates with this dispatch logic:

#!/bin/bash
# Verify the flow from config generation to dispatch for SM110
rg -n "get_candidate_configs_sm110|EpilogueScheduleType::TMA" --type=cpp -A3 -B3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5fe01a2 and fa5886d.

📒 Files selected for processing (2)

csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp (2 hunks)
csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-11-12T03:35:17.583Z

Learnt from: raayandhar
Repo: flashinfer-ai/flashinfer PR: 2070
File: include/flashinfer/gemm/bf16_gemm_cutlass_template.h:145-160
Timestamp: 2025-11-12T03:35:17.583Z
Learning: In flashinfer GEMM implementations (e.g., include/flashinfer/gemm/bf16_gemm_cutlass_template.h, fp8_gemm_cutlass_template.h), it is acceptable to catch and silently ignore std::runtime_error exceptions in getWorkspaceSizeImpl when probing multiple GEMM configurations, as some configurations may legitimately fail due to SMEM constraints. This pattern should include a comment like "// Swallow errors when SMEM exceeds maximum allowed" to document the rationale.

Applied to files:

csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h

🔇 Additional comments (2)

csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp (2)

572-580: LGTM! Consistent TMA epilogue schedule applied to all SM110 configs.

The change ensures all generated SM110 configurations use EpilogueScheduleType::TMA instead of AUTO, which is consistent with the FAST_BUILD path at line 536 and addresses the issue where AUTO wasn't being resolved by the dispatch function.

533-537: Confirm TMA is the only valid epilogue schedule for SM110 MOE kernels.

The change from EpilogueScheduleType::AUTO to EpilogueScheduleType::TMA pins the epilogue schedule for FAST_BUILD on SM110. Please verify that TMA is indeed the only supported epilogue schedule for SM110 MOE kernels by checking the SM110-specific dispatch logic and any constraints documented in the codebase.

yzh119 · 2025-12-10T22:58:48Z

/bot run

yzh119

LGTM, should be ready to merge once GitLab CI passes.

flashinfer-bot · 2025-12-10T22:59:11Z

GitLab MR !188 has been created, and the CI pipeline #39986560 is currently running. I'll report back once the pipeline job completes.

Fix for moe on sm110

d027074

gemini-code-assist bot reviewed Dec 9, 2025

jhalabi-nv added 2 commits December 9, 2025 16:37

Revert changes in dispatch function

80429f7

Clang-format edits

fa5886d

jhalabi-nv marked this pull request as ready for review December 10, 2025 01:48

jhalabi-nv requested review from djmmoss, wenscarl, yongwww and yzh119 as code owners December 10, 2025 01:48

coderabbitai bot reviewed Dec 10, 2025

aleozlx approved these changes Dec 10, 2025

yzh119 approved these changes Dec 10, 2025
