Conversation

@YiyanZhai
Contributor

@YiyanZhai YiyanZhai commented Dec 1, 2025

This PR reduces the overhead of apply():

  • Simplified the create_pkg_name function to use the solution name directly, without hashing the source files.
  • The get_solution method originally performed a linear search through all solutions, which is O(n) in the total number of solutions across all definitions. This PR adds a _solution_by_name index that maps solution names directly to Solution objects (see the sketch after this list).
  • Removed repeated checks for already-cached solutions: even when a solution was already cached in a builder, the registry still iterated through all builders calling can_build(). This PR adds a solution-to-builder mapping cache to the registry.
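
A simplified sketch of the indexing idea (the dataclass fields and error handling here are illustrative, not the exact TraceSet layout):

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Solution:
    name: str  # other fields omitted


@dataclass
class TraceSet:
    solutions: List[Solution] = field(default_factory=list)
    _solution_by_name: Dict[str, Solution] = field(default_factory=dict, repr=False)

    def __post_init__(self) -> None:
        # Build the name -> Solution index once and reject duplicates so a
        # later entry can never silently shadow an earlier one.
        for sol in self.solutions:
            if sol.name in self._solution_by_name:
                raise ValueError(f"Duplicate solution name: {sol.name}")
            self._solution_by_name[sol.name] = sol

    def get_solution(self, name: str) -> Optional[Solution]:
        # O(1) dict lookup instead of scanning every solution.
        return self._solution_by_name.get(name)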

Minor naming refactor: The internal path for the cutlass dependency has been updated from _deps to thirdparty in both the CUDA builder configuration and pyproject.toml.

Summary by CodeRabbit

  • Refactor

    • Optimized solution lookup performance through indexing and caching mechanisms.
    • Reorganized internal dependency paths and simplified package naming logic.
    • Streamlined runtime initialization and removed unused tracing infrastructure.
  • Chores

    • Updated package configuration to reflect internal module restructuring.


@coderabbitai

coderabbitai bot commented Dec 1, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Reorganized apply runtime singleton placement and removed tracing collection/import from the dispatch path; added per-solution builder cache in BuilderRegistry; introduced O(1) solution lookup in TraceSet; simplified package naming by removing source hash; and moved cutlass package path from flashinfer_bench._deps.cutlass to flashinfer_bench.thirdparty.cutlass.

Changes

Cohort / File(s) Summary
Apply runtime & dispatch
flashinfer_bench/apply/runtime.py
Relocated singleton initialization plumbing later in the module; removed the tracing runtime import and tracing collection from the dispatch flow.
Builder package naming
flashinfer_bench/compile/builder.py
create_pkg_name no longer appends a hash; returns deterministic prefix + sanitized solution name.
Per-solution builder cache
flashinfer_bench/compile/registry.py
Added _solution_to_builder: dict[str, Builder] to BuilderRegistry; build() checks and populates cache by sol.name; clear() clears the cache.
CUDA dependency path update
flashinfer_bench/compile/builders/cuda_builder.py, pyproject.toml
Changed cutlass package key from flashinfer_bench._deps.cutlass to flashinfer_bench.thirdparty.cutlass in CUDA_DEPS and package-data.
TraceSet indexing
flashinfer_bench/data/trace_set.py
Added _solution_by_name dict, __post_init__ to populate it and detect duplicates, and updated from_path/get_solution() to use the index for O(1) lookup.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect flashinfer_bench/apply/runtime.py for subtle import-time behavior changes and confirm no callers relied on earlier eager initialization or tracing side effects.
  • Verify BuilderRegistry cache correctness: collision semantics if multiple solutions share names, cache invalidation, and thread-safety if relevant.
  • Ensure TraceSet.__post_init__ runs for all construction paths (including deserialization/loading) and that mutation paths keep _solution_by_name synchronized.
  • Check packaging changes (pyproject) and cuda_builder path update for consistency with repository layout.

Possibly related PRs

Poem

🐰 I hopped through code with careful paws,

Moved runtimes, pruned tracing claws,
Builders cached to speed the race,
Solutions found in a single place,
Cutlass moved to thirdparty halls. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 57.14%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title '[refactor] apply() Overhead Reduction' directly and clearly summarizes the main objective of the PR: reducing overhead in the apply() function through optimizations like caching and index-based lookups.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c0d75b and 93ffe61.

📒 Files selected for processing (2)
  • flashinfer_bench/compile/registry.py (1 hunks)
  • flashinfer_bench/data/trace_set.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer_bench/compile/registry.py (8)
flashinfer_bench/compile/builder.py (4)
  • Builder (43-86)
  • clear_cache (78-86)
  • build (69-76)
  • can_build (50-52)
flashinfer_bench/compile/builders/cuda_builder.py (2)
  • clear_cache (235-239)
  • can_build (135-136)
flashinfer_bench/data/solution.py (1)
  • Solution (97-165)
flashinfer_bench/compile/runnable.py (1)
  • Runnable (9-38)
flashinfer_bench/integration/utils.py (1)
  • get (30-35)
flashinfer_bench/compile/builders/tvm_ffi_builder.py (1)
  • can_build (73-86)
flashinfer_bench/compile/builders/python_builder.py (1)
  • can_build (23-24)
flashinfer_bench/compile/builders/triton_builder.py (1)
  • can_build (31-32)
🪛 Ruff (0.14.6)
flashinfer_bench/compile/registry.py

25-26: try-except-pass detected, consider logging the exception

(S110)


25-25: Do not catch blind exception: Exception

(BLE001)

flashinfer_bench/data/trace_set.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
🔇 Additional comments (3)
flashinfer_bench/data/trace_set.py (2)

50-56: Good addition - addresses prior review concern.

The __post_init__ method now includes the duplicate solution name check that was suggested in the previous review. This prevents silent overwrites in the _solution_by_name index and ensures data integrity.


47-48: LGTM! Effective O(1) solution lookup optimization.

The _solution_by_name index is properly maintained in both __post_init__ and from_path, and get_solution now uses direct dictionary lookup instead of iterating through all solutions. This is a clean performance improvement aligned with the PR's overhead reduction objectives.

Also applies to: 144-144, 185-201

flashinfer_bench/compile/registry.py (1)

18-19: LGTM! Clean builder caching to avoid repeated can_build() checks.

The _solution_to_builder cache is well-integrated:

  • Properly initialized and cleared alongside builder caches
  • Checked before iterating through builders in build()
  • Populated on first successful match

This eliminates redundant can_build() iterations for previously-seen solutions, directly supporting the PR's overhead reduction goals.

Also applies to: 27-27, 30-36


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Contributor

Summary of Changes

Hello @YiyanZhai, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces several key optimizations to reduce the overhead associated with the apply() function. The changes focus on improving performance by streamlining internal processes such as package name generation, accelerating solution lookups, and implementing caching mechanisms within the builder registry. Additionally, the ApplyRuntime has been refactored to simplify its responsibilities, contributing to a more efficient and maintainable codebase.

Highlights

  • Simplified Package Naming: The create_pkg_name function has been simplified by removing the hashing of source files, now relying solely on the solution name for package identification.
  • Optimized Solution Lookup: The TraceSet class now includes a _solution_by_name index, transforming the get_solution method from an O(n) linear search to an O(1) direct lookup, significantly improving retrieval speed.
  • Builder Registry Caching: A new cache (_solution_to_builder) has been added to the BuilderRegistry to store mappings between solution names and their respective builders, avoiding redundant can_build() checks and speeding up the build process.
  • Refactored ApplyRuntime: Tracing logic has been removed from the ApplyRuntime's dispatch method, streamlining its core responsibility and separating concerns.
  • Dependency Path Update: The internal path for the cutlass dependency has been updated from _deps to thirdparty in both the CUDA builder configuration and pyproject.toml.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review: /gemini review performs a code review for the current pull request in its current state.
  • Pull Request Summary: /gemini summary provides a summary of the current pull request in its current state.
  • Comment: @gemini-code-assist responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help: /gemini help displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several effective optimizations to reduce the overhead of the apply() function, aligning with the stated goals. The changes include simplifying create_pkg_name by removing expensive hashing, significantly improving get_solution performance by introducing an O(1) lookup index, and caching the solution-to-builder mapping to avoid redundant checks. These are all valuable performance enhancements. The implementation is well-executed, but I've identified one potential issue in flashinfer_bench/data/trace_set.py where duplicate solution names are not handled during direct TraceSet instantiation, which could lead to incorrect behavior. My review includes a specific suggestion to address this.

a check for duplicates is added

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flashinfer_bench/apply/runtime.py (1)

144-145: Return type annotation mismatch in __enter__.

The method returns self but the return type is annotated as None. This should be "ApplyRuntime" (or Self from typing_extensions).

-    def __enter__(self) -> None:
-        return self
+    def __enter__(self) -> "ApplyRuntime":
+        return self
🧹 Nitpick comments (2)
flashinfer_bench/apply/runtime.py (1)

152-160: Use explicit return None for clarity.

When fib_enable_apply is False, the function exits without an explicit return statement, implicitly returning None. While functionally correct, an explicit return None improves readability and matches the Optional["ApplyRuntime"] return type.

 def _init_apply_runtime_from_env() -> Optional["ApplyRuntime"]:
     """Initialize the global runtime from environment variables if configured."""
     fib_enable_apply = get_fib_enable_apply()
     if not fib_enable_apply:
-        return
+        return None
     fib_dataset_path = get_fib_dataset_path()
     trace_set = TraceSet.from_path(fib_dataset_path)
     apply_config = ApplyConfig()
     return ApplyRuntime(trace_set, apply_config, None)
flashinfer_bench/compile/registry.py (1)

21-27: Avoid try/except Exception: pass in clear(); add logging to surface cache-clear failures

clear() currently swallows all exceptions from b.clear_cache() without any visibility, making cache‑clear failures invisible and difficult to debug. Since this is best‑effort cleanup, add lightweight warning logging:

+import logging
+
+logger = logging.getLogger(__name__)
+
 class BuilderRegistry:
     def clear(self) -> None:
         for b in self._builders:
             try:
                 b.clear_cache()
             except Exception as exc:
-                pass
+                logger.warning("Failed to clear cache for %r: %s", b, exc)
         self._solution_to_builder.clear()

The module-level logger with lazy formatting (%r, %s) follows Python logging best practices for library code: it surfaces issues for debugging while remaining non-intrusive and allowing application-level log configuration.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a7154c and 1c0d75b.

📒 Files selected for processing (6)
  • flashinfer_bench/apply/runtime.py (1 hunks)
  • flashinfer_bench/compile/builder.py (1 hunks)
  • flashinfer_bench/compile/builders/cuda_builder.py (1 hunks)
  • flashinfer_bench/compile/registry.py (1 hunks)
  • flashinfer_bench/data/trace_set.py (4 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
flashinfer_bench/data/trace_set.py (2)
web/apps/web/lib/schemas/trace.ts (1)
  • Solution (140-140)
flashinfer_bench/integration/utils.py (1)
  • get (30-35)
flashinfer_bench/compile/registry.py (7)
flashinfer_bench/compile/builder.py (4)
  • Builder (43-86)
  • clear_cache (78-86)
  • build (69-76)
  • can_build (50-52)
flashinfer_bench/compile/builders/cuda_builder.py (2)
  • clear_cache (235-239)
  • can_build (135-136)
flashinfer_bench/data/solution.py (1)
  • Solution (97-165)
flashinfer_bench/compile/runnable.py (1)
  • Runnable (9-38)
flashinfer_bench/compile/builders/tvm_ffi_builder.py (1)
  • can_build (73-86)
flashinfer_bench/compile/builders/python_builder.py (1)
  • can_build (23-24)
flashinfer_bench/compile/builders/triton_builder.py (1)
  • can_build (31-32)
flashinfer_bench/apply/runtime.py (3)
flashinfer_bench/env.py (2)
  • get_fib_enable_apply (7-16)
  • get_fib_dataset_path (31-43)
flashinfer_bench/data/trace_set.py (2)
  • TraceSet (23-481)
  • from_path (93-154)
flashinfer_bench/apply/config.py (1)
  • ApplyConfig (6-31)
🪛 Ruff (0.14.6)
flashinfer_bench/compile/registry.py

25-26: try-except-pass detected, consider logging the exception

(S110)


25-25: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (8)
flashinfer_bench/compile/builder.py (1)

32-36: Simplified package naming relies on solution name uniqueness.

The removal of source hashing is valid since TraceSet.from_path enforces unique solution names (raises ValueError on duplicates). This simplification improves cache key determinism and eliminates hashing overhead.
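
For illustration, a rough sketch of the simplified naming (the prefix and sanitization rule here are assumptions, not the exact implementation):

import re


def create_pkg_name(solution_name: str, prefix: str = "fib_") -> str:
    # Deterministic package name: prefix plus a sanitized solution name,
    # with no source-content hash appended.
    sanitized = re.sub(r"[^0-9A-Za-z_]", "_", solution_name)
    return f"{prefix}{sanitized}"

Because the name no longer depends on source contents, two solutions with the same name always map to the same package, which is why the uniqueness check in TraceSet matters.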

flashinfer_bench/compile/builders/cuda_builder.py (1)

73-73: LGTM!

The cutlass package path update is consistent with the corresponding change in pyproject.toml. The header-only configuration remains unchanged.

flashinfer_bench/apply/runtime.py (1)

163-163: Module-level initialization may have import-time side effects.

_init_apply_runtime_from_env() is invoked at module import, which reads environment variables, potentially loads trace data from disk, and installs FlashInfer integrations. If any of these operations fail or are slow, it affects all code that imports this module. Consider lazy initialization on first access to get_apply_runtime() if import-time overhead becomes problematic.
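
One possible lazy-initialization shape, should import-time cost ever become a problem (the module-level flag is hypothetical; get_apply_runtime and _init_apply_runtime_from_env are the names already used in this module):

from typing import Optional

_apply_runtime: Optional["ApplyRuntime"] = None
_runtime_initialized = False


def get_apply_runtime() -> Optional["ApplyRuntime"]:
    # Defer reading environment variables and loading trace data until the
    # first caller actually needs the runtime, instead of at module import.
    global _apply_runtime, _runtime_initialized
    if not _runtime_initialized:
        _apply_runtime = _init_apply_runtime_from_env()
        _runtime_initialized = True
    return _apply_runtime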

flashinfer_bench/data/trace_set.py (3)

47-54: Good addition of O(1) solution lookup index.

The _solution_by_name index with __post_init__ initialization is a clean approach. Note that direct mutations to self.solutions (e.g., appending to lists) won't update the index, but this is acceptable for the current read-heavy usage pattern. If a public add_solution() method is added later, it should also update _solution_by_name.
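
If such a method is added later, a small sketch of keeping the index in sync (the method name and the exact solutions container are assumptions):

def add_solution(self, sol: Solution) -> None:
    # Keep the O(1) index consistent with the underlying solutions container.
    if sol.name in self._solution_by_name:
        raise ValueError(f"Duplicate solution name: {sol.name}")
    self.solutions.append(sol)
    self._solution_by_name[sol.name] = sol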


142-142: Index population during from_path is correctly placed.

This ensures the index is populated as solutions are loaded, maintaining consistency with the __post_init__ approach for direct construction.


183-199: O(1) lookup via index is a nice performance improvement.

The docstring accurately reflects the new O(1) behavior. This directly supports the PR's goal of reducing overhead in the apply() path.

pyproject.toml (1)

61-61: Path update looks consistent with cuda_builder.py changes.

The package data path change from flashinfer_bench._deps.cutlass to flashinfer_bench.thirdparty.cutlass aligns with the corresponding update in cuda_builder.py. Ensure the flashinfer_bench/thirdparty/cutlass/ directory structure exists with the expected include/** files.

flashinfer_bench/compile/registry.py (1)

18-20: Builder cache by sol.name looks good; just be aware of the stability assumptions

Using a solution_name -> Builder cache is a solid way to eliminate repeated can_build scans, and keying by Solution.name aligns with the uniqueness guarantee on that field. The main behavioral change is that subsequent calls to build() for the same sol.name will not re-run builder.can_build(sol) and will always reuse the previously selected builder.

This assumes that for a given solution name:

  • can_build results are effectively stable over the lifetime of the registry, and
  • the solution's characteristics that influence can_build don't change.

If those assumptions hold for your builders, this is fine. If you expect dynamic availability changes, consider optionally guarding the cache hit with a cheap re-check to preserve dynamic behavior at the cost of a single can_build in edge cases.
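
A minimal sketch of that optional guard, assuming the _solution_to_builder cache and builder interface described above (exact signatures and error types may differ):

def build(self, sol: Solution) -> Runnable:
    # Fast path: reuse the builder that previously matched this solution,
    # but re-run its cheap can_build() check so dynamic availability
    # changes are still honored before building.
    cached = self._solution_to_builder.get(sol.name)
    if cached is not None and cached.can_build(sol):
        return cached.build(sol)

    # Slow path: scan all builders and remember the first match.
    for builder in self._builders:
        if builder.can_build(sol):
            self._solution_to_builder[sol.name] = builder
            return builder.build(sol)
    raise ValueError(f"No builder can build solution {sol.name!r}")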
