-
Notifications
You must be signed in to change notification settings - Fork 13
[refactor] apply() Overhead Reduction #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughReorganized apply runtime singleton placement and removed tracing collection/import from the dispatch path; added per-solution builder cache in BuilderRegistry; introduced O(1) solution lookup in TraceSet; simplified package naming by removing source hash; and moved cutlass package path from Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used🧬 Code graph analysis (1)flashinfer_bench/compile/registry.py (8)
🪛 Ruff (0.14.6)flashinfer_bench/compile/registry.py25-26: (S110) 25-25: Do not catch blind exception: (BLE001) flashinfer_bench/data/trace_set.py55-55: Avoid specifying long messages outside the exception class (TRY003) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @YiyanZhai, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces several key optimizations to reduce the overhead associated with the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces several effective optimizations to reduce the overhead of the apply() function, aligning with the stated goals. The changes include simplifying create_pkg_name by removing expensive hashing, significantly improving get_solution performance by introducing an O(1) lookup index, and caching the solution-to-builder mapping to avoid redundant checks. These are all valuable performance enhancements. The implementation is well-executed, but I've identified one potential issue in flashinfer_bench/data/trace_set.py where duplicate solution names are not handled during direct TraceSet instantiation, which could lead to incorrect behavior. My review includes a specific suggestion to address this.
a check for duplicates is added Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer_bench/apply/runtime.py (1)
144-145: Return type annotation mismatch in__enter__.The method returns
selfbut the return type is annotated asNone. This should be"ApplyRuntime"(orSelffromtyping_extensions).- def __enter__(self) -> None: - return self + def __enter__(self) -> "ApplyRuntime": + return self
🧹 Nitpick comments (2)
flashinfer_bench/apply/runtime.py (1)
152-160: Use explicitreturn Nonefor clarity.When
fib_enable_applyisFalse, the function exits without an explicit return statement, implicitly returningNone. While functionally correct, an explicitreturn Noneimproves readability and matches theOptional["ApplyRuntime"]return type.def _init_apply_runtime_from_env() -> Optional["ApplyRuntime"]: """Initialize the global runtime from environment variables if configured.""" fib_enable_apply = get_fib_enable_apply() if not fib_enable_apply: - return + return None fib_dataset_path = get_fib_dataset_path() trace_set = TraceSet.from_path(fib_dataset_path) apply_config = ApplyConfig() return ApplyRuntime(trace_set, apply_config, None)flashinfer_bench/compile/registry.py (1)
21-27: Avoidtry/except Exception: passinclear(); add logging to surface cache-clear failures
clear()currently swallows all exceptions fromb.clear_cache()without any visibility, making cache‑clear failures invisible and difficult to debug. Since this is best‑effort cleanup, add lightweight warning logging:+import logging + +logger = logging.getLogger(__name__) + class BuilderRegistry: def clear(self) -> None: for b in self._builders: try: b.clear_cache() except Exception as exc: - pass + logger.warning("Failed to clear cache for %r: %s", b, exc) self._solution_to_builder.clear()The module-level logger with lazy formatting (
%r,%s) follows Python logging best practices for library code: it surfaces issues for debugging while remaining non-intrusive and allowing application-level log configuration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
flashinfer_bench/apply/runtime.py(1 hunks)flashinfer_bench/compile/builder.py(1 hunks)flashinfer_bench/compile/builders/cuda_builder.py(1 hunks)flashinfer_bench/compile/registry.py(1 hunks)flashinfer_bench/data/trace_set.py(4 hunks)pyproject.toml(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
flashinfer_bench/data/trace_set.py (2)
web/apps/web/lib/schemas/trace.ts (1)
Solution(140-140)flashinfer_bench/integration/utils.py (1)
get(30-35)
flashinfer_bench/compile/registry.py (7)
flashinfer_bench/compile/builder.py (4)
Builder(43-86)clear_cache(78-86)build(69-76)can_build(50-52)flashinfer_bench/compile/builders/cuda_builder.py (2)
clear_cache(235-239)can_build(135-136)flashinfer_bench/data/solution.py (1)
Solution(97-165)flashinfer_bench/compile/runnable.py (1)
Runnable(9-38)flashinfer_bench/compile/builders/tvm_ffi_builder.py (1)
can_build(73-86)flashinfer_bench/compile/builders/python_builder.py (1)
can_build(23-24)flashinfer_bench/compile/builders/triton_builder.py (1)
can_build(31-32)
flashinfer_bench/apply/runtime.py (3)
flashinfer_bench/env.py (2)
get_fib_enable_apply(7-16)get_fib_dataset_path(31-43)flashinfer_bench/data/trace_set.py (2)
TraceSet(23-481)from_path(93-154)flashinfer_bench/apply/config.py (1)
ApplyConfig(6-31)
🪛 Ruff (0.14.6)
flashinfer_bench/compile/registry.py
25-26: try-except-pass detected, consider logging the exception
(S110)
25-25: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (8)
flashinfer_bench/compile/builder.py (1)
32-36: Simplified package naming relies on solution name uniqueness.The removal of source hashing is valid since
TraceSet.from_pathenforces unique solution names (raisesValueErroron duplicates). This simplification improves cache key determinism and eliminates hashing overhead.flashinfer_bench/compile/builders/cuda_builder.py (1)
73-73: LGTM!The cutlass package path update is consistent with the corresponding change in
pyproject.toml. The header-only configuration remains unchanged.flashinfer_bench/apply/runtime.py (1)
163-163: Module-level initialization may have import-time side effects.
_init_apply_runtime_from_env()is invoked at module import, which reads environment variables, potentially loads trace data from disk, and installs FlashInfer integrations. If any of these operations fail or are slow, it affects all code that imports this module. Consider lazy initialization on first access toget_apply_runtime()if import-time overhead becomes problematic.flashinfer_bench/data/trace_set.py (3)
47-54: Good addition of O(1) solution lookup index.The
_solution_by_nameindex with__post_init__initialization is a clean approach. Note that direct mutations toself.solutions(e.g., appending to lists) won't update the index, but this is acceptable for the current read-heavy usage pattern. If a publicadd_solution()method is added later, it should also update_solution_by_name.
142-142: Index population duringfrom_pathis correctly placed.This ensures the index is populated as solutions are loaded, maintaining consistency with the
__post_init__approach for direct construction.
183-199: O(1) lookup via index is a nice performance improvement.The docstring accurately reflects the new O(1) behavior. This directly supports the PR's goal of reducing overhead in the
apply()path.pyproject.toml (1)
61-61: Path update looks consistent with cuda_builder.py changes.The package data path change from
flashinfer_bench._deps.cutlasstoflashinfer_bench.thirdparty.cutlassaligns with the corresponding update incuda_builder.py. Ensure theflashinfer_bench/thirdparty/cutlass/directory structure exists with the expectedinclude/**files.flashinfer_bench/compile/registry.py (1)
18-20: Builder cache bysol.namelooks good; just be aware of the stability assumptionsUsing a
solution_name -> Buildercache is a solid way to eliminate repeatedcan_buildscans, and keying bySolution.namealigns with the uniqueness guarantee on that field. The main behavioral change is that subsequent calls tobuild()for the samesol.namewill not re-runbuilder.can_build(sol)and will always reuse the previously selected builder.This assumes that for a given solution name:
can_buildresults are effectively stable over the lifetime of the registry, and- the solution's characteristics that influence
can_builddon't change.If those assumptions hold for your builders, this is fine. If you expect dynamic availability changes, consider optionally guarding the cache hit with a cheap re-check to preserve dynamic behavior at the cost of a single
can_buildin edge cases.
…lashinfer-bench into overhead-improvement
This PR reduces the overhead of apply():
create_pkg_namefunction to just use the solution name without hashingget_solutionmethod was originally doing a linear search through all solutions, which is O(n) where n is the total number of solutions across all definitions. So this PR added a_solution_by_nameindex that maps solution names directly to Solution objects.can_build(). So this PR added a solution-to-builder mapping cache in the registry.Minor naming refactor: The internal path for the cutlass dependency has been updated from
_depstothirdpartyin both the CUDA builder configuration and pyproject.toml.Summary by CodeRabbit
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.