update kernel generator #101
base: main
Conversation
Note
Other AI code review bot(s) detected: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Adds optional per-definition selection to the example script, refactors KernelGenerator to async flows with sequential and beam-search generation, adds evaluation and selection helpers, and extracts Torch-binding guidance into a new prompt constant while removing binding-specific instructions from the CUDA/C++ prompts.

Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant User
participant Example as ExampleScript
participant Traceset as TraceSet
participant Generator as KernelGenerator
User->>Example: run (optional definition arg)
Example->>Traceset: load definitions
alt specific definition provided
Example->>Traceset: lookup definition
alt found
Example-->>User: print selected definition
Example->>Generator: generate(selected_definition, ...)
else not found
Example-->>User: print error and exit
end
else no specific definition
Example-->>User: print total definitions
Example->>Generator: generate(each_definition or loop)
end
sequenceDiagram
autonumber
participant Generator
participant OpenAI as AsyncOpenAI
participant Evaluator as Benchmarker
participant TraceDB as TraceStore
Generator->>Generator: generate(gen_rounds, beam?, beam_width?)
alt sequential (beam == false)
loop 1..gen_rounds
Generator->>OpenAI: async request code (_generate_code_from_prompt)
OpenAI-->>Generator: code candidate
Generator->>Evaluator: _evaluate_solutions(candidate)
Evaluator->>TraceDB: run traces/benchmarks
TraceDB-->>Evaluator: results
Evaluator-->>Generator: trace/metrics
Generator->>Generator: record/update best
end
Generator->>Generator: _select_best_solution -> return
else beam search (beam == true)
Generator->>Generator: initialize beam candidates
loop depth 1..gen_rounds
Generator->>OpenAI: async expand candidates
OpenAI-->>Generator: candidate set
Generator->>Evaluator: evaluate candidates
Evaluator->>TraceDB: run traces/benchmarks
TraceDB-->>Evaluator: results
Evaluator-->>Generator: scored candidates
Generator->>Generator: prune to beam_width
end
Generator->>Generator: select best passing or fallback -> return
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Summary of Changes
Hello @zanderjiang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Highlights
Code Review
This pull request improves the kernel generator example by making it more flexible and clearer for users. It introduces an option to generate a solution for a specific definition and replaces a hardcoded user path with a placeholder. My feedback includes a suggestion to improve user-friendliness by adding a check for a valid traceset path, and a minor style fix to align with PEP 8 guidelines for inline comments.
language = "triton" # Target solution language
target_gpu = "B200" # Choose solution target GPU
definition = "" # Leave empty to generate solutions for all definitions
For better readability and adherence to Python's PEP 8 style guide, inline comments should be separated by at least two spaces from the statement. I've also aligned the comments for better visual structure.[1]
language = "triton"   # Target solution language
target_gpu = "B200"   # Choose solution target GPU
definition = ""       # Leave empty to generate solutions for all definitions
Style Guide References

Footnotes
[1] PEP 8 E261 states that inline comments should be separated by at least two spaces from the statement.
  # TODO: adjust local path to traceset
- traceset_path = "/home/akj2/flashinfer-trace"
+ traceset_path = "/path/to/flashinfer-trace"
While using a placeholder path is an improvement, the script doesn't handle the case where the user forgets to change it. If run with the default path, it will create an empty directory, find no definitions, and do nothing without a clear explanation. This can be confusing for new users. It would be more robust to check if any definitions were loaded from the provided path and exit with a helpful error message if not.
For example, you could add this check after loading the traceset:
if not traceset.definitions:
    print(f"Error: No definitions found in traceset at '{traceset_path}'.")
    print("Please ensure `traceset_path` points to a valid flashinfer-trace directory.")
    return
Actionable comments posted: 1
🧹 Nitpick comments (4)
examples/kernel_generator/kernel_generator_example.py (1)
18-27: Clarify docstring and avoid reusing definition for different types

The main() docstring and variable naming are slightly out of sync with the new behavior:
- Docstring still says “all definitions” even though you now support targeting a single definition.
- definition starts as a string filter and is later reused as a Definition object inside the loop, which is a bit confusing.

You could make this clearer and avoid shadowing with something like:

- """
- Generate optimized solutions for all definitions in the traceset.
- """
- # TODO: select model, language, target gpu, definition
+ """
+ Generate optimized solutions for one or all definitions in the traceset.
+ """
+ # TODO: select model, language, target GPU, and an optional target definition
  model_name = "gpt-5-2025-08-07"  # Choose author-model
  language = "triton"  # Target solution language
  target_gpu = "B200"  # Choose solution target GPU
- definition = ""  # Leave empty to generate solutions for all definitions
+ target_definition_name = ""  # Leave empty to generate solutions for all definitions

And later:

- all_definitions = list(traceset.definitions.keys())
-
- if definition:
-     if definition in all_definitions:
-         all_definitions = [definition]
-         print(f"Generating solution {definition}")
+ all_definitions = list(traceset.definitions.keys())
+
+ if target_definition_name:
+     if target_definition_name in all_definitions:
+         all_definitions = [target_definition_name]
+         print(f"Generating solution {target_definition_name}")
      else:
-         print(f"Definition '{definition}' not found in traceset")
+         print(f"Definition '{target_definition_name}' not found in traceset")
          return

This keeps the configuration string separate from the per-iteration Definition objects and makes the single-definition behavior more obvious to users.

examples/kernel_generator/kernel_generator.py (3)
74-81: Consider exposing async generation instead of calling asyncio.run inside library code

Both generate() and _beam_search_generate() call asyncio.run(...). This is fine for CLI-style usage, but it will raise RuntimeError if KernelGenerator.generate() is ever called from within an existing event loop (e.g., Jupyter, asyncio-based services).

If you expect library usage in async contexts, consider:
- Adding an explicit async entrypoint (e.g., async_generate(...)) that directly awaits _sequential_generate_async / _beam_search_generate_async, and
- Keeping generate() as a thin synchronous wrapper that only uses asyncio.run when no event loop is running.

No immediate functional bug for the current example usage, but this will make the API more robust in async environments; a minimal sketch of the wrapper pattern follows below.
Also applies to: 101-113, 168-181
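To make the suggestion concrete, here is a minimal, self-contained sketch of that wrapper pattern in plain Python. The function names and the placeholder body are illustrative only; they are not the repository's actual implementation.

```python
import asyncio

async def _generate_async(definition: str, gen_rounds: int = 10) -> str:
    # Placeholder for the real async generation flow (sequential or beam search).
    await asyncio.sleep(0)
    return f"solution for {definition} after {gen_rounds} rounds"

def generate(definition: str, gen_rounds: int = 10) -> str:
    """Sync wrapper: only call asyncio.run when no event loop is already running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop: safe to start one here.
        return asyncio.run(_generate_async(definition, gen_rounds))
    # Inside a running loop (Jupyter, async service): callers should await instead.
    raise RuntimeError("Call `await _generate_async(...)` from async contexts")
```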
215-243: Align beam-search zips with Ruff B905 by making zip strict

In the beam-search path, Ruff B905 flags the two zip() usages without an explicit strict parameter. Here, you do expect the three sequences to stay in lockstep (initial_candidates, solutions, traces and later code_results, solutions, traces), so making this explicit is reasonable and will surface any mismatch as a clear error.

You can update both zips like this:
zip()usages without an explicitstrictparameter. Here, you do expect the three sequences to stay in lockstep (initial_candidates,solutions,tracesand latercode_results,solutions,traces), so making this explicit is reasonable and will surface any mismatch as a clear error.You can update both zips like this:
- for i, (candidate, solution, trace) in enumerate( - zip(initial_candidates, solutions, traces) - ): + for i, (candidate, solution, trace) in enumerate( + zip(initial_candidates, solutions, traces, strict=True) + ): @@ - for beam_idx, (code_result, solution, trace) in enumerate( - zip(code_results, solutions, traces) - ): + for beam_idx, (code_result, solution, trace) in enumerate( + zip(code_results, solutions, traces, strict=True) + ):This keeps behavior the same when lengths match and will fail fast if the evaluation pipeline ever returns an unexpected number of traces.
Also applies to: 277-307
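For context, this is standard zip(strict=...) behavior in Python 3.10+ and is independent of the PR's code:

```python
# By default, zip() silently truncates to the shortest input.
print(list(zip([1, 2, 3], ["a", "b"])))  # [(1, 'a'), (2, 'b')]

# With strict=True, a length mismatch raises immediately instead of being hidden.
try:
    list(zip([1, 2, 3], ["a", "b"], strict=True))
except ValueError as err:
    print(err)  # zip() argument 2 is shorter than argument 1
```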
317-367: Evaluation helper and selection logic are solid; minor f-string cleanup

The new _evaluate_solutions, _get_best_trace, and _select_best_solution helpers are cohesive and correctly:
- Build a temporary TraceSet rooted at traceset.root for benchmarking a specific definition.
- Map traces back to solutions by name.
- Choose the best passing solution by speedup_factor, or fall back to the last generated solution, or raise if nothing was produced (sketched below).

One small cleanup to address Ruff F541 (f-string without placeholders):
- elif fallback_solution:
-     print(f"\nNo passing solutions found, returning last generated solution")
-     return fallback_solution
+ elif fallback_solution:
+     print("\nNo passing solutions found, returning last generated solution")
+     return fallback_solution

Otherwise, this helper stack looks correct and matches the intended selection behavior.
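For readers unfamiliar with the selection behavior described above, here is a rough, self-contained sketch of that logic. The dataclass fields and function name are simplified assumptions, not the repository's exact code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScoredSolution:
    name: str
    passed: bool
    speedup_factor: float

def select_best_solution(
    scored: list[ScoredSolution], fallback: Optional[ScoredSolution]
) -> ScoredSolution:
    # Prefer the fastest solution that passed evaluation.
    passing = [s for s in scored if s.passed]
    if passing:
        return max(passing, key=lambda s: s.speedup_factor)
    # Otherwise fall back to the last generated solution, if any.
    if fallback is not None:
        print("\nNo passing solutions found, returning last generated solution")
        return fallback
    # Nothing was produced at all.
    raise RuntimeError("No solutions were generated")
```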
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/kernel_generator/kernel_generator.py (6 hunks)
- examples/kernel_generator/kernel_generator_example.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/kernel_generator/kernel_generator_example.py (1)
flashinfer_bench/data/trace_set.py (2)
- TraceSet (23-477)
- from_path (85-145)
🪛 Ruff (0.14.7)
examples/kernel_generator/kernel_generator.py
216-216: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
278-278: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
364-364: f-string without any placeholders
Remove extraneous f prefix
(F541)
367-367: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
🔇 Additional comments (3)
examples/kernel_generator/kernel_generator_example.py (1)
92-99: Generator call and new gen_rounds / beam parameters look correctly wired

The updated KernelGenerator.generate invocation correctly:
- Uses gen_rounds=10 to match the new API.
- Leaves beam search options commented out but clearly documented for users to toggle (beam, beam_width).

No functional issues here from the example's perspective.
examples/kernel_generator/kernel_generator.py (2)
115-167: Sequential multi-round generation and selection flow looks sound

The new _sequential_generate_async implementation:
- Reuses get_prompt for the initial round and get_optimization_prompt based on the best available trace (or falls back to a fresh prompt).
- Tracks passing_solutions along with last_solution / last_trace.
- Delegates evaluation to _evaluate_solutions and final choice to _select_best_solution.

Logic around gen_rounds, evaluation status checks, and fallbacks is consistent; no correctness issues stand out here.
430-447: Async code generation and solution naming scheme look consistent with new flows

The async _generate_code_from_prompt helper and the updated _create_solution_from_code:

- Correctly branch between responses.create for reasoning models and chat.completions.create for others.
- Return both raw and cleaned code, which the sequential and beam paths reuse for optimization prompts.
- Use (round_num, candidate_idx) to produce unique solution names and descriptions that encode both round and beam-candidate index.

This fits well with the new multi-round and beam-search flows; no issues from the bench side. A rough sketch of the branching pattern appears below.
Also applies to: 452-462
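As a rough illustration of the branching described above, here is a minimal sketch using the public AsyncOpenAI client. The reasoning-model check and the prompt handling are illustrative assumptions, not the PR's exact implementation.

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def generate_code_from_prompt(model_name: str, prompt: str) -> str:
    # Illustrative routing: send "reasoning" models through the Responses API.
    if model_name.startswith(("o", "gpt-5")):
        resp = await client.responses.create(model=model_name, input=prompt)
        return resp.output_text
    # Other models go through Chat Completions.
    resp = await client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""
```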
all_definitions = list(traceset.definitions.keys())

print(f"All definitions found: {len(all_definitions)}")
if definition:
    if definition in all_definitions:
        all_definitions = [definition]
        print(f"Generating solution {definition}")
    else:
        print(f"Definition '{definition}' not found in traceset")
        return

print(f"Found {len(all_definitions)} definitions to generate solutions")
Guard against empty tracesets to avoid confusing behavior and division by zero
If traceset.definitions is empty (e.g., user forgets to update traceset_path), all_definitions is empty and:
- The loop never runs.
- total_definitions is 0.
- The final line success_rate = (successful_generations/total_definitions)*100 will raise ZeroDivisionError.
It also silently “does nothing” before crashing, which can be confusing for new users.
You can handle this up front and fix the success-rate calculation:
- all_definitions = list(traceset.definitions.keys())
-
- if definition:
-     if definition in all_definitions:
-         all_definitions = [definition]
-         print(f"Generating solution {definition}")
-     else:
-         print(f"Definition '{definition}' not found in traceset")
-         return
-
- print(f"Found {len(all_definitions)} definitions to generate solutions")
+ all_definitions = list(traceset.definitions.keys())
+
+ if not all_definitions:
+     print(f"Error: No definitions found in traceset at '{traceset_path}'.")
+     print("Please ensure `traceset_path` points to a valid flashinfer-trace directory.")
+     return
+
+ if target_definition_name:
+     if target_definition_name in all_definitions:
+         all_definitions = [target_definition_name]
+         print(f"Generating solution {target_definition_name}")
+     else:
+         print(f"Definition '{target_definition_name}' not found in traceset")
+         return
+
+ print(f"Found {len(all_definitions)} definitions to generate solutions")
@@
- print(f"Failed generations: {failed_generations}")
- print(f"Success rate: {(successful_generations/total_definitions)*100:.1f}%")
+ print(f"Failed generations: {failed_generations}")
+ success_rate = (successful_generations / total_definitions * 100) if total_definitions else 0.0
+ print(f"Success rate: {success_rate:.1f}%")
Also applies to: 134-141
🤖 Prompt for AI Agents
In examples/kernel_generator/kernel_generator_example.py around lines 34 to 45
(and similarly at 134-141), all_definitions can be empty which leads to silent
no-op behavior and a ZeroDivisionError later when computing success_rate; add an
explicit guard after building all_definitions that checks if it's empty and if
so print a clear error explaining the traceset is empty or misconfigured and
return/exit early, and ensure the later success_rate calculation uses a
protected denominator (or skip calculation when total_definitions == 0) so no
division by zero can occur.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- examples/kernel_generator/kernel_generator_prompts.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
- GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
🔇 Additional comments (1)
examples/kernel_generator/kernel_generator_prompts.py (1)
235-248: CUDA optimization prompt addition is fine

The added mention of specialized libraries like CUTLASS fits the surrounding optimization guidance and does not conflict with the rest of the prompt. No changes needed here.
TORCH_BINDINGS_PROMPT = """
Use TORCH for your generated kernel host function and bindings
Requirements:
- Include all necessary headers (torch/extension.h, kernel.h, etc.)
- Implement the "run" function that:
  * Takes torch::Tensor arguments
  * Validates tensor properties (device, dtype, shape)
  * Extracts raw pointers using .data_ptr<T>()
  * Calls the CUDA kernel with appropriate launch configuration
  * Returns results as torch::Tensor
- Use PYBIND11_MODULE to bind the "run" function:
  * PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{
  * m.def("run", &run, "Kernel execution function");
  * }}
- Handle both positional args and kwargs properly
- Include proper error messages for invalid inputs
- Use torch::Tensor for all tensor arguments
- Use .device().is_cuda() to check if tensors are on GPU
- Use .dtype() to validate tensor data types
- Use .sizes() or .size(dim) to get tensor dimensions
- Use .data_ptr<float>() or .data_ptr<T>() to get raw pointers
- Call cudaDeviceSynchronize() or cudaGetLastError() for error checking
- Return torch::Tensor from the run function
- Handle exceptions gracefully with proper error messages"""
Fix PYBIND11 example braces in TORCH_BINDINGS_PROMPT
The PYBIND11 example currently shows double braces:
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{}}
This is not valid C++ syntax and may confuse the model/generator. Since this string is not passed through .format, the double braces are unnecessary.
Recommend switching to single braces:
TORCH_BINDINGS_PROMPT = """
@@
- Use PYBIND11_MODULE to bind the "run" function:
- * PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{
+ * PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
* m.def("run", &run, "Kernel execution function");
- * }}
+ * }
@@
- Use .data_ptr<float>() or .data_ptr<T>() to get raw pointers
- Call cudaDeviceSynchronize() or cudaGetLastError() for error checking
- Return torch::Tensor from the run function
- Handle exceptions gracefully with proper error messages"""

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
TORCH_BINDINGS_PROMPT = """
Use TORCH for your generated kernel host function and bindings
Requirements:
- Include all necessary headers (torch/extension.h, kernel.h, etc.)
- Implement the "run" function that:
  * Takes torch::Tensor arguments
  * Validates tensor properties (device, dtype, shape)
  * Extracts raw pointers using .data_ptr<T>()
  * Calls the CUDA kernel with appropriate launch configuration
  * Returns results as torch::Tensor
- Use PYBIND11_MODULE to bind the "run" function:
  * PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  * m.def("run", &run, "Kernel execution function");
  * }
- Handle both positional args and kwargs properly
- Include proper error messages for invalid inputs
- Use torch::Tensor for all tensor arguments
- Use .device().is_cuda() to check if tensors are on GPU
- Use .dtype() to validate tensor data types
- Use .sizes() or .size(dim) to get tensor dimensions
- Use .data_ptr<float>() or .data_ptr<T>() to get raw pointers
- Call cudaDeviceSynchronize() or cudaGetLastError() for error checking
- Return torch::Tensor from the run function
- Handle exceptions gracefully with proper error messages"""
🤖 Prompt for AI Agents
In examples/kernel_generator/kernel_generator_prompts.py around lines 294 to
319, the PYBIND11 example in TORCH_BINDINGS_PROMPT mistakenly uses double braces
("{{" and "}}") which are invalid in C++; replace the opening
"PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{" with a single brace "{" and change
the matching "}}" to "}" so the module block uses normal C++ braces, leaving the
rest of the prompt unchanged.
This PR updates the kernel generator and gives users more freedom to generate their own solutions to run on FlashInfer-Bench.
It introduces a beam-search generation strategy, allowing agents to explore more optimization strategies in parallel.
To reproduce flashinfer-trace solutions, use the standard multi-turn generation strategy with 10 rounds. A rough sketch of how the two modes are invoked is shown below.
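The following usage sketch is based on the parameters discussed in this PR (gen_rounds, beam, beam_width); the exact argument names and defaults should be checked against the merged example script, and `generator` is assumed to be a KernelGenerator constructed as in the example.

```python
# Standard multi-turn generation, as used to reproduce flashinfer-trace solutions.
solution = generator.generate(definition, gen_rounds=10)

# Beam-search generation: explore several candidate kernels in parallel each round,
# pruning to `beam_width` survivors before the next expansion.
solution = generator.generate(definition, gen_rounds=10, beam=True, beam_width=4)
```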
Summary by CodeRabbit
New Features
Chores