
Conversation

@zanderjiang
Collaborator

@zanderjiang zanderjiang commented Oct 24, 2025

This PR updates the kernel generator, giving users more freedom to generate their own solutions to run on FlashInfer-Bench.

It introduces a beam search generation strategy, allowing agents to explore multiple optimization strategies in parallel.

To reproduce the flashinfer-trace solutions, use the standard multi-turn generation strategy with 10 rounds.
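For concreteness, a hypothetical invocation of the updated generator (the gen_rounds, beam, and beam_width parameters come from this PR; the import paths, constructor arguments, and definition name are illustrative assumptions):

```python
from flashinfer_bench.data.trace_set import TraceSet  # module path from the review's code-graph note
from kernel_generator import KernelGenerator          # example-local import, assumed

traceset = TraceSet.from_path("/path/to/flashinfer-trace")
definition = traceset.definitions["rmsnorm"]  # hypothetical definition name

generator = KernelGenerator(
    model_name="gpt-5-2025-08-07",  # constructor arguments are assumptions
    language="triton",
    target_gpu="B200",
)

# Reproduce the flashinfer-trace solutions: standard multi-turn generation, 10 rounds.
solution = generator.generate(definition, gen_rounds=10)

# Or explore several optimization directions in parallel with beam search.
solution = generator.generate(definition, gen_rounds=10, beam=True, beam_width=4)
```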

Summary by CodeRabbit

  • New Features

    • Choose to generate a single definition or process all definitions, with clear validation and status messages
    • New generation modes: iterative rounds and optional beam search with configurable beam width
    • Added guidance for PyTorch/Torch bindings generation separate from other backend prompts
  • Chores

    • Example configuration path updated to a generic placeholder


@coderabbitai

coderabbitai bot commented Oct 24, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds optional per-definition selection to the example script, refactors KernelGenerator to async flows with sequential and beam-search generation, evaluation and selection helpers, and extracts Torch-binding guidance into a new prompt constant while removing binding-specific instructions from CUDA/C++ prompts.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Example: definition selection | examples/kernel_generator/kernel_generator_example.py | Adds optional single-definition selection and validation; replaces rmsnorm-only filtering with all traceset definitions; prints scoped counts/messages; uses a placeholder traceset path; early-exits on invalid selection; adjusts the generator call parameter name. |
| Kernel generator core (generation flow & async) | examples/kernel_generator/kernel_generator.py | Changes the KernelGenerator.generate signature to accept gen_rounds, beam, and beam_width; replaces the synchronous flow with async implementations (_sequential_generate_async, _beam_search_generate_async); adds evaluation helpers (_evaluate_solutions, _get_best_trace, _select_best_solution), async code generation (_generate_code_from_prompt), and candidate-indexed solution creation. |
| Prompts: Torch bindings extracted | examples/kernel_generator/kernel_generator_prompts.py | Extracts PyTorch/C++ binding guidance into a new top-level TORCH_BINDINGS_PROMPT constant and removes binding-specific instructions and header/include requirements from the CUDA/C++ prompts. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User
    participant Example as ExampleScript
    participant Traceset as TraceSet
    participant Generator as KernelGenerator

    User->>Example: run (optional definition arg)
    Example->>Traceset: load definitions
    alt specific definition provided
        Example->>Traceset: lookup definition
        alt found
            Example-->>User: print selected definition
            Example->>Generator: generate(selected_definition, ...)
        else not found
            Example-->>User: print error and exit
        end
    else no specific definition
        Example-->>User: print total definitions
        Example->>Generator: generate(each_definition or loop)
    end
sequenceDiagram
    autonumber
    participant Generator
    participant OpenAI as AsyncOpenAI
    participant Evaluator as Benchmarker
    participant TraceDB as TraceStore

    Generator->>Generator: generate(gen_rounds, beam?, beam_width?)
    alt sequential (beam == false)
        loop 1..gen_rounds
            Generator->>OpenAI: async request code (_generate_code_from_prompt)
            OpenAI-->>Generator: code candidate
            Generator->>Evaluator: _evaluate_solutions(candidate)
            Evaluator->>TraceDB: run traces/benchmarks
            TraceDB-->>Evaluator: results
            Evaluator-->>Generator: trace/metrics
            Generator->>Generator: record/update best
        end
        Generator->>Generator: _select_best_solution -> return
    else beam search (beam == true)
        Generator->>Generator: initialize beam candidates
        loop depth 1..gen_rounds
            Generator->>OpenAI: async expand candidates
            OpenAI-->>Generator: candidate set
            Generator->>Evaluator: evaluate candidates
            Evaluator->>TraceDB: run traces/benchmarks
            TraceDB-->>Evaluator: results
            Evaluator-->>Generator: scored candidates
            Generator->>Generator: prune to beam_width
        end
        Generator->>Generator: select best passing or fallback -> return
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect examples/kernel_generator/kernel_generator.py for correct async/await usage, concurrency/ordering, and beam pruning logic.
  • Verify _evaluate_solutions creates and cleans up temporary trace sets and correctly maps traces back to candidates.
  • Confirm signature change (gen_rounds, beam, beam_width) is handled where the method is invoked.
  • Check kernel_generator_prompts.py that Torch guidance was correctly relocated without removing required instructions for other targets.
  • Validate example script argument handling and early-exit messaging.

Poem

🐇 I hopped through traces, one by one,
Chose a name, then raced the sun.
Rounds or beams, I chased the best,
Benchmarks beat — I put to rest.
A carrot prize for code well-done!

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'update kernel generator' is overly vague and generic. While it references the main component being modified, it does not communicate the primary changes (beam search strategy, enhanced generation flexibility) or the specific value added to users. | Consider a more specific title such as 'Add beam search generation strategy to kernel generator' or 'Enhance kernel generator with beam search and flexible optimization' to better convey the main improvements. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |


@gemini-code-assist
Contributor

Summary of Changes

Hello @zanderjiang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the kernel_generator_example.py script to improve its usability and flexibility. The primary goal is to empower users with greater control over generating custom solutions for FlashInfer-Bench by allowing them to target specific kernel definitions. This is achieved through clearer configuration options and the introduction of a new parameter for definition selection, making the example more adaptable and easier to understand for various use cases.

Highlights

  • Enhanced Configuration Clarity: Updated comments for model_name, language, target_gpu, and introduced definition to provide clearer guidance on configuring the kernel generator example.
  • Flexible Definition Selection: Added a new definition variable, allowing users to specify a single kernel definition for solution generation, offering more control than processing all definitions or a hardcoded subset.
  • Generic Traceset Path: Changed the traceset_path to a generic placeholder, requiring users to explicitly set their local path, improving example adaptability.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request improves the kernel generator example by making it more flexible and clearer for users. It introduces an option to generate a solution for a specific definition and replaces a hardcoded user path with a placeholder. My feedback includes a suggestion to improve user-friendliness by adding a check for a valid traceset path, and a minor style fix to align with PEP 8 guidelines for inline comments.

Comment on lines 24 to 26
language = "triton" # Target solution language
target_gpu = "B200" # Choose solution target GPU
definition = "" # Leave empty to generate solutions for all definitions

medium

For better readability and adherence to Python's PEP 8 style guide, inline comments should be separated by at least two spaces from the statement. I've also aligned the comments for better visual structure.¹

Suggested change

language = "triton" # Target solution language
target_gpu = "B200" # Choose solution target GPU
definition = "" # Leave empty to generate solutions for all definitions

language = "triton"   # Target solution language
target_gpu = "B200"   # Choose solution target GPU
definition = ""       # Leave empty to generate solutions for all definitions

Style Guide References

Footnotes

  1. PEP 8 E261 states that inline comments should be separated by at least two spaces from the statement.


 # TODO: adjust local path to traceset
-traceset_path = "/home/akj2/flashinfer-trace"
+traceset_path = "/path/to/flashinfer-trace"

medium

While using a placeholder path is an improvement, the script doesn't handle the case where the user forgets to change it. If run with the default path, it will create an empty directory, find no definitions, and do nothing without a clear explanation. This can be confusing for new users. It would be more robust to check if any definitions were loaded from the provided path and exit with a helpful error message if not.

For example, you could add this check after loading the traceset:

if not traceset.definitions:
    print(f"Error: No definitions found in traceset at '{traceset_path}'.")
    print("Please ensure `traceset_path` points to a valid flashinfer-trace directory.")
    return

@zanderjiang zanderjiang changed the title from "update kernel generator example to be more clear" to "update kernel generator" on Dec 2, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
examples/kernel_generator/kernel_generator_example.py (1)

18-27: Clarify docstring and avoid reusing definition for different types

The main() docstring and variable naming are slightly out of sync with the new behavior:

  • Docstring still says “all definitions” even though you now support targeting a single definition.
  • definition starts as a string filter and is later reused as a Definition object inside the loop, which is a bit confusing.

You could make this clearer and avoid shadowing with something like:

-    """
-    Generate optimized solutions for all definitions in the traceset.
-    """
-    # TODO: select model, language, target gpu, definition
+    """
+    Generate optimized solutions for one or all definitions in the traceset.
+    """
+    # TODO: select model, language, target GPU, and an optional target definition
     model_name = "gpt-5-2025-08-07"  # Choose author-model
     language = "triton"  # Target solution language
     target_gpu = "B200"  # Choose solution target GPU
-    definition = ""  # Leave empty to generate solutions for all definitions
+    target_definition_name = ""  # Leave empty to generate solutions for all definitions

And later:

-    all_definitions = list(traceset.definitions.keys())
-
-    if definition:
-        if definition in all_definitions:
-            all_definitions = [definition]
-            print(f"Generating solution {definition}")
+    all_definitions = list(traceset.definitions.keys())
+
+    if target_definition_name:
+        if target_definition_name in all_definitions:
+            all_definitions = [target_definition_name]
+            print(f"Generating solution {target_definition_name}")
         else:
-            print(f"Definition '{definition}' not found in traceset")
+            print(f"Definition '{target_definition_name}' not found in traceset")
             return

This keeps the configuration string separate from the per-iteration Definition objects and makes the single-definition behavior more obvious to users.

examples/kernel_generator/kernel_generator.py (3)

74-81: Consider exposing async generation instead of calling asyncio.run inside library code

Both generate() and _beam_search_generate() call asyncio.run(...). This is fine for CLI-style usage, but it will raise RuntimeError if KernelGenerator.generate() is ever called from within an existing event loop (e.g., Jupyter, asyncio-based services).

If you expect library usage in async contexts, consider:

  • Adding an explicit async entrypoint (e.g., async_generate(...)) that directly awaits _sequential_generate_async / _beam_search_generate_async, and
  • Keeping generate() as a thin synchronous wrapper that only uses asyncio.run when no event loop is running.

No immediate functional bug for the current example usage, but this will make the API more robust in async environments.

Also applies to: 101-113, 168-181
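A minimal sketch of that split, assuming the private async methods named above and the new signature (illustrative only, with stubbed internals so it runs standalone; this is not the PR's actual code):

```python
import asyncio

class KernelGenerator:
    # Stubbed internals so the sketch runs standalone; the real methods live in the PR.
    async def _sequential_generate_async(self, definition, gen_rounds):
        return f"sequential solution for {definition} ({gen_rounds} rounds)"

    async def _beam_search_generate_async(self, definition, gen_rounds, beam_width):
        return f"beam solution for {definition} (width {beam_width})"

    async def async_generate(self, definition, gen_rounds=10, beam=False, beam_width=4):
        # Directly awaitable entrypoint for async callers (Jupyter, asyncio services).
        if beam:
            return await self._beam_search_generate_async(definition, gen_rounds, beam_width)
        return await self._sequential_generate_async(definition, gen_rounds)

    def generate(self, definition, gen_rounds=10, beam=False, beam_width=4):
        # Thin sync wrapper: use asyncio.run only when no event loop is already running.
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(self.async_generate(definition, gen_rounds, beam, beam_width))
        raise RuntimeError("generate() called inside a running event loop; "
                           "await async_generate(...) instead.")

print(KernelGenerator().generate("rmsnorm", gen_rounds=2))
```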


215-243: Align beam-search zips with Ruff B905 by making zip strict

In the beam-search path, Ruff B905 flags the two zip() usages without an explicit strict parameter. Here, you do expect the three sequences to stay in lockstep (initial_candidates, solutions, traces and later code_results, solutions, traces), so making this explicit is reasonable and will surface any mismatch as a clear error.

You can update both zips like this:

-        for i, (candidate, solution, trace) in enumerate(
-            zip(initial_candidates, solutions, traces)
-        ):
+        for i, (candidate, solution, trace) in enumerate(
+            zip(initial_candidates, solutions, traces, strict=True)
+        ):
@@
-            for beam_idx, (code_result, solution, trace) in enumerate(
-                zip(code_results, solutions, traces)
-            ):
+            for beam_idx, (code_result, solution, trace) in enumerate(
+                zip(code_results, solutions, traces, strict=True)
+            ):

This keeps behavior the same when lengths match and will fail fast if the evaluation pipeline ever returns an unexpected number of traces.

Also applies to: 277-307


317-367: Evaluation helper and selection logic are solid; minor f-string cleanup

The new _evaluate_solutions, _get_best_trace, and _select_best_solution helpers are cohesive and correctly:

  • Build a temporary TraceSet rooted at traceset.root for benchmarking a specific definition.
  • Map traces back to solutions by name.
  • Choose the best passing solution by speedup_factor, or fall back to the last generated solution, or raise if nothing was produced.

One small cleanup to address Ruff F541 (f-string without placeholders):

-        elif fallback_solution:
-            print(f"\nNo passing solutions found, returning last generated solution")
-            return fallback_solution
+        elif fallback_solution:
+            print("\nNo passing solutions found, returning last generated solution")
+            return fallback_solution

Otherwise, this helper stack looks correct and matches the intended selection behavior.
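For illustration, the selection rule described above could look roughly like this (the (solution, trace) pairing, the speedup_factor access, and the signature are assumptions based on the review's description; toy dict traces stand in for real trace objects):

```python
def select_best_solution(passing_solutions, fallback_solution):
    """Sketch of the described rule: best passing solution by speedup_factor,
    else the last generated solution, else an error."""
    if passing_solutions:
        best_solution, best_trace = max(passing_solutions,
                                        key=lambda pair: pair[1]["speedup_factor"])
        print(f"Selected solution with speedup {best_trace['speedup_factor']:.2f}x")
        return best_solution
    if fallback_solution is not None:
        print("No passing solutions found, returning last generated solution")
        return fallback_solution
    raise RuntimeError("No solutions were generated")

# Toy traces as dicts; the real code presumably reads trace.speedup_factor.
print(select_best_solution([("sol_a", {"speedup_factor": 1.3}),
                            ("sol_b", {"speedup_factor": 2.1})], None))  # -> sol_b
```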

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81fa027 and b84eb6c.

📒 Files selected for processing (2)
  • examples/kernel_generator/kernel_generator.py (6 hunks)
  • examples/kernel_generator/kernel_generator_example.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/kernel_generator/kernel_generator_example.py (1)
flashinfer_bench/data/trace_set.py (2)
  • TraceSet (23-477)
  • from_path (85-145)
🪛 Ruff (0.14.7)
examples/kernel_generator/kernel_generator.py

216-216: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


278-278: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


364-364: f-string without any placeholders

Remove extraneous f prefix

(F541)


367-367: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
🔇 Additional comments (3)
examples/kernel_generator/kernel_generator_example.py (1)

92-99: Generator call and new gen_rounds / beam parameters look correctly wired

The updated KernelGenerator.generate invocation correctly:

  • Uses gen_rounds=10 to match the new API.
  • Leaves beam search options commented out but clearly documented for users to toggle (beam, beam_width).

No functional issues here from the example’s perspective.

examples/kernel_generator/kernel_generator.py (2)

115-167: Sequential multi-round generation and selection flow looks sound

The new _sequential_generate_async implementation:

  • Reuses get_prompt for the initial round and get_optimization_prompt based on the best available trace (or falls back to a fresh prompt).
  • Tracks passing_solutions along with last_solution / last_trace.
  • Delegates evaluation to _evaluate_solutions and final choice to _select_best_solution.

Logic around gen_rounds, evaluation status checks, and fallbacks is consistent; no correctness issues stand out here.
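As a self-contained toy of the flow just described (every helper here is a stand-in, not FlashInfer-Bench's actual API; only the overall shape follows the review):

```python
import asyncio

class SequentialLoopSketch:
    """Toy skeleton of the multi-round flow; all helpers are stand-ins."""

    async def _generate_code_from_prompt(self, prompt: str) -> str:
        return f"// candidate for: {prompt}"  # stand-in for the async LLM call

    async def _evaluate(self, code: str, round_num: int) -> dict:
        # Stand-in for _evaluate_solutions: pretend later rounds pass and improve.
        return {"status": "PASSED" if round_num > 0 else "FAILED",
                "speedup_factor": 1.0 + 0.1 * round_num}

    async def run(self, definition: str, gen_rounds: int) -> str:
        passing, last_solution, best_trace = [], None, None
        for round_num in range(gen_rounds):
            # Fresh prompt first; optimization prompt once a usable trace exists.
            prompt = (f"implement {definition}" if best_trace is None
                      else f"optimize {definition} given {best_trace}")
            code = await self._generate_code_from_prompt(prompt)
            trace = await self._evaluate(code, round_num)
            last_solution = code
            if trace["status"] == "PASSED":
                passing.append((code, trace))
                best_trace = trace
        # Best passing solution by speedup_factor, else fall back to the last one.
        if passing:
            return max(passing, key=lambda p: p[1]["speedup_factor"])[0]
        return last_solution

print(asyncio.run(SequentialLoopSketch().run("rmsnorm", gen_rounds=3)))
```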


430-447: Async code generation and solution naming scheme look consistent with new flows

The async _generate_code_from_prompt helper and the updated _create_solution_from_code:

  • Correctly branch between responses.create for reasoning models and chat.completions.create for others.
  • Return both raw and cleaned code, which the sequential and beam paths reuse for optimization prompts.
  • Use (round_num, candidate_idx) to produce unique solution names and descriptions that encode both round and beam-candidate index.

This fits well with the new multi-round and beam-search flows; no issues from the bench side.

Also applies to: 452-462
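Sketched against the OpenAI Python SDK, the described branching might look like this (the reasoning-model heuristic and cleanup step are assumptions; only the two create calls come from the review):

```python
from openai import AsyncOpenAI

async def generate_code_from_prompt(client: AsyncOpenAI, model_name: str, prompt: str):
    """Return (raw, cleaned) text, mirroring the reasoning/chat split described above."""
    if model_name.startswith(("o1", "o3", "gpt-5")):  # assumed reasoning-model heuristic
        response = await client.responses.create(model=model_name, input=prompt)
        raw = response.output_text
    else:
        response = await client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        raw = response.choices[0].message.content
    # Naive cleanup stand-in for whatever the PR's "cleaned code" step does.
    cleaned = raw.strip().removeprefix("```python").removesuffix("```").strip()
    return raw, cleaned
```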

Comment on lines +34 to 45
all_definitions = list(traceset.definitions.keys())

print(f"All definitions found: {len(all_definitions)}")
if definition:
    if definition in all_definitions:
        all_definitions = [definition]
        print(f"Generating solution {definition}")
    else:
        print(f"Definition '{definition}' not found in traceset")
        return

print(f"Found {len(all_definitions)} definitions to generate solutions")


⚠️ Potential issue | 🟠 Major

Guard against empty tracesets to avoid confusing behavior and division by zero

If traceset.definitions is empty (e.g., user forgets to update traceset_path), all_definitions is empty and:

  • The loop never runs.
  • total_definitions is 0.
  • The final line success_rate = (successful_generations/total_definitions)*100 will raise ZeroDivisionError.

It also silently “does nothing” before crashing, which can be confusing for new users.

You can handle this up front and fix the success-rate calculation:

-    all_definitions = list(traceset.definitions.keys())
-
-    if definition:
-        if definition in all_definitions:
-            all_definitions = [definition]
-            print(f"Generating solution {definition}")
-        else:
-            print(f"Definition '{definition}' not found in traceset")
-            return
-
-    print(f"Found {len(all_definitions)} definitions to generate solutions")
+    all_definitions = list(traceset.definitions.keys())
+
+    if not all_definitions:
+        print(f"Error: No definitions found in traceset at '{traceset_path}'.")
+        print("Please ensure `traceset_path` points to a valid flashinfer-trace directory.")
+        return
+
+    if target_definition_name:
+        if target_definition_name in all_definitions:
+            all_definitions = [target_definition_name]
+            print(f"Generating solution {target_definition_name}")
+        else:
+            print(f"Definition '{target_definition_name}' not found in traceset")
+            return
+
+    print(f"Found {len(all_definitions)} definitions to generate solutions")
@@
-    print(f"Failed generations: {failed_generations}")
-    print(f"Success rate: {(successful_generations/total_definitions)*100:.1f}%")
+    print(f"Failed generations: {failed_generations}")
+    success_rate = (successful_generations / total_definitions * 100) if total_definitions else 0.0
+    print(f"Success rate: {success_rate:.1f}%")

This both avoids the runtime error and gives a clear message when the dataset path is misconfigured, addressing the earlier concern about the placeholder traceset_path.

Also applies to: 134-141

🤖 Prompt for AI Agents
In examples/kernel_generator/kernel_generator_example.py around lines 34 to 45
(and similarly at 134-141), all_definitions can be empty which leads to silent
no-op behavior and a ZeroDivisionError later when computing success_rate; add an
explicit guard after building all_definitions that checks if it's empty and if
so print a clear error explaining the traceset is empty or misconfigured and
return/exit early, and ensure the later success_rate calculation uses a
protected denominator (or skip calculation when total_definitions == 0) so no
division by zero can occur.

@zanderjiang zanderjiang requested a review from Ubospica December 3, 2025 01:39

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b84eb6c and 889b88a.

📒 Files selected for processing (1)
  • examples/kernel_generator/kernel_generator_prompts.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.10
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.13
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.11
  • GitHub Check: Run unit tests on ubuntu-latest and Python 3.12
🔇 Additional comments (1)
examples/kernel_generator/kernel_generator_prompts.py (1)

235-248: CUDA optimization prompt addition is fine

The added mention of specialized libraries like CUTLASS fits the surrounding optimization guidance and does not conflict with the rest of the prompt. No changes needed here.

Comment on lines +294 to +319
TORCH_BINDINGS_PROMPT = """
Use TORCH for your generated kernel host function and bindings
Requirements:
- Include all necessary headers (torch/extension.h, kernel.h, etc.)
- Implement the "run" function that:
  * Takes torch::Tensor arguments
  * Validates tensor properties (device, dtype, shape)
  * Extracts raw pointers using .data_ptr<T>()
  * Calls the CUDA kernel with appropriate launch configuration
  * Returns results as torch::Tensor
- Use PYBIND11_MODULE to bind the "run" function:
  * PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{
  *   m.def("run", &run, "Kernel execution function");
  * }}
- Handle both positional args and kwargs properly
- Include proper error messages for invalid inputs
- Use torch::Tensor for all tensor arguments
- Use .device().is_cuda() to check if tensors are on GPU
- Use .dtype() to validate tensor data types
- Use .sizes() or .size(dim) to get tensor dimensions
- Use .data_ptr<float>() or .data_ptr<T>() to get raw pointers
- Call cudaDeviceSynchronize() or cudaGetLastError() for error checking
- Return torch::Tensor from the run function
- Handle exceptions gracefully with proper error messages"""

⚠️ Potential issue | 🟡 Minor

Fix PYBIND11 example braces in TORCH_BINDINGS_PROMPT

The PYBIND11 example currently shows double braces:

  • PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{
  • }}

This is not valid C++ syntax and may confuse the model/generator. Since this string is not passed through .format, the double braces are unnecessary.

Recommend switching to single braces:

 TORCH_BINDINGS_PROMPT = """
@@
 - Use PYBIND11_MODULE to bind the "run" function:
-  * PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{
+  * PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
   *   m.def("run", &run, "Kernel execution function");
-  * }}
+  * }
@@
 - Use .data_ptr<float>() or .data_ptr<T>() to get raw pointers
 - Call cudaDeviceSynchronize() or cudaGetLastError() for error checking
 - Return torch::Tensor from the run function
 - Handle exceptions gracefully with proper error messages"""
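For context on why the double braces were likely there: {{ collapses to { only when the string passes through str.format, which this constant reportedly does not. A quick check:

```python
template = "PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{ ... }}"
print(template)           # PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{ ... }}
print(template.format())  # PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { ... }
```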
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

TORCH_BINDINGS_PROMPT = """
Use TORCH for your generated kernel host function and bindings
Requirements:
- Include all necessary headers (torch/extension.h, kernel.h, etc.)
- Implement the "run" function that:
  * Takes torch::Tensor arguments
  * Validates tensor properties (device, dtype, shape)
  * Extracts raw pointers using .data_ptr<T>()
  * Calls the CUDA kernel with appropriate launch configuration
  * Returns results as torch::Tensor
- Use PYBIND11_MODULE to bind the "run" function:
  * PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  *   m.def("run", &run, "Kernel execution function");
  * }
- Handle both positional args and kwargs properly
- Include proper error messages for invalid inputs
- Use torch::Tensor for all tensor arguments
- Use .device().is_cuda() to check if tensors are on GPU
- Use .dtype() to validate tensor data types
- Use .sizes() or .size(dim) to get tensor dimensions
- Use .data_ptr<float>() or .data_ptr<T>() to get raw pointers
- Call cudaDeviceSynchronize() or cudaGetLastError() for error checking
- Return torch::Tensor from the run function
- Handle exceptions gracefully with proper error messages"""
🤖 Prompt for AI Agents
In examples/kernel_generator/kernel_generator_prompts.py around lines 294 to
319, the PYBIND11 example in TORCH_BINDINGS_PROMPT mistakenly uses double braces
("{{" and "}}") which are invalid in C++; replace the opening
"PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {{" with a single brace "{" and change
the matching "}}" to "}" so the module block uses normal C++ braces, leaving the
rest of the prompt unchanged.
