[ptx] Support for CUDA JIT compiler flags #713

yrq0208 · 2025-09-16T09:59:09Z

Description

This patch enables the explicit passing of CUDA JIT compiler flags by using the cuModuleLoadDataEx() method and TornadoOptions.java

Problem description

The current Java_uk_ac_manchester_tornado_drivers_ptx_PTXModule_cuModuleLoadData implementation does not allow the explicit passing of CUDA JIT compiler flags.

Backend/s tested

Mark the backends affected by this PR.

OpenCL
PTX
SPIRV

OS tested

Mark the OS where this PR is tested.

Linux
OSx
Windows

Did you check on FPGAs?

This patch is not applicable to FPGAs

Yes
No

How to test the new patch?

The CUDA JIT flags are passed via the TornadoVM CLI in the form of a string. Please refer to the document for a list of currently supported CUDA JIT flags. Feel free to try other flags. By default, the TestCompilerFlagsAPI unit test for PTX using optimization level 0, by passing opt level 4 via the CLI, I can observe a speedup of around 3.8x in terms of TASK_KERNEL_TIME on my machine, which suggests the passing is successful and the opt level is overwritten from 0 to 4

make BACKEND=ptx 
tornado-test -V --jvm="-Ds0.t0.device=0:0 -Dtornado.ptx.compiler.flags=CU_JIT_OPTIMIZATION_LEVEL\ 4\ CU_JIT_CACHE_MODE\ 0" --enableProfiler console uk.ac.manchester.tornado.unittests.compiler.TestCompilerFlagsAPI#testPTX

Here is another example command with all the flags (set your own CU_JIT_TARGET accordingly! Older versions of CUDA might not support GPU with 12.0 computer capability.):

tornado-test -V --jvm="-Ds0.t0.device=0:0 -Dtornado.ptx.compiler.flags=CU_JIT_OPTIMIZATION_LEVEL\ 4\ CU_JIT_CACHE_MODE\ 0\ CU_JIT_MAX_REGISTERS\ 255\ CU_JIT_TARGET\ 120\ CU_JIT_GENERATE_DEBUG_INFO\ 0\ CU_JIT_LOG_VERBOSE\ 0\ CU_JIT_GENERATE_LINE_INFO\ 0" --enableProfiler console uk.ac.manchester.tornado.unittests.compiler.TestCompilerFlagsAPI#testPTX

… flags can be passed to the CUDA JIT compiler, implement the JNI of cuModuleLoad so TornadoVM prebuilt can read .cubin files, currently both compiler flags and the path of the .cubin are hardcoded, should pass them from API

mikepapadim · 2025-10-14T05:06:54Z

@yrq0208 what it needs to accept compiler flags as in opencl backend?

…ornadoVM into PTX_cuModuleLoadDataEx

…N_LEVEL 4

…to process the CUDA JIT flags passed from TornadoOption.java to the relevant CUDA function

Copilot

Pull Request Overview

This PR implements support for passing CUDA JIT compiler flags through the cuModuleLoadDataEx() method, enabling explicit control over PTX compilation optimization levels and other performance-related flags. Previously, the implementation only used cuModuleLoadData() which didn't allow passing compiler flags.

Key Changes:

Added new JNI method cuModuleLoadDataEx to handle CUDA JIT compiler flags
Updated default PTX compiler flags to include CU_JIT_OPTIMIZATION_LEVEL 4 for optimal performance
Modified PTX module loading pipeline to pass TaskDataContext metadata through the compilation chain

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tornado-runtime/src/main/java/uk/ac/manchester/tornado/runtime/common/TornadoOptions.java	Updated default PTX compiler flags and added documentation on flag format
tornado-drivers/ptx/src/main/java/uk/ac/manchester/tornado/drivers/ptx/tests/TestPTXTornadoCompiler.java	Updated `installSource` call to pass metadata parameter
tornado-drivers/ptx/src/main/java/uk/ac/manchester/tornado/drivers/ptx/tests/TestPTXJITCompiler.java	Updated `installCode` call to pass task metadata
tornado-drivers/ptx/src/main/java/uk/ac/manchester/tornado/drivers/ptx/runtime/PTXTornadoDevice.java	Modified compilation methods to pass task metadata for compiler flags
tornado-drivers/ptx/src/main/java/uk/ac/manchester/tornado/drivers/ptx/PTXModule.java	Added new constructor parameter for compiler flags and native method declaration
tornado-drivers/ptx/src/main/java/uk/ac/manchester/tornado/drivers/ptx/PTXDeviceContext.java	Updated `installCode` methods to accept and forward task metadata
tornado-drivers/ptx/src/main/java/uk/ac/manchester/tornado/drivers/ptx/PTXCodeCache.java	Modified to extract compiler flags from task metadata and pass to PTXModule
tornado-drivers/ptx-jni/src/main/cpp/source/PTXModule.h	Added JNI header declaration for `cuModuleLoadDataEx`
tornado-drivers/ptx-jni/src/main/cpp/source/PTXModule.cpp	Implemented `cuModuleLoadDataEx` with CUDA JIT flag parsing and application

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tornado-runtime/src/main/java/uk/ac/manchester/tornado/runtime/common/TornadoOptions.java