mm-ptx (Python)

PTX Inject and Stack PTX with Python bindings.

This package ships two small, header-only C libraries plus Python wrappers:

PTX Inject: find marked sites in PTX and inject your own PTX at those sites.
Stack PTX: generate PTX stubs you can inject at those sites.

PTX Inject: what you write

Mark a site in CUDA with macros:

#include <ptx_inject.h>

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    PTX_INJECT("func",
        PTX_IN (F32, x, x),
        PTX_MOD(F32, y, y),
        PTX_OUT(F32, z, z)
    );
    out[0] = z;
}
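We read the three macros as declaring how the injected code may use each variable, though this is our assumption and is not spelled out here. PTX_IN marks a value the stub only reads. PTX_MOD marks a value it may read and overwrite. PTX_OUT marks a value it must produce, with no defined value on entry. The sketch below is a hypothetical checker, not part of mm-ptx, that applies those rules to a stub made only of simple `opcode %dst, %src, ...;` lines. Plain strings stand in for the `func[...].reg` register names.

```python
import re

# One simple PTX instruction: "opcode %dst, %src1, %src2, ...;".
# This toy ignores predicates, immediates, vectors and everything else real PTX allows.
_INSTR = re.compile(r"^\s*([\w.]+)\s+%(\w+)((?:\s*,\s*%\w+)*)\s*;\s*$")


def check_stub(stub, ins, outs):
    """Hypothetical checker (not part of mm-ptx): list rule violations in a stub.

    ins:  registers marked PTX_IN (must not be written)
    outs: registers marked PTX_OUT (must be written, and not read before that)
    PTX_MOD registers are unconstrained here, so they are not passed in.
    """
    errors = []
    written = set()
    for line in stub.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = _INSTR.match(line)
        if m is None:
            errors.append(f"unparsed line: {line.strip()}")
            continue
        dst = m.group(2)
        srcs = re.findall(r"%(\w+)", m.group(3))
        for src in srcs:
            # Assumption: an output register holds no defined value until written.
            if src in outs and src not in written:
                errors.append(f"reads PTX_OUT register %{src} before writing it")
        if dst in ins:
            errors.append(f"writes PTX_IN register %{dst}")
        written.add(dst)
    for reg in sorted(outs):
        if reg not in written:
            errors.append(f"never writes PTX_OUT register %{reg}")
    return errors


# The stub from this walkthrough, with plain names standing in for func[...].reg.
stub = "\tadd.ftz.f32 %y, %x, %y;\n\tadd.ftz.f32 %z, %x, %y;"
print(check_stub(stub, ins={"x"}, outs={"z"}))  # []
```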

Compile the CUDA to PTX (nvcc or cuda.core), then build and inject a stub in Python:

from mm_ptx.ptx_inject import PTXInject

annotated_ptx = "..."  # PTX from nvcc/cuda.core
inject = PTXInject(annotated_ptx)

func = inject["func"]
stub = (
    f"\tadd.ftz.f32 %{func['y'].reg}, %{func['x'].reg}, %{func['y'].reg};\n"
    f"\tadd.ftz.f32 %{func['z'].reg}, %{func['x'].reg}, %{func['y'].reg};"
)

final_ptx = inject.render_ptx({"func": stub})

The rendered final_ptx is equivalent to writing this CUDA kernel directly, but without the CUDA-to-PTX compilation overhead:

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    y = x + y;
    z = x + y;
    out[0] = z;
}
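To check that the two injected instructions compute the same values as the CUDA kernel above, the toy interpreter below runs the stub on Python floats with x = 5 and y = 3. It is a hypothetical helper that only understands three-operand `add.*.f32` lines. It rounds each result to single precision and ignores the `.ftz` (flush-to-zero) modifier, which does not matter for these values.

```python
import re
import struct


def to_f32(value):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def run_add_stub(stub, regs):
    """Toy interpreter: execute 'add.<mods>.f32 %d, %a, %b;' lines on a register dict."""
    regs = dict(regs)  # don't mutate the caller's dict
    pattern = re.compile(r"add(?:\.\w+)*\.f32\s+%(\w+),\s*%(\w+),\s*%(\w+);")
    for dst, a, b in pattern.findall(stub):
        regs[dst] = to_f32(regs[a] + regs[b])
    return regs


# Same stub shape as above, with plain names in place of func['x'].reg etc.
stub = (
    "\tadd.ftz.f32 %y, %x, %y;\n"
    "\tadd.ftz.f32 %z, %x, %y;"
)
result = run_add_stub(stub, {"x": 5.0, "y": 3.0, "z": 0.0})
print(result["y"], result["z"])  # 8.0 13.0
```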

Stack PTX: stack-based instruction compiler

If you do not want to hand-write PTX, you can use Stack PTX to generate the stub:

from mm_ptx.stack_ptx import RegisterRegistry
from mm_ptx.stack_ptx_default_types import Stack, PtxInstruction, compiler

# Set up name-to-register associations
registry = RegisterRegistry()
registry.add(func["x"].reg, Stack.f32, name="x")
registry.add(func["y"].reg, Stack.f32, name="y")
registry.add(func["z"].reg, Stack.f32, name="z")
registry.freeze()

# Instructions to run
instructions = [
    registry.x,                     # Push 'x'
    registry.y,                     # Push 'y'
    PtxInstruction.add_ftz_f32,     # Pop 'y', Pop 'x', Push ('x' + 'y')
    registry.x,                     # Push 'x'
    PtxInstruction.add_ftz_f32      # Pop 'x', Pop ('x' + 'y'), Push ('x' + ('x' + 'y'))
]

# Compile the instructions into a PTX stub
ptx_stub = compiler.compile(
    registry=registry,
    instructions=instructions,
    requests=[registry.z],
    ...
)

# Inject the PTX stub into the PTX Inject site(s)
final_ptx = inject.render_ptx({"func": ptx_stub})
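The comments in the instruction list trace a stack machine. The toy evaluator below replays that list and builds both the expression and its value. It is not the Stack PTX implementation: instructions are plain strings here, and only `add` is modelled. It combines operands as (deeper + top); since addition is commutative, the value matches the traced `x + (x + y)`.

```python
def run_stack(instructions, env):
    """Toy stack machine: names push values, 'add' pops two and pushes their sum.

    Returns the (expression, value) pair left as the single stack item.
    """
    stack = []  # each item is an (expression, value) pair
    for ins in instructions:
        if ins == "add":
            if len(stack) < 2:
                raise ValueError("stack underflow at 'add'")
            rhs = stack.pop()  # popped first: the most recently pushed operand
            lhs = stack.pop()
            stack.append((f"({lhs[0]} + {rhs[0]})", lhs[1] + rhs[1]))
        else:
            stack.append((ins, env[ins]))  # push a named register's value
    if len(stack) != 1:
        raise ValueError(f"expected one result on the stack, found {len(stack)}")
    return stack[0]


# Mirrors the instruction list above, with strings instead of registry/PtxInstruction objects.
program = ["x", "y", "add", "x", "add"]
print(run_stack(program, {"x": 5.0, "y": 3.0}))  # ('((x + y) + x)', 13.0)
```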

Printing ptx_stub gives:

    {
    .reg .f32 %_a<2>;
    add.ftz.f32 %_a0, %_x0, %_x2;
    add.ftz.f32 %_a1, %_x2, %_a0;
    mov.f32 %_x1, %_a1;
    }

The rendered final_ptx is equivalent to writing this CUDA kernel directly, but without the CUDA-to-PTX compilation overhead:

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    z = x + (x + y);
    out[0] = z;
}
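The printed stub has a recognizable shape. It is a scoped `{ ... }` block that declares temporaries with `.reg .f32 %_a<N>;`, emits one instruction per operation, and ends with a `mov` into the requested register. The sketch below reproduces that shape for addition-only programs. It is a toy, not Stack PTX's compiler. The function name, operand order, and register names are our assumptions, so its output is not expected to match the library's byte for byte.

```python
def emit_add_stub(program, regs, request):
    """Toy emitter: turn a postfix program of names and 'add' into a PTX block.

    regs maps names to PTX register names (without '%'); request is the register
    that receives the final value. Temporaries are declared as %_a<N>.
    """
    stack, body, n_tmp = [], [], 0
    for ins in program:
        if ins == "add":
            if len(stack) < 2:
                raise ValueError("stack underflow")
            rhs, lhs = stack.pop(), stack.pop()
            tmp = f"%_a{n_tmp}"  # fresh temporary for this result
            n_tmp += 1
            body.append(f"\tadd.ftz.f32 {tmp}, {lhs}, {rhs};")
            stack.append(tmp)
        else:
            stack.append(f"%{regs[ins]}")  # operand is an injected register
    if len(stack) != 1:
        raise ValueError(f"expected one result, found {len(stack)}")
    lines = ["{"]
    if n_tmp:
        # PTX parameterized declaration: %_a<2> declares %_a0 and %_a1.
        lines.append(f"\t.reg .f32 %_a<{n_tmp}>;")
    lines += body
    lines.append(f"\tmov.f32 %{request}, {stack[0]};")
    lines.append("}")
    return "\n".join(lines)


print(emit_add_stub(["x", "y", "add", "x", "add"], {"x": "x", "y": "y"}, "z"))
```

For the walkthrough's program this prints a block declaring `.reg .f32 %_a<2>;`, two `add.ftz.f32` lines, and a final `mov.f32 %z, %_a1;`. Scoping the temporaries inside braces keeps them from colliding with registers elsewhere in the kernel.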

Stack PTX instruction descriptions

Instruction definitions are supplied by the user and are not part of the core Stack PTX system, so you can tailor the instruction set to your needs.

Minimal example of PTX instruction and type definitions: examples/stack_ptx_default_types.py
More extensive example: examples/stack_ptx_extended_types.py
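Because the instruction set is user-defined, one way to picture an instruction description is as plain data: a name, the PTX opcode it emits, and the operand types it pops and pushes. The dataclass below is purely hypothetical. It does not mirror the format used in examples/stack_ptx_default_types.py and only illustrates why extending the set can be as simple as adding entries.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InstructionSpec:
    """Hypothetical description of one stack instruction (not mm-ptx's format)."""
    name: str                 # identifier used when building programs
    opcode: str               # PTX opcode emitted, e.g. "add.ftz.f32"
    inputs: Tuple[str, ...]   # stack types popped, in operand order
    outputs: Tuple[str, ...]  # stack types pushed

    def emit(self, dsts, srcs):
        """Format one PTX instruction line for the given register names."""
        if len(dsts) != len(self.outputs) or len(srcs) != len(self.inputs):
            raise ValueError(f"{self.name}: wrong operand count")
        operands = ", ".join(f"%{r}" for r in (*dsts, *srcs))
        return f"\t{self.opcode} {operands};"


# A small user-chosen instruction set keyed by name.
INSTRUCTIONS = {
    spec.name: spec
    for spec in (
        InstructionSpec("add_ftz_f32", "add.ftz.f32", ("f32", "f32"), ("f32",)),
        InstructionSpec("mul_ftz_f32", "mul.ftz.f32", ("f32", "f32"), ("f32",)),
        InstructionSpec("sin_approx_f32", "sin.approx.f32", ("f32",), ("f32",)),
    )
}

print(INSTRUCTIONS["add_ftz_f32"].emit(["_a0"], ["x", "y"]))
```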

Install

pip install mm-ptx

Requires Python 3.9+.

Tests

python -m pip install -e .
python -m unittest discover -s tests

CUDA integration tests are skipped by default. To run them (requires cuda.core, cuda.bindings, and a CUDA-capable GPU):

MM_PTX_RUN_CUDA_TESTS=1 python -m unittest discover -s tests
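A common way to gate hardware-dependent tests on an environment variable is `unittest.skipUnless`. The snippet below sketches that pattern. It is not copied from the repository's tests, and the class and test names are hypothetical.

```python
import os
import unittest

# Opt-in flag: CUDA tests run only when MM_PTX_RUN_CUDA_TESTS=1 is set.
RUN_CUDA_TESTS = os.environ.get("MM_PTX_RUN_CUDA_TESTS") == "1"


@unittest.skipUnless(RUN_CUDA_TESTS, "set MM_PTX_RUN_CUDA_TESTS=1 to run CUDA tests")
class CudaIntegrationTest(unittest.TestCase):
    """Hypothetical CUDA test case; a real one would compile and launch a kernel."""

    def test_cuda_core_importable(self):
        # Imported lazily so machines without cuda.core can still load this module.
        import cuda.core  # noqa: F401


if __name__ == "__main__":
    unittest.main()
```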

Examples

Recipes

LLMs (including ChatGPT) are good at writing Python that emits Stack PTX in a specific shape. A concrete example is examples/fun/domain_coloring_random. It programmatically builds 3-5 terms of the form sin(...) or cos(...) over nested add/mul expressions of x, y, and t, mixes in random constants, and sums the terms to produce two outputs, f and g. These expressions are converted into Stack PTX instructions, injected into a CUDA kernel, and used to render animated domain-coloring frames on the GPU.
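A sketch of that kind of generator follows. It is not the code in examples/fun/domain_coloring_random. The token names ("add", "mul", "sin", "cos", "const:...") and the postfix encoding are assumptions standing in for real Stack PTX instructions. Each output is a sum of 3-5 sin/cos terms over random nested add/mul expressions of x, y, t and random constants, written in stack order.

```python
import random

LEAVES = ("x", "y", "t")


def random_expr(rng, depth):
    """Postfix tokens for a random nested add/mul expression over x, y, t and constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.25:
            return [f"const:{rng.uniform(-2.0, 2.0):.3f}"]  # random constant leaf
        return [rng.choice(LEAVES)]
    op = rng.choice(("add", "mul"))
    return random_expr(rng, depth - 1) + random_expr(rng, depth - 1) + [op]


def random_output(rng, max_depth=3):
    """Postfix tokens for a sum of 3-5 terms, each sin(...) or cos(...) of an expression."""
    n_terms = rng.randint(3, 5)
    tokens = []
    for i in range(n_terms):
        tokens += random_expr(rng, max_depth) + [rng.choice(("sin", "cos"))]
        if i > 0:
            tokens.append("add")  # fold each new term into the running sum
    return tokens


def final_stack_depth(tokens):
    """Simulate stack depth: add/mul pop two and push one, sin/cos pop one and push one."""
    depth = 0
    for tok in tokens:
        if tok in ("add", "mul"):
            if depth < 2:
                raise ValueError(f"stack underflow at {tok!r}")
            depth -= 1
        elif tok in ("sin", "cos"):
            if depth < 1:
                raise ValueError(f"stack underflow at {tok!r}")
        else:
            depth += 1  # leaf push
    return depth


rng = random.Random(42)  # seeded so runs are reproducible
f_tokens, g_tokens = random_output(rng), random_output(rng)
print(" ".join(f_tokens))
```

Checking that each program leaves exactly one value on the stack is a cheap sanity test to run before handing generated programs to a compiler.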

Roadmap

See the mm-ptx Roadmap for the C/C++ core: https://github.com/MetaMachines/mm-ptx#roadmap

Bulk parallel compilation (Python)
- OpenMP-backed interfaces for compiling large batches of Stack PTX stubs and injected kernels directly from Python.
In-The-Loop-Learning (ITLL) system (Python Wrapper)
- Closes the optimization loop: when Stack PTX compiles in microseconds and kernels execute in microseconds-to-milliseconds, ML models must operate on the same time horizon to enable real-time kernel optimization.
- Training and inference on the order of microseconds per batch means the model keeps pace with kernel generation and execution.
- Use cases: learned fitness predictors for evolutionary search, behavioral descriptors for MAP-Elites, online feature extraction from kernel executions.
- Built for small data (500-10,000 rows) with batch sizes in the tens of thousands.

Community & Support

We're here to help with your projects and answer questions:

Discord: Join our community at https://discord.gg/7vS5XQ4bE4 for direct support, discussions, and collaboration.
Twitter/X: Follow @_metamachines for updates and announcements.
Email: Reach us at [email protected]

More details

For the C/C++ headers and deeper implementation notes, see the mm-ptx repo: https://github.com/MetaMachines/mm-ptx

License

MIT. See LICENSE.

Citation

If you use this software in your work, please cite it using the following BibTeX entry (generated from CITATION.cff):

@software{Durham_mm-ptx_2025,
  author       = {Durham, Charlie},
  title        = {mm-ptx: PTX Inject and Stack PTX for Python},
  version      = {1.0.1},
  date-released = {2025-10-19},
  url          = {https://github.com/MetaMachines/mm-ptx-py}
}
