
Conversation

@latentCall145 (Collaborator):

Sorry for the big diff :(

Adds the NKI frontend to Triton-Viz:

  • Added an NKI interpreter to apply language-level patching (e.g., it allows NKI kernels to be executed line by line with NumPy; modifies nki.py)
  • Added AST rewriting to convert NKI loads/stores into masked loads/stores (modifies nki_extract_slice.py)
  • Added a masked load/store implementation (modifies nki_masked_load.py; a sketch follows the test list below)
  • Added a backend argument (one of "nki" or "triton") for patching (modifies patch.py, client.py)
  • Added tracer callbacks for masked loads/stores (modifies data.py, tracer.py)
  • Split the Trace objects into separate Triton and NKI variants (modifies trace.py)
  • Added a lot of visualization work (credit: @gujialiang123)

Added Tests

  • nki-examples/: demo programs to try out the visualizer (currently, only nki-examples/matmul.py is supported)
  • tests/test_nki.py: basic NDArray slicing
  • tests/test_masked_load.py: verifies that masked load/store works correctly
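
For reference, here is a minimal sketch of what a NumPy-backed masked load can look like (illustrative names only, not the actual nki_masked_load.py API):

import numpy as np

def masked_load(array, keys, mask=None, other=0):
    # Plain load when no mask is given.
    if mask is None:
        return array[keys]
    # Read the slice, then replace masked-off lanes with `other`.
    # (Assumes `keys` stays in bounds wherever mask is True; the real
    # implementation also has to guard the indexing itself.)
    return np.where(mask, array[keys], other)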

@@ -0,0 +1,58 @@
import os
Member:

What is this for?

Collaborator:

Sorry, this script was used for debugging. I'll remove it.

)

def pre_store_callback(ptr, value, mask, cache_modifier, eviction_policy):
def pre_store_callback(ptr, value, mask, *ignore_args, **ignore_kwargs):
Member:

Why do we need these args and kwargs here?

@latentCall145 (Collaborator, Author), Oct 30, 2025:

They're no longer necessary (they were an artifact of when I tried to use that callback to handle masked loads/stores). But in general, the *args/**kwargs are in the callbacks because:

  1. Triton-Viz callbacks take in all of the arguments passed to the functions they wrap (i.e., adding a callback to fn(a, ..., z) means the callback needs to take in args a through z; see the sketch below).
  2. Masked loads/stores have a different function signature than Triton loads/stores.
  3. Thus, using the same callback for different patched operations would cause issues (this is important for post_dot_callback, which both Triton and NKI use).
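
To illustrate point 1, a minimal sketch of the patching pattern (patch_op is a hypothetical helper, not the actual patch.py code): the wrapper forwards everything it receives, so the callback must accept the wrapped function's full signature.

def patch_op(fn, before_callback):
    def wrapper(*args, **kwargs):
        # Forward every argument of the patched function to the callback.
        before_callback(*args, **kwargs)
        return fn(*args, **kwargs)
    return wrapper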

Member:

OK, but I would prefer to just name them args and kwargs.

@Jokeren (Member), Oct 30, 2025:

I also think the client manager needs to standardize ops across different backends before passing them to clients

@latentCall145 (Collaborator, Author):

We already do that? In patch.py, OPERATION_REGISTRY maps backends and standardized ops to the interpreter methods to patch, e.g.:

OPERATION_REGISTRY['triton']['original_ops'][Dot] # equals interpreter_builder.create_dot
OPERATION_REGISTRY['nki']['original_ops'][Dot] # equals nki_builder.matmul

When patching an op like interpreter_builder.create_dot or nki_builder.matmul, both are converted to a PatchOp(op_type=Dot) object, so they're standardized in that sense.

Do you want the triton-viz callbacks to disallow args/kwargs, with the client manager responsible for supplying the right args to the callbacks? Something conceptually like this:

# some_placeholder_client.py
def pre_load_callback(a, b, c): # no args/kwargs, correct arguments selected at client manager level
    # ...

# client.py
def triton_patched_load(a, b, c, ...): # tl.load parameters
    pre_load_callback(a, b, c)
    tl.load(a, b, c, ...)
    ...
def nki_patched_load(b, a, c, ...): # nl.load parameters, may be different values or in different order from tl.load parameters
    pre_load_callback(a, b, c)
    nl.load(b, a, c, ...)
    ...

if backend == 'triton':
    patched_load = triton_patched_load
else:
    patched_load = nki_patched_load

@Jokeren (Member), Nov 2, 2025:

Do you want the triton-viz callbacks to disallow args/kwargs, with the client manager responsible for supplying the right args to the callbacks?

Yes. The code can live in client.py, patch.py, or another file, but not in the concrete client. Each client itself only sees "normalized" callback parameters (e.g., a, b, c in your case).

def _extract_user_frames() -> list[traceback.FrameSummary]:
    stack: list[traceback.FrameSummary] = list(traceback.extract_stack())
    # drop current frames (this function and callers)
    stack = stack[:-2]
Member:

I feel like this duplicates the __post_init__(self) function in data.py.



@dataclass
class MaskedLoad(Op):
Member:

I probably missed something: I think RawStore/RawLoad shouldn't contain a mask, while MaskedStore/MaskedLoad should. Also, why do we have three different load/store ops?

@latentCall145 (Collaborator, Author):

Yeah, this was pretty hacky; I wanted to differentiate when to use the Triton load vs. masked load (and store) callbacks when patching:

elif op_type is Load:
    return OpCallbacks(before_callback=pre_load_callback)
elif op_type is MaskedLoad:
    return OpCallbacks(before_callback=pre_masked_load_callback)

I was thinking of providing the DSL as an input to this function to allow something like this:

elif op_type is Load:
    if backend == 'triton':
        return OpCallbacks(before_callback=pre_load_callback)
    elif backend == 'nki':
        return OpCallbacks(before_callback=pre_masked_load_callback)

but it felt a bit weird that we would have to deal with backend-specific logic at the triton-viz level rather than at the language-level patching/AST rewriting stage.

Hence, I made empty Op classes for MaskedLoad and MaskedStore, roughly as sketched below.
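
A sketch of the marker-class approach (Op here stands in for the actual base class defined in data.py):

from dataclasses import dataclass

@dataclass
class Op:  # stand-in for the real base class in data.py
    pass

@dataclass
class MaskedLoad(Op):
    """Empty marker type: routes NKI loads to pre_masked_load_callback."""

@dataclass
class MaskedStore(Op):
    """Empty marker type: routes NKI stores to pre_masked_store_callback."""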

Member:

Yeah, this was pretty hacky; I wanted to differentiate when to use the Triton load vs. masked load (and store) callbacks when patching:

Can't both ops be modeled as a masked load? When you have a non-masked load/store, the mask field would just be None?

@latentCall145 (Collaborator, Author), Oct 31, 2025:

In reality, the Load/MaskedLoad naming is a misnomer; more accurate names would probably be TritonLoad/NKILoad, or PointerPlusOffsetLoad/NumpyIndexedLoad if you want to signal that these implementations can be reused for different DSLs.

So yes, Load/MaskedLoad are both technically masked loads but for different backend implementations.

Though you could model the Triton and NKI loads in the same callback, you'd still need to handle the different input arguments Triton and NKI provide. Something like:

def pre_load_callback(array, offsets=None, mask=None):
    if offsets is None:  # only true for triton tensors
        offsets = array.data - array.data_ptr()
    # rest of logic

The above solution seems like it would be a bit confusing to read, though.

Member:

Though you could model the Triton and NKI loads in the same callback, you'd still need to handle the different input arguments Triton and NKI provide.

Like I mentioned before, we can handle the differences before delivering the data to the concrete analysis client. In that case, do we still need to separate TritonLoad and NKILoad? Probably not?

if "backend" == "nki":
  a, b, c = handle_nki_load
elif "backend" == "triton":
  a, b, c = handle_triton_load

pre_load_callback(a, b, c)

@latentCall145 (Collaborator, Author), Nov 3, 2025:

Seems tricky even with normalized callbacks. NKI and Triton loads have different arguments, so it'd be more like:

if backend == 'nki':
    array, keys, mask = handle_nki_load(...)
elif backend == 'triton':
    array_plus_offsets, mask = handle_triton_load(...)

I haven't been able to find a perfect way to make both backends use the same args.

  • I can't extract array from array_plus_offsets by itself (the tracer does this by storing tensor pointers collected from arg callbacks, but that isn't something we can do in a stateless function).
  • We could have handle_triton_load return None for the keys arg so the normalized args have the same length. Then we'd need to handle keys is None in the client callback, but that means clients implicitly handle DSL-specific logic, which I don't think we want? Though this is probably the easiest option to implement if we're OK with that.

Member:

What does "keys" mean here?

@latentCall145 (Collaborator, Author):

keys is the actual slice itself, i.e., for x[arange(B), 3] the keys would be (arange(B), 3).

Member:

I think it's kind of general. Let's keep it and make it optional (defaulting to None).
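
So the normalized callback could end up looking something like this (a sketch with illustrative names):

from typing import Optional

def pre_load_callback(array, keys: Optional[tuple] = None, mask=None):
    # keys is None for Triton's pointer-plus-offset loads; for NKI it
    # carries the index tuple, e.g. (arange(B), 3).
    ...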


t = op_data["global_tensor"]
# Normalize to numpy array for robust indexing
if hasattr(t, "cpu"):
Member:

I think the data needs to be normalized outside this file.

@mark14wu (Collaborator), Nov 4, 2025:

@latentCall145 I just resolved the conflicts. Please pull before committing.

@Jokeren (Member), Nov 6, 2025:

Hi @mark14wu, can you also review the PR?

@mark14wu (Collaborator), Nov 7, 2025:

Hi @mark14wu, can you also review the PR?

Yes. I have dropped some comments, but haven't finished reviewing all files.

@mark14wu (Collaborator), Nov 7, 2025:

Some tests in CI are not working.

Traceback:
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_wrapper.py:8: in <module>
    from triton_viz.wrapper import SANITIZER_COMMAND, PROFILER_COMMAND
triton_viz/__init__.py:1: in <module>
    from .core import trace, clear, config
triton_viz/core/__init__.py:1: in <module>
    from .trace import trace, clear
triton_viz/core/trace.py:7: in <module>
    from ..clients import Sanitizer, Profiler, Tracer
triton_viz/clients/__init__.py:1: in <module>
    from .profiler.profiler import Profiler
triton_viz/clients/profiler/profiler.py:1: in <module>
    from ...core.client import Client
triton_viz/core/client.py:8: in <module>
    from .patch import (
triton_viz/core/patch.py:61: in <module>
    from triton_viz.core.nki import nki_builder
triton_viz/core/nki.py:3: in <module>
    import neuronxcc.nki.language as nl
E   ModuleNotFoundError: No module named 'neuronxcc'

Please fix.
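
One common way to address this (a sketch; exact placement is up to the author) is to guard the module-level import so neuronxcc stays an optional dependency, e.g. in triton_viz/core/nki.py:

try:
    import neuronxcc.nki.language as nl
    HAS_NKI = True
except ImportError:
    # neuronxcc isn't installed (e.g., in CI), so disable the NKI
    # backend instead of failing at import time.
    nl = None
    HAS_NKI = False

patch.py could then register the NKI entries in OPERATION_REGISTRY only when HAS_NKI is True.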
