refactor: improve Nx.Defn.Evaluator debugging usability #1644
base: main
…omparison between nodes (see save_as documentation). Created a backend_comparison.livemd file to document comparison between backends for the evaluator. Added the backend_comparison and complex_fft to the mix.exs file
|
|
||
| ## Simulating Backend Differences with Mimic | ||
|
|
||
| Instead of shipping a dedicated mock backend, we can use `Mimic.stub/3` to override individual callbacks on `Nx.BinaryBackend`. First we initialize `Mimic` and copy the binary backend so it can be stubbed safely. Then `add`, `multiply`, and `divide` are swapped to make the divergence easy to spot |
| Instead of shipping a dedicated mock backend, we can use `Mimic.stub/3` to override individual callbacks on `Nx.BinaryBackend`. First we initialize `Mimic` and copy the binary backend so it can be stubbed safely. Then `add`, `multiply`, and `divide` are swapped to make the divergence easy to spot | |
| For demonstration purposes, instead of defining a new incorrect backend, we can use `Mimic.stub/3` to override individual callbacks on `Nx.BinaryBackend`. We use `Mimic.copy(Nx.BinaryBackend)` so it can be stubbed correctly. Then `add`, `multiply`, and `divide` are swapped to force a divergence in implementation. |
|
|
||
| In order to ensure the same `id` for each node in the graph while our function traverses it on both backends, we need to use `Nx.Defn.debug_expr/1` to pre-compile `SimpleComputation.compute/2`. | ||
|
|
||
| This is a trick to make sure the same expression is passed on both `Nx.Defn.jit/2` calls and should not be used liberally. |
| This is a trick to make sure the same expression is passed on both `Nx.Defn.jit/2` calls and should not be used liberally. | |
| This is a trick to make sure the same expression is passed on both `Nx.Defn.jit/2` calls and should not be used liberally elsewhere. |
| The result line is actual Elixir code that can be executed: | ||
|
|
||
| ```elixir | ||
| # Extract and execute the result line | ||
| result_line = content | ||
| |> String.split("\n") | ||
| |> Enum.find(&String.starts_with?(&1, "result = ")) | ||
| |> String.replace("result = ", "") | ||
|
|
||
| # Execute it to reconstruct the tensor | ||
| {reconstructed, _} = Code.eval_string(result_line) | ||
|
|
||
| IO.puts("✅ Successfully reconstructed tensor from .exs file!") | ||
| IO.inspect(reconstructed, label: "Reconstructed tensor") | ||
| IO.puts("\nProperties:") | ||
| IO.puts(" Type: #{inspect(reconstructed.type)}") | ||
| IO.puts(" Shape: #{inspect(reconstructed.shape)}") | ||
| IO.puts(" Backend: #{inspect(reconstructed.data.__struct__)}") | ||
| ``` |
I think you should just use `Code.eval_file` on the example file to show that it is a valid Elixir script by itself.
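A rough sketch of what that could look like, using only the standard library. The file name and contents below are stand-ins, since the real layout of the generated `.exs` files is not shown in this thread; a plain list takes the place of the tensor.

```elixir
# Write a stand-in script to a temp location. The path and the file body
# are hypothetical; the evaluator's real output would go here instead.
path =
  Path.join(
    System.tmp_dir!(),
    "node_sketch_#{System.unique_integer([:positive])}.exs"
  )

File.write!(path, """
result = Enum.map([1, 2, 3], &(&1 * 2))
""")

# Code.eval_file/1 returns {value_of_last_expression, bindings}.
{reconstructed, bindings} = Code.eval_file(path)

IO.inspect(reconstructed, label: "Reconstructed value")
IO.inspect(Keyword.get(bindings, :result), label: "Bound as result")

File.rm!(path)
```

Because the whole file is evaluated, there is no need to search for the `result = ` line and strip the prefix by hand.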
| * **Arguments** - List containing parameters and tensors as strings | ||
| * **Result** - Executable code that reconstructs the output tensor from binary | ||
|
|
||
| ## Verifying Executability |
Instead of a separate section, I think this could just be a ### header on the Examining the Output Files section
|
|
||
| ## Running with Backend B | ||
|
|
||
| Now let's run the same computation with the swapped operations. We leave `Nx` on its default backend, but temporarily enable the Mimic stubs so the evaluator will capture the modified behaviour. |
| Now let's run the same computation with the swapped operations. We leave `Nx` on its default backend, but temporarily enable the Mimic stubs so the evaluator will capture the modified behaviour. | |
| Now let's run the same computation with the swapped operations. We leave `Nx` on its default backend, but temporarily enable the Mimic stubs so the evaluator will capture the modified behaviour. | |
| In practice, this would be where we call a totally different backend to compare against the reference. |
|
|
||
| # Add rename if there are non-nil names | ||
| code = | ||
| if names != [] and Enum.any?(names, &(&1 != nil)) do |
| if names != [] and Enum.any?(names, &(&1 != nil)) do | |
| if Enum.any?(names, fn name -> not is_nil(name) end) do |
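For reference, the `names != []` guard is redundant because `Enum.any?/2` already returns `false` for an empty list. A quick check, with `has_names?` as a made-up name for illustration:

```elixir
# Hypothetical helper wrapping the suggested condition.
has_names? = fn names ->
  Enum.any?(names, fn name -> not is_nil(name) end)
end

IO.inspect(has_names?.([]), label: "empty list")
IO.inspect(has_names?.([nil, nil]), label: "all nil")
IO.inspect(has_names?.([nil, :batch]), label: "one named axis")
```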
| defp serialize_tensor(%Nx.Tensor{data: %Expr{id: id}} = _tensor) when is_reference(id) do | ||
| # This is an unevaluated expression, not a concrete tensor | ||
| # Show the Node ID so users can find which file contains this tensor | ||
| id_str = :erlang.ref_to_list(id) |> List.to_string() |> String.replace(["#Ref<", ">"], "") |
Extract this line to a defp so we can use it when defining the filename on line 636 (but keep the replace with _ there too)
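Something along these lines, as a sketch only. The module and function names are hypothetical, the helpers are public here just so the snippet runs on its own (in the evaluator they would be `defp`), and replacing `.` with `_` for the filename is an assumption about what the existing code on that line does.

```elixir
defmodule RefFormatSketch do
  # Hypothetical name. Turns #Ref<0.1.2.3> into "0.1.2.3".
  def ref_to_string(ref) when is_reference(ref) do
    ref
    |> :erlang.ref_to_list()
    |> List.to_string()
    |> String.replace(["#Ref<", ">"], "")
  end

  # Assumed filename variant: same string, with "." replaced by "_".
  def ref_to_filename(ref) when is_reference(ref) do
    ref
    |> ref_to_string()
    |> String.replace(".", "_")
  end
end

ref = make_ref()
IO.puts(RefFormatSketch.ref_to_string(ref))
IO.puts(RefFormatSketch.ref_to_filename(ref))
```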
…tput Files' and the code now just uses
Chapaman
left a comment
Now the Verifying Executability section is within Examining the Output Files and the code now just uses Code.eval_file()
Changes `evaluator.ex` to produce `.exs` files that can be executed for comparison between nodes (see `save_as` documentation).
Creates a `backend_comparison.livemd` file to document comparison between backends for the evaluator.
Adds the `backend_comparison` and `complex_fft` to the `mix.exs` file