Add status attribute to output files for improved error handling and checkpointing #1754
Description
This PR implements an MVP for checkpointing and error file tracking by adding a `status` attribute to output NetCDF files. This ensures that users and automated systems can easily determine whether a simulation completed successfully, is a checkpoint, or encountered an error condition.
Addresses: #1679
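As a rough sketch of how an automated system might act on this attribute: the snippet below uses a plain mapping as a stand-in for the attributes of a loaded output file. The helper name `describe_status` is hypothetical, and the status strings are assumed to match the `COMPLETED`, `CHECKPOINT`, and `ERROR` names used in this PR (compared case-insensitively, since the exact stored casing is not stated here).

```python
from typing import Mapping

# Assumption: stored values correspond to these names, up to letter case.
KNOWN_STATUSES = ("COMPLETED", "CHECKPOINT", "ERROR")


def describe_status(attrs: Mapping[str, str]) -> str:
    """Returns a short verdict for an output file's attributes (hypothetical helper)."""
    raw = attrs.get("status")
    if raw is None:
        # Files written before this change would carry no status attribute.
        return "unknown: no status attribute"
    status = str(raw).upper()
    if status not in KNOWN_STATUSES:
        return f"unknown: unrecognized status {raw!r}"
    if status == "COMPLETED":
        return "ok: simulation reached t_final"
    if status == "CHECKPOINT":
        return "partial: intermediate checkpoint"
    return "failed: simulation stopped on an error"
```

In practice the mapping would be the attributes of the loaded output file rather than a hand-built dict.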
Problem
Currently, TORAX simulations that encounter errors (NaN detection, negative profiles, quasineutrality violations, or reaching minimum timestep) may not write output files consistently. This makes it difficult to:
Solution
This implementation follows the design feedback from the issue, prioritizing simplicity and maximal reuse of existing infrastructure.
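A minimal sketch of the pieces this section lists, under stated assumptions and not the PR's actual code: `SimError` here is a stand-in for the simulation error state (the `NO_ERROR` member is an assumed name for the no-error case), the helpers `status_from_error` and `attach_status` are hypothetical, and a plain dict stands in for `data_tree.attrs`. The PR describes `SimStatus` as a StrEnum; the sketch uses a `str`/`Enum` mixin so it also runs on Python versions without `enum.StrEnum`.

```python
import enum


class SimError(enum.Enum):
    """Stand-in error state; the four error names follow the PR text."""
    NO_ERROR = 0  # Assumed name; the PR text does not name the no-error case.
    NAN_DETECTED = 1
    NEGATIVE_CORE_PROFILES = 2
    QUASINEUTRALITY_BROKEN = 3
    REACHED_MIN_DT = 4


class SimStatus(str, enum.Enum):
    """Status values; the string values are an assumption."""
    COMPLETED = "COMPLETED"
    CHECKPOINT = "CHECKPOINT"  # Reserved for future use, as in the PR.
    ERROR = "ERROR"


def status_from_error(sim_error: SimError) -> SimStatus:
    """Maps the error state to a status: no error means COMPLETED."""
    if sim_error is SimError.NO_ERROR:
        return SimStatus.COMPLETED
    return SimStatus.ERROR


def attach_status(attrs: dict, sim_error: SimError) -> dict:
    """Returns a copy of attrs with the 'status' key set as a plain string."""
    new_attrs = dict(attrs)
    new_attrs["status"] = status_from_error(sim_error).value
    return new_attrs
```

Deriving the status from the existing error state, rather than tracking it separately, is one way to keep the change small and reuse what the simulation already records.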
New `SimStatus` Enum (`_src/state.py`):
- `COMPLETED`: Simulation reached `t_final` successfully.
- `CHECKPOINT`: Intermediate checkpoint (reserved for V2/future use).
- `ERROR`: Simulation stopped due to an error condition.
Status Attribute in Output (`_src/output_tools/output.py`):
- `status` attribute.
- `sim_error` state.
- `data_tree.attrs['status']`.
Consistent Error File Writing (`_src/simulation_app.py`):
- `NAN_DETECTED`
- `NEGATIVE_CORE_PROFILES`
- `QUASINEUTRALITY_BROKEN`
- `REACHED_MIN_DT`
Design Principles Followed
- `output_tools`.
Changes
- `torax/_src/state.py`: Added `SimStatus` StrEnum.
- `torax/_src/output_tools/output.py`: Added logic to inject the status attribute into the DataTree.
- `torax/_src/simulation_app.py`: Improved error logging and ensured file writing triggers on error states.
- `torax/_src/output_tools/tests/output_test.py`: Added comprehensive tests for the status attribute.
Testing
- `status` attribute is correctly set to `COMPLETED` for successful runs and `ERROR` for failed runs.
Future Work (V2)
This MVP establishes the foundation for advanced checkpointing features planned for the future, including: