@Aaryan-549

Description

This PR implements an MVP for checkpointing and error-file tracking by adding a status attribute to output NetCDF files. Users and automated systems can then determine at a glance whether a simulation completed successfully, is an intermediate checkpoint, or stopped on an error condition.

Addresses: #1679

Problem

Currently, TORAX simulations that encounter errors (NaN detection, negative profiles, quasineutrality violations, or reaching minimum timestep) may not write output files consistently. This makes it difficult to:

  • Debug long-running simulations that fail.
  • Distinguish programmatically between completed and failed runs.
  • Implement checkpointing for verification, surrogate training, or long-pulse scenarios (e.g., DEMO, STEP, ARC).

Solution

This implementation follows the design feedback from the issue, prioritizing simplicity and maximal reuse of existing infrastructure.

  1. New SimStatus Enum (_src/state.py):

    • COMPLETED: Simulation reached t_final successfully.
    • CHECKPOINT: Intermediate checkpoint (reserved for V2/future use).
    • ERROR: Simulation stopped due to an error condition.
  2. Status Attribute in Output (_src/output_tools/output.py):

    • All NetCDF output files now include a global status attribute.
    • This is automatically set based on the sim_error state.
    • Accessible via data_tree.attrs['status'].
  3. Consistent Error File Writing (_src/simulation_app.py):

    • Output files are now explicitly written for ALL error conditions:
      • NAN_DETECTED
      • NEGATIVE_CORE_PROFILES
      • QUASINEUTRALITY_BROKEN
      • REACHED_MIN_DT
    • Files contain all valid timesteps captured up to the error.
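The enum and the mapping from the final error state can be sketched as follows. This is a minimal illustration based on the description above, not the actual TORAX code: `status_from_sim_error` and the "`sim_error is None` means success" convention are assumptions. The real enum is a `StrEnum` (Python 3.11+); the `(str, Enum)` mixin below behaves equivalently for serialization and keeps the sketch runnable on older Pythons.

```python
import enum


class SimStatus(str, enum.Enum):
    """Sketch of the SimStatus enum added to _src/state.py."""
    COMPLETED = "completed"    # simulation reached t_final successfully
    CHECKPOINT = "checkpoint"  # intermediate checkpoint (reserved for V2)
    ERROR = "error"            # simulation stopped on an error condition


def status_from_sim_error(sim_error) -> SimStatus:
    """Map the final sim_error state to a status value (illustrative)."""
    # Assumption: sim_error is None when the run completed cleanly, and a
    # truthy error marker (e.g. "NAN_DETECTED") otherwise.
    return SimStatus.COMPLETED if sim_error is None else SimStatus.ERROR
```

In this sketch, the output writer would set `data_tree.attrs['status'] = status_from_sim_error(sim_error).value` just before serializing the DataTree to NetCDF.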

Design Principles Followed

  • Simplicity: Minimal changes with no complex logic introduced.
  • Reuse: Uses the standard NetCDF file format and existing output_tools.
  • No Breaking Changes: Purely additive; does not modify existing restart behavior or API.
  • Single Format: Checkpoints/Error files use the exact same schema as regular output files.

Changes

  • torax/_src/state.py: Added SimStatus StrEnum.
  • torax/_src/output_tools/output.py: Added logic to inject the status attribute into the DataTree.
  • torax/_src/simulation_app.py: Improved error logging and ensured file writing triggers on error states.
  • torax/_src/output_tools/tests/output_test.py: Added comprehensive tests for the status attribute.
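Downstream, the attribute makes programmatic triage of a batch of runs straightforward. A sketch, assuming each file's attrs mapping is what `data_tree.attrs` returns after opening the NetCDF output (the helper and the file names are hypothetical):

```python
from collections import defaultdict


def triage_by_status(runs: dict) -> dict:
    """Group run names by their 'status' attribute.

    runs: mapping of run name -> attrs dict, as read from
    data_tree.attrs after opening each NetCDF output file.
    """
    groups = defaultdict(list)
    for name, attrs in runs.items():
        # Older files written before this PR have no status attribute.
        groups[attrs.get("status", "unknown")].append(name)
    return dict(groups)
```

For example, `triage_by_status({"run_a.nc": {"status": "completed"}, "run_b.nc": {"status": "error"}})` groups `run_a.nc` under `completed` and `run_b.nc` under `error`, with pre-PR files falling into an `unknown` bucket rather than raising.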

Testing

  • Compilation: All files compile successfully.
  • New Tests: Verified that the status attribute is set to COMPLETED for successful runs and to ERROR for failed runs.
  • Regression: Existing tests pass; fully backward compatible.
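The new checks can be sketched at the level of the attrs round-trip. This is a simplified stand-in; the real tests in output_test.py go through the actual output pipeline and NetCDF serialization:

```python
def simulate_write(sim_error):
    """Stand-in for the output writer: returns the attrs it would attach."""
    status = "completed" if sim_error is None else "error"
    return {"status": status}


# Successful run -> status 'completed'.
assert simulate_write(None)["status"] == "completed"
# Failed run (e.g. NaN detected) -> status 'error'.
assert simulate_write("NAN_DETECTED")["status"] == "error"
```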

Future Work (V2)

This MVP establishes the foundation for advanced checkpointing features planned for the future, including:

  • Periodic checkpoint writing during simulation execution.
  • Unification of Checkpoint and Restart APIs.
  • Configurable checkpoint intervals and automatic cleanup.

Aaryan-549 and others added 7 commits November 6, 2025 03:28
This commit implements a Custom Pedestal Model API that allows users to
define custom pedestal scaling laws without modifying TORAX source code.

Fixes google-deepmind#1711

## Changes

- Add CustomPedestalModel class supporting user-defined callable functions
- Add CustomPedestal Pydantic configuration
- Update PedestalConfig union to include CustomPedestal
- Add comprehensive unit tests (7 test cases)
- Add example configuration with EPED-like scaling
- Add complete API documentation

## Features

Users can now provide Python functions to compute:
- Ion temperature at pedestal (T_i_ped)
- Electron temperature at pedestal (T_e_ped)
- Electron density at pedestal (n_e_ped)
- Optional dynamic pedestal location (rho_norm_ped_top)

Functions receive full access to runtime parameters, geometry, and
core profiles, enabling machine-specific scaling laws (e.g., STEP
pedestal models with Europed data fits).
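A user-supplied scaling law might look like the sketch below. Everything here is illustrative: the `Geometry` stand-in, the exponents, and the function signature are assumptions for this example, not the real TORAX API (a real callable would also receive runtime parameters and core profiles, as described above):

```python
import dataclasses


@dataclasses.dataclass
class Geometry:
    """Minimal stand-in for the TORAX geometry object (assumption)."""
    B_0: float    # toroidal field on axis [T]
    Ip_MA: float  # plasma current [MA]


def eped_like_T_e_ped(geometry: Geometry) -> float:
    """Illustrative EPED-like pedestal electron temperature [keV].

    The exponents are placeholders, not a fitted scaling law; they only
    encode the qualitative trend that T_e_ped rises with current and field.
    """
    return 0.5 * geometry.Ip_MA ** 0.2 * geometry.B_0 ** 0.8
```

A machine-specific model (e.g. a STEP fit to Europed data) would replace the body with the fitted expression while keeping the same callable shape.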

## API Design

Follows the transport model pattern with:
- JAX Model Layer: CustomPedestalModel (frozen dataclass)
- Pydantic Config Layer: CustomPedestal (validation)
- Runtime Parameters: time-varying support

Fully backwards compatible - no changes to existing models.

- Add SimStatus StrEnum (completed, checkpoint, error) to state.py
- Add status attribute to nc output files in output.py
- Ensure nc files are always written, even on errors
- Add tests for status attribute with both completed and error states

Addresses google-deepmind#1679 (MVP implementation)

This implements a simple MVP for checkpointing/error file tracking:
- Output nc files now include a 'status' attribute indicating whether
  the simulation completed, is a checkpoint, or encountered an error
- The simulation writes output files for all error conditions
  (NaN, negative profiles, quasineutrality broken, reached min timestep)
- Uses existing infrastructure - just the standard nc file format
- No breaking changes to API
