- uv: for running scripts and managing virtual environments
- ffmpeg: optional; used for video generation during evaluation
For development, install with uv:

```bash
uv sync --extra dev
```

If contributing to the codebase, you can run:

```bash
pre-commit install
```

This will set up the pre-commit checks so any pushed commits will pass the CI.
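To run the same checks by hand before pushing (this is standard pre-commit usage, not specific to AutoCast):

```bash
# Run every configured hook against the whole repository
pre-commit run --all-files
```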
Train an encoder-decoder stack and evaluate the resulting checkpoint:

```bash
# Train
uv run python -m autocast.train.autoencoder --config-path=configs/
```

Train an encoder-processor-decoder stack and evaluate the resulting checkpoint:

```bash
# Train
uv run train_encoder_processor_decoder \
  --config-path=configs/ \
  --work-dir=outputs/encoder_processor_decoder_run

# Evaluate (the --batch-index flags are optional; remove them to skip videos)
uv run evaluate_encoder_processor_decoder \
  --config-path=configs/ \
  --work-dir=outputs/processor_eval \
  --checkpoint=outputs/encoder_processor_decoder_run/encoder_processor_decoder.ckpt \
  --batch-index=0 --batch-index=1 \
  --video-dir=outputs/encoder_processor_decoder_run/videos
```

Evaluation writes a CSV of aggregate metrics to `--csv-path` (defaults to `<work-dir>/evaluation_metrics.csv`) and, when `--batch-index` is provided, stores rollout animations for the specified test batches.
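For example, to redirect the CSV and inspect it from the shell (the paths reuse the run above; the available columns depend on the configured metrics):

```bash
# Write metrics somewhere other than the default <work-dir>/evaluation_metrics.csv
uv run evaluate_encoder_processor_decoder \
  --config-path=configs/ \
  --work-dir=outputs/processor_eval \
  --checkpoint=outputs/encoder_processor_decoder_run/encoder_processor_decoder.ckpt \
  --csv-path=outputs/processor_eval/metrics.csv

# Pretty-print the aggregate metrics
column -s, -t < outputs/processor_eval/metrics.csv
```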
This assumes you have the `reaction_diffusion` dataset stored at the path specified by
the `AUTOCAST_DATASETS` environment variable.
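For example (the dataset root below is illustrative; point it at wherever your datasets live):

```bash
export AUTOCAST_DATASETS=$HOME/data/autocast  # illustrative path
ls $AUTOCAST_DATASETS/reaction_diffusion      # the dataset used below
```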
Train the autoencoder:

```bash
uv run python -m autocast.train.autoencoder \
  --config-path=configs \
  --work-dir=outputs/rd/00 \
  data.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
  data.use_simulator=false \
  model.learning_rate=0.00005 \
  trainer.max_epochs=10 \
  logging.wandb.enabled=true
```

Or alternatively with the included bash script:

```bash
./scripts/ae.sh rd 00 reaction_diffusion
```

Then train the encoder-processor-decoder, pointing it at the autoencoder checkpoint:

```bash
uv run python -m autocast.train.encoder_processor_decoder \
  --config-path=configs \
  --work-dir=outputs/rd/00 \
  data.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
  data.use_simulator=false \
  model.learning_rate=0.0001 \
  trainer.max_epochs=10 \
  logging.wandb.enabled=true \
  training.autoencoder_checkpoint=outputs/rd/00/autoencoder.ckpt
```

Or alternatively with the included bash script:

```bash
./scripts/epd.sh rd 00 reaction_diffusion
```

Finally, evaluate the trained model:

```bash
uv run evaluate_encoder_processor_decoder \
  --config-path=configs/ \
  --work-dir=outputs/rd/00/eval \
  --checkpoint=outputs/rd/00/encoder_processor_decoder.ckpt \
  --batch-index=0 \
  --batch-index=1 \
  --batch-index=2 \
  --batch-index=3 \
  --video-dir=outputs/rd/00/eval/videos \
  data.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
  data.use_simulator=false
```

Or alternatively with the included bash script:

```bash
./scripts/eval.sh rd 00 reaction_diffusion
```

AutoCast ships with an optional Weights & Biases integration that is fully driven by the Hydra config under `configs/logging/wandb.yaml`.
- Enable logging for CLI workflows by passing Hydra config overrides as positional arguments:

  ```bash
  uv run train_encoder_processor_decoder \
    --config-path=configs \
    logging.wandb.enabled=true \
    logging.wandb.project=autocast-experiments \
    logging.wandb.name=processor-baseline
  ```

- The autoencoder/processor training CLIs pass the configured `WandbLogger` directly into Lightning so that metrics, checkpoints, and artifacts are synchronized automatically.
- The evaluation CLI reports aggregate test metrics to the same run when logging is enabled, making it easy to compare training and evaluation outputs in one dashboard.
- All notebooks contain a dedicated cell that instantiates a `wandb_logger` via `autocast.logging.create_wandb_logger`. Toggle the `enabled` flag in that cell to control tracking when experimenting interactively.

When `enabled` remains `false` (the default), the logger is skipped entirely, so the stack can be used without a W&B account.
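Since the evaluation CLI accepts the same overrides, an evaluation run can be pointed at the same W&B project from the shell. For instance (the project and run names are the illustrative values from above; reusing the training run's name is just one way to group the two):

```bash
uv run evaluate_encoder_processor_decoder \
  --config-path=configs/ \
  --work-dir=outputs/processor_eval \
  --checkpoint=outputs/encoder_processor_decoder_run/encoder_processor_decoder.ckpt \
  logging.wandb.enabled=true \
  logging.wandb.project=autocast-experiments \
  logging.wandb.name=processor-baseline
```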
In the `slurm_templates` folder, template Slurm scripts can be found for the following use cases:

- `train_and_eval_autoencoder.sh`: training and evaluation of the autoencoder
- `train_and_eval_encoder-processor-decoder.sh`: training and evaluation of the encoder-processor-decoder approach
- `encoder-processor-decoder-parameter_sweep.sh`: same as above, but runs a parameter sweep

We advise copying these scripts into a folder called `slurm_scripts` (which is in the gitignore) and editing them as you see fit.
To run, navigate to the top level of this repository and run:

```bash
sbatch scripts/train_and_eval_encoder-processor-decoder.sh
```

or

```bash
sbatch scripts/train_and_eval_autoencoder.sh
```

depending on which model you would like to run.
This will train and evaluate the model using the settings in the corresponding config (found in the `configs` folder). Outputs from both training and evaluation are written to an `outputs` folder with the following naming convention: `outputs/{job_name}/$(date +%Y%m%d_%H%M%S)`.
`scripts/encoder-processor-decoder-parameter_sweep.sh` is an example parameter sweep.
It uses Slurm arrays and Hydra's override functionality to sweep through combinations of parameters. The resulting output structure looks like this:
- outputs
  - {job_name}
    - job-{job_id} # Unique for each sweep run
      - parameter_lookup.csv # CSV file mapping task id to parameter values
      - task-0 # 0 is the Slurm array task id; unique for each set of parameters
      - task-1
      - etc.
    - job-{job_id} # Another sweep run
  - {job_name} # Another job name
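For example, to check which parameter values a given task ran with (the job id is illustrative, and this assumes the task id is the first CSV column, as the mapping above suggests):

```bash
# Which parameters did array task 2 of job 12345 use?
grep '^2,' outputs/encoder_processor_decoder_sweep/job-12345/parameter_lookup.csv
```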
A checklist of things to change in the example script:

- `--array=0-8`: the number of parallel jobs to run. This should equal the number of parameter combinations you want to run.
- `JOB_NAME="encoder_processor_decoder_sweep"`: the name of the job, and the top-level output directory.
- The whole "Define Parameter Grid" section.
- The columns to be written to the parameter CSV file.
- The flags in the Python script that override the Hydra config, e.g. `trainer.max_epochs=${MAX_EPOCH}`.
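To make the mechanics concrete, here is a minimal sketch of how such a script can decode `SLURM_ARRAY_TASK_ID` into Hydra overrides. The grid values, job name, and paths are illustrative, not the template's actual contents:

```bash
#!/bin/bash
#SBATCH --job-name=epd_sweep
#SBATCH --array=0-3              # one task per parameter combination (2 x 2 = 4 here)

# Illustrative parameter grid: 2 learning rates x 2 epoch counts.
LEARNING_RATES=(0.0001 0.00005)
MAX_EPOCHS=(10 20)

# Decode the array task id into grid indices.
N_LR=${#LEARNING_RATES[@]}
LR=${LEARNING_RATES[$(( SLURM_ARRAY_TASK_ID % N_LR ))]}
MAX_EPOCH=${MAX_EPOCHS[$(( SLURM_ARRAY_TASK_ID / N_LR ))]}

JOB_DIR=outputs/epd_sweep/job-${SLURM_ARRAY_JOB_ID}
WORK_DIR=${JOB_DIR}/task-${SLURM_ARRAY_TASK_ID}
mkdir -p "${WORK_DIR}"

# Record which parameters this task used (one row per task).
echo "${SLURM_ARRAY_TASK_ID},${LR},${MAX_EPOCH}" >> "${JOB_DIR}/parameter_lookup.csv"

uv run train_encoder_processor_decoder \
  --config-path=configs \
  --work-dir="${WORK_DIR}" \
  model.learning_rate="${LR}" \
  trainer.max_epochs="${MAX_EPOCH}"
```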