GEOSfvdycore with Spack
This document outlines how to install GEOSfvdycore using Spack.
Note: in this example, we will be using the [email protected] compiler. This is being chosen
mainly due to odd behavior currently seen in testing with the Intel oneAPI compilers
and Spack.
Note 2: Recent versions of Spack mark [email protected] as deprecated; this is discussed in a section below. If installing [email protected] gives you issues, you can use [email protected] instead, or install [email protected] as a deprecated package.
To clone Spack:
git clone -c feature.manyFiles=true --depth=2 https://github.com/spack/spack.git

Next, once you have Spack cloned, you should enable the shell integration by sourcing the correct setup-env file. For example, for bash:
export SPACK_ROOT=$HOME/spack
. $SPACK_ROOT/share/spack/setup-env.sh

Of course, update the SPACK_ROOT variable to point to the location of your Spack clone.
We recommend adding these lines to ~/.bashrc (or similar) for ease of use; otherwise, run them each time you open a new shell.
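For example, a minimal way to do this in bash (assuming your Spack clone is at $HOME/spack, as above) is:

echo 'export SPACK_ROOT=$HOME/spack' >> ~/.bashrc
echo '. $SPACK_ROOT/share/spack/setup-env.sh' >> ~/.bashrc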
On some systems, the default TMPDIR might be limited or in a problematic
location. If so, update it to a location with more space and add to ~/.bashrc:
mkdir -p $HOME/tmpdir
echo 'export TMPDIR=$HOME/tmpdir' >> ~/.bashrc

From https://spack.readthedocs.io/en/latest/getting_started.html#system-prerequisites, there are prerequisites for Spack. For example, on Ubuntu:
sudo apt update
sudo apt install bzip2 ca-certificates g++ gcc gfortran git gzip lsb-release patch python3 tar unzip xz-utils zstd

If you are using a different operating system, you might need to translate these into the equivalent packages for your OS's package manager. The above link has a similar list for RHEL.
Spack configuration files are (by default) in ~/.spack. We will be updating
the packages.yaml and repos.yaml files.
mkdir -p ~/.spack

The complete packages.yaml file we will use is:
packages:
  all:
    compiler: [gcc@=14.2.0]
    providers:
      mpi: [openmpi]
      blas: [openblas]
      lapack: [openblas]
  hdf5:
    variants: +fortran +szip +hl +threadsafe +mpi
  netcdf-c:
    variants: +dap
  esmf:
    variants: ~pnetcdf ~xerces
  cdo:
    variants: ~proj ~fftw3
  pflogger:
    variants: +mpi
  pfunit:
    variants: +mpi +fhamcrest
  fms:
    variants: ~gfs_phys +pic ~yaml constants=GEOS precision=32,64 +deprecated_io
  mapl:
    variants: +extdata2g +fargparse +pflogger +pfunit ~pnetcdf

Copy this into ~/.spack/packages.yaml.
This not only hardcodes the compiler and MPI stack (see below), but also sets the default variants for the packages Spack will build, which simplifies the spack commands used later.
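If you want a quick sanity check that Spack is reading this file, one option (not required) is to print the merged packages configuration and confirm your entries appear:

spack config get packages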
Spack is currently undergoing a change in how it handles compilers. As such, packages.yaml (seen above) used to need:

packages:
  all:
    compiler: [gcc@=14.2.0]

but that now generates warnings from Spack like:
==> Warning: The packages:all:compiler preference has been deprecated in Spack v1.0, and is currently ignored. It will be removed from config in Spack v1.2.
Moreover, Spack used to use a file called ~/.spack/linux/compilers.yaml to manage compilers. That is now defunct. Compiler information
is controlled in packages.yaml now.
As seen above, the packages.yaml file is used to configure the packages that are available to spack.
In this example, we are telling it we want:
- GNU Fortran as our Fortran compiler
- Open MPI as our MPI stack
- OpenBLAS as our BLAS and LAPACK stack
GEOSfvdycore is not required to use these specific compilers (and barely has a dependency, if any, on BLAS/LAPACK). We use GCC 14 here as a reliable default. GEOSfvdycore supports the following Fortran compilers:
- GCC 13+
- Intel Fortran Classic (ifort) 2021.6+
- Intel oneAPI Fortran (ifx) 2025.0+
And MPI stacks that have been tested are:
- Open MPI
- Intel MPI
- MPICH
- MPT
It's possible MVAPICH2 might also work, but issues have been seen when using it on Discover, and time has not yet been invested to determine the actual cause.
NOTE: GEOSfvdycore does not have a strong dependence on the C and C++ compilers; we mainly focus on Fortran compilers in our testing.
Some testing with Open MPI has found that the following variants might be useful, or even needed, in packages.yaml on systems using SLURM. For example, using Open MPI 4.1:
openmpi:
  require:
  - "@4.1.7"
  - "%[email protected]"
  buildable: True
  variants: +legacylaunchers +internal-hwloc +internal-libevent +internal-pmix schedulers=slurm fabrics=ucx
slurm:
  buildable: False
  externals:
  - spec: [email protected]
    prefix: /usr
ucx:
  buildable: False
  externals:
  - spec: [email protected]
    prefix: /usr

NOTE 1: You will probably want to change the SLURM and UCX external specs to match the versions installed on your system.
NOTE 2: In our testing, using the system UCX is about the only reliable way we have found for Open MPI to see things like InfiniBand interfaces (e.g., mlx5_0).
We need an additional repo that has the recipe package.py file for GEOSfvdycore.
First clone the repository with:
git clone https://github.com/GMAO-SI-Team/geosesm-spack.git

Now, assuming that was cloned into your home directory, you can add it to your repos.yaml file with:

spack repo add $HOME/geosesm-spack/spack_repo/geosesm

This should result in a repos.yaml file that looks like:

repos:
- /home/ubuntu/geosesm-spack/spack_repo/geosesm

where, of course, the path for your $HOME might be different.
It's possible changes might be made to the geosesm-spack repo as time goes on. If updates are needed, you can update the repo with:
cd $HOME/geosesm-spack
git pull

To install GCC 14.2.0 with Spack, run:
spack install [email protected]

You then also need to tell Spack where the compiler is with:

spack compiler find $(spack location -i [email protected])

When you do that, you'll see it in the ~/.spack/packages.yaml file.
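For reference, the entry Spack adds should look roughly like the sketch below; the exact spec, prefix, and compiler paths depend on your system and Spack version, so treat the paths here as placeholders:

packages:
  gcc:
    externals:
    - spec: [email protected] languages:='c,c++,fortran'
      prefix: /path/to/spack/opt/spack/.../gcc-14.2.0-<hash>
      extra_attributes:
        compilers:
          c: /path/to/.../bin/gcc
          cxx: /path/to/.../bin/g++
          fortran: /path/to/.../bin/gfortran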
At some point, [email protected] was marked as deprecated by Spack. As such, when you try to install it you might see something like:
==> Error: failed to concretize `[email protected]` for the following reasons:
1. Cannot satisfy '[email protected]'
required because [email protected] requested explicitly
2. Cannot satisfy '[email protected]'
3. Cannot satisfy 'gcc@=11.5.0' and '[email protected]
required because [email protected] requested explicitly
required because gcc available as external when satisfying gcc@=11.5.0 languages:='c,c++,fortran'
required because [email protected] requested explicitly
4. Cannot satisfy '[email protected]:' and '[email protected]
required because [email protected] requested explicitly
required because @11.3: is a requirement for package gcc
required because gcc available as external when satisfying gcc@=11.5.0 languages:='c,c++,fortran'
required because [email protected] requested explicitly
required because [email protected] requested explicitly
There are three "solutions" for this:

- You can install [email protected] as a deprecated package with:

  spack install --deprecated [email protected]

- You can tell Spack to allow deprecated packages by editing/creating a file called ~/.spack/config.yaml and adding:

  config:
    deprecated: true

- You can move to [email protected], as that is now the "preferred" GCC 14 version:

  spack install [email protected]
NOTE: If you choose the [email protected] path, then everywhere you see a reference to [email protected] in this document, use [email protected] instead.
If GCC 14.2.0 is available from your system package manager, you can install it that way instead of building it with Spack. Then running:

spack compiler find

might find it. You can check by looking at ~/.spack/packages.yaml, where there should be entries for gcc.
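For instance, on Ubuntu 24.04 that might look like the following (the gcc-14 package names are an assumption about your distribution's repositories; use whatever your OS provides):

sudo apt install gcc-14 g++-14 gfortran-14
spack compiler find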
You can now install GEOSfvdycore with:
spack install geosfvdycore %[email protected]

On a test install, this built about 90 packages.
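When it completes, an optional quick check that the package is present (and where it was installed) is:

spack find -lv geosfvdycore
spack location -i geosfvdycore %[email protected]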
Once GEOSfvdycore is built, we now need to make an experiment.
To do so, we need to go to the directory where the GEOSfvdycore package is installed:
cd $(spack location -i geosfvdycore %[email protected])/bin

Then, you can run the fv3_setup script. Below is an example run.
The first question is the Experiment ID:
$ ./fv3_setup
Enter the Experiment ID:
test-c48

This is used to name a directory for the experiment, the SLURM job name, etc. Nothing depends on what you put for this.
Next is the experiment description:
Enter a 1-line Experiment Description:
test-c48

This is mainly used in the History output. Again, anything will be fine.
Next up is the horizontal and vertical resolution.
First is the horizontal resolution of the experiment. The horizontal
resolution is the number of cells in the x and y directions of each
face of the cubed-sphere grid. So, a resolution of 48 (also called c48)
means there are 48 cells in the x and y directions of each face resulting
in a total of 48x48x6 = 13824 cells in the full grid.
Enter the Horizontal Resolution: IM (Default: 48)
48

Typical values for the horizontal (IM) resolution
with the GEOSfvdycore standalone are:
- IM: 48, 90, 180, 360, 720, 1440, 2880
Note: Technically, any value of IM can be used with the GEOSfvdycore standalone
as it needs no input files, but these are the values run by the full GEOSgcm model.
The horizontal resolution also determines the default FV_NX and FV_NY values set in fv3.j (where FV_NY = FV_NX * 6), and the total number of MPI processes needed will be FV_NX * FV_NY.
The defaults are:
| IM | FV_NX | FV_NY | Total Cores |
|---|---|---|---|
| 48 | 6 | 36 | 216 |
| 90 | 10 | 60 | 600 |
| 180 | 20 | 120 | 2400 |
| 360 | 30 | 180 | 5400 |
| 720 | 40 | 240 | 9600 |
| 1440 | 80 | 480 | 38400 |
In this table, the IM is the "up to" number, so if you asked for
IM=100 you'd land in the IM=180 row.
These numbers are not hard limits, and you can run with more or fewer cores. That said, if you run with fewer cores, you must make sure you have enough memory. If you run with too many cores, you might crash in the dynamical core, as the cubed-sphere dynamics requires at least 3 cells in the x and y directions per MPI process.
The next question is the vertical resolution of the experiment:
Enter the Vertical Resolution: LM (Default: 181)
181

Like the horizontal resolution, technically any value can be used, but if you use a value that is not one of the standard values, you would need to provide a set of inputs that are not simple to calculate. As such, we recommend using only:
- LM: 72, 91, 181
and the default of 181 is the standard value for the GEOSgcm.
Next up is a question about the IOSERVER; this only matters if you are outputting History files:
Do you wish to IOSERVER? (Default: NO or FALSE)
NO

If set to yes, the setup script will configure the model to run with additional nodes that handle writing the history output asynchronously. This is useful for large runs where writing the history files can be a bottleneck.
NOTE: This question has a different default depending on the resolution, so make sure you answer as you want.
Next up is the location of the experiment:
Enter Desired Location for the EXPERIMENT Directory
Hit ENTER to use Default Location:
----------------------------------
Default: /home/ubuntu/Experiments/test-c48
/home/ubuntu/Experiments/test-c48

You can put this anywhere, but the last element of the path you give MUST be the Experiment ID you gave at the start.
Finally, you will be asked for a group ID. This is used on NASA clusters as the SLURM/PBS account. If you do not need this, you can put anything.
Current GROUPS: ubuntu adm cdrom sudo dip lxd
Enter your GROUP ID for Current EXP: (Default: ubuntu)
-----------------------------------

Finally, the setup script will create the experiment directory and tell you where it is:
Creating fv3.j for Experiment: test-c48-spack
Done!
-----
You can find your experiment in the directory:
/home/ubuntu/Experiments/test-c48-spack
NOTE: fv3.j by default will run StandAlone_FV3_Dycore.x from the installation directory:
/home/ubuntu/spack/opt/spack/linux-ubuntu24.04-icelake/gcc-14.2.0/geosfvdycore-3.0.0-rc.1-unohaghdhlkzenpn6evxyntsxphlcde4/bin
However, if you copy an executable into the experiment directory, the script will run that executable instead.

On Azure or AWS, you will need to edit fv3_setup to tell it the appropriate number of cores per node on your system.
On NASA systems we know of and control, we know the number of cores per node on most node types and can figure out the right SLURM or PBS directives to use, e.g.:
#SBATCH --nodes=2 --ntasks-per-node=120
And on systems we don't know, we just default to the
number of cores we see on the node running fv3_setup:
#SBATCH --ntasks=216
But on AWS and Azure, we don't know this so there is a section like:
else if( $SITE == 'AWS' | $SITE == 'Azure' ) then
# Because we do not know the name of the model or the number of CPUs
# per node. We ask the user to set these variables in the script
# AWS and Azure users must set the MODEL and NCPUS_PER_NODE
set MODEL = USER_MUST_SET
set NCPUS_PER_NODE = USER_MUST_SET

You should set MODEL to whatever the node constraint name would be and NCPUS_PER_NODE to the number of cores per node.
Of course, you can always adjust these values later.
The fv3_setup script will create a fv3.j file. Some of the
important parts of this file are described below.
NOTE: You will need to edit this file because we do not yet automatically support Spack as a build method in our setup scripting. This is being worked on.
You will need to load the Spack modules at the top of the fv3.j script.
We recommend doing this near the top, for example:
limit stacksize unlimited
limit coredumpsize 0
source /path/to/spack/share/spack/setup-env.csh
spack load geosfvdycore %[email protected]

where the latter two lines are the ones you need to add. Note:
fv3.j is a csh script, so you need to source the setup-env.csh
file (even if your working shell is bash).
This will ensure that the geosfvdycore package is loaded and the
libraries, etc. are found.
Assuming SLURM, a c48 run might typically look like:
#SBATCH --job-name=test-c48
#SBATCH --time=01:00:00
#SBATCH --ntasks=216
#SBATCH --account=ubuntu
In this case, fv3_setup didn't know what the system was and so
just set --ntasks=216 which is the default for a c48 run.
If your system uses SLURM and has required constraints, etc., you can add them here.
Also, on NASA systems, we typically run GEOSfvdycore on, say, AMD Milan nodes at 120 cores per node, even if the node has 128 cores. This is done both for ease of math (since most of our runs use multiples of 120) and to leave some cores for the OS.
If you are running on a system with more or fewer cores per node, we recommend factors/multiples of 120, so 40, 60, 120, 240...
This is NOT required, but does more closely match how it is run on NASA systems.
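As a concrete sketch, on a hypothetical SLURM site with 120-core nodes and a required node constraint, the directives for a c48 run might become something like the following (the constraint and account names are placeholders, not values from this document):

#SBATCH --job-name=test-c48
#SBATCH --time=01:00:00
#SBATCH --nodes=2 --ntasks-per-node=120
#SBATCH --constraint=your_constraint
#SBATCH --account=your_account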
To change the number of processes used you will edit:
set FV_NX = 6
set FV_NY = 36
For example, this specifies an FV_NX x FV_NY decomposition of 6 x 36 = 216 MPI processes. To change this, our usual rules are:
- FV_NY MUST be divisible by 6
- FV_NX MUST be less than FV_NY
- FV_NY should roughly be 6 to 12 times FV_NX
So, for example, you could change this to:
set FV_NX = 10
set FV_NY = 60
or:
set FV_NX = 4
set FV_NY = 24
When you do this, you would want to change your SLURM/PBS directives as well.
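For example, changing to FV_NX = 10 and FV_NY = 60 means 10 x 60 = 600 MPI processes, so (assuming SLURM) the directives would become something like:

#SBATCH --ntasks=600

or, on a site with 120-core nodes:

#SBATCH --nodes=5 --ntasks-per-node=120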
By default, fv3.j will set up a 1-model-day run. You can change this by changing this setting:
set JOB_SGMT = '00000001 000000'
This setting is in the format YYYYMMDD HHMMSS so as you can see
this is set for 1 day, but if changed to 00000102 000000
it would run for 1 month and 2 days.
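For example, to run a 5-day segment instead, keeping the same YYYYMMDD HHMMSS format:

set JOB_SGMT = '00000005 000000'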
By default, fv3.j will output some history (i.e., diagnostic) output.
But if you do not care about this and wish to reduce the I/O load and
disk output (which can get large at very-high resolutions), you can
turn off history output by finding the heredoc for HISTORY.rc and changing:
COLLECTIONS: 'inst3_3d_diag'
'inst1_2d_diag'
::
to:
COLLECTIONS:
::
The GEOSfvdycore system uses a script called esma_mpirun which
tries to "hide" MPI run command complexity from users. However,
if you wish to change the MPI run command, you can do so by editing
the fv3.j file and looking for:
set RUN_CMD = "$GEOSBIN/esma_mpirun -np "
You can change this to whatever you want, though it must end in the option for the number of MPI processes to run. Later on this is used as:
$RUN_CMD $NPES $FV3EXE $IOSERVER_OPTIONS --logging_config logging.yaml |& tee ${SCRDIR}.log
so it expects the number of MPI processes to be the next argument.
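For example, a minimal sketch if you wanted to bypass esma_mpirun and call your MPI launcher directly (this assumes your launcher accepts -np for the process count, as Open MPI's and MPICH's mpirun do):

set RUN_CMD = "mpirun -np "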
The GEOSfvdycore system will try and set "default" MPI environment
variables depending on the MPI stack used. However, the defaults
it chooses are those we've found useful on NASA systems. If you
wish to change these you can do so by editing the fv3.j file and looking
for the section headed by:
#########################################
# Set MPI Environment Variables
#########################################
Change/delete this as desired.
If you want to change the number of tracers advected, you can change:
# Set number of tracers
set N_TRACERS = 2
If you increase this, the model will take longer as you are doing more work.
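For example, to advect four tracers instead of the default two:

set N_TRACERS = 4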
If you are running with IOSERVER, fv3.j needs to know the
number of cores per node. fv3_setup might get this wrong
on non-NASA systems. To check and possibly correct look for:
if ( $NCPUS != NULL ) then
if ( $USE_IOSERVER == 1 ) then
set NCPUS_PER_NODE = 16
and change the value to match your system.
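For instance, on nodes with 96 cores per node (a hypothetical value), you would set:

set NCPUS_PER_NODE = 96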
You can then run the script as either:
./fv3.j
if on an interactive allocation or with your job scheduler, e.g.:
sbatch fv3.j