GEOSfvdycore with Spack
This document outlines how to install GEOSfvdycore using Spack.
Note: in this example, we will be using the [email protected] compiler. This is being chosen
mainly due to odd behavior currently seen in testing with the Intel oneAPI compilers
and Spack.
Note 2: Recent versions of Spack mark [email protected] as deprecated; this is discussed in a section below. If installing [email protected] gives you issues, you can use [email protected] instead, or install [email protected] as a deprecated package.
To clone Spack:
git clone -c feature.manyFiles=true --depth=2 https://github.com/spack/spack.git

Next, once you have Spack cloned, you should enable the shell integration by sourcing the correct setup-env file. For example, for bash:
export SPACK_ROOT=$HOME/spack
. $SPACK_ROOT/share/spack/setup-env.sh

Of course, update the SPACK_ROOT variable to point to the location of your Spack clone.
We recommend adding these lines to ~/.bashrc (or similar) for ease of use; otherwise, run them each time you open a new shell.
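For example, a minimal way to do this in bash (assuming your Spack clone is at $HOME/spack, as above) is:

echo 'export SPACK_ROOT=$HOME/spack' >> ~/.bashrc
echo '. $SPACK_ROOT/share/spack/setup-env.sh' >> ~/.bashrc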
On some systems, the default TMPDIR might be limited or in a problematic
location. If so, update it to a location with more space and add to ~/.bashrc:
mkdir -p $HOME/tmpdir
echo 'export TMPDIR=$HOME/tmpdir' >> ~/.bashrc

From https://spack.readthedocs.io/en/latest/getting_started.html#system-prerequisites, there are prerequisites for Spack. For example, on Ubuntu:
sudo apt update
sudo apt install bzip2 ca-certificates g++ gcc gfortran git gzip lsb-release patch python3 tar unzip xz-utils zstd

If you are using a different operating system, you might need to translate these into the equivalent packages for your OS's package manager. The above link has a similar list for RHEL.
Spack configuration files are (by default) in ~/.spack. We will be updating
the packages.yaml and repos.yaml files.
mkdir -p ~/.spack

The complete packages.yaml file we will use is:
packages:
  all:
    compiler: [gcc@=14.2.0]
    providers:
      mpi: [openmpi]
      blas: [openblas]
      lapack: [openblas]
  hdf5:
    variants: +fortran +szip +hl +threadsafe +mpi
  netcdf-c:
    variants: +dap
  esmf:
    variants: ~pnetcdf ~xerces
  cdo:
    variants: ~proj ~fftw3
  pflogger:
    variants: +mpi
  pfunit:
    variants: +mpi +fhamcrest
  fms:
    variants: ~gfs_phys +pic ~yaml constants=GEOS precision=32,64 +deprecated_io
  mapl:
    variants: +extdata2g +fargparse +pflogger +pfunit ~pnetcdf

Copy this into ~/.spack/packages.yaml.
This not only hardcodes the compiler and MPI stack (see below), but also sets the default variants for the packages Spack will build, which simplifies the spack commands used later.
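If you want a quick sanity check that Spack is reading this file, one option (not required) is to print the merged packages configuration and confirm your entries appear:

spack config get packages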
Spack is currently undergoing a change in how it handles compilers. As such, packages.yaml (seen above) used to need:

packages:
  all:
    compiler: [gcc@=14.2.0]

but that now generates warnings from Spack like:
==> Warning: The packages:all:compiler preference has been deprecated in Spack v1.0, and is currently ignored. It will be removed from config in Spack v1.2.
Moreover, Spack used to use a file called ~/.spack/linux/compilers.yaml to manage compilers. That is now defunct. Compiler information
is controlled in packages.yaml now.
As seen above, the packages.yaml file is used to configure the packages that are available to spack.
In this example, we are telling it we want:
- GNU Fortran as our Fortran compiler
- Open MPI as our MPI stack
- OpenBLAS as our BLAS and LAPACK stack
GEOSfvdycore is not required to use these specific compilers (and barely has a dependency, if any, on BLAS/LAPACK). We use GCC 14 here as a reliable default. GEOSfvdycore supports the following Fortran compilers:
- GCC 13+
- Intel Fortran Classic (ifort) 2021.6+
- Intel oneAPI Fortran (ifx) 2025.0+
And MPI stacks that have been tested are:
- Open MPI
- Intel MPI
- MPICH
- MPT
It's possible MVAPICH2 might also work, but issues have been seen when using it on Discover, and time has not yet been invested to determine the actual cause.
NOTE: GEOSfvdycore does not have a strong dependence on the C and C++ compilers; we mainly focus on Fortran compilers in our testing.
Some testing with Open MPI has found that the following variants might be useful, or even needed, in packages.yaml on systems using SLURM. For example, using Open MPI 4.1:
openmpi:
  require:
  - "@4.1.7"
  - "%[email protected]"
  buildable: True
  variants: +legacylaunchers +internal-hwloc +internal-libevent +internal-pmix schedulers=slurm fabrics=ucx
slurm:
  buildable: False
  externals:
  - spec: [email protected]
    prefix: /usr
ucx:
  buildable: False
  externals:
  - spec: [email protected]
    prefix: /usr

NOTE 1: You will probably want to change the SLURM and UCX external specs to match the versions installed on your system.
NOTE 2: In our testing, using the system UCX is about the only reliable way we have found for Open MPI to see things like InfiniBand interfaces (e.g., mlx5_0).
We need an additional repo that has the recipe package.py file for GEOSfvdycore.
First clone the repository with:
git clone https://github.com/GMAO-SI-Team/geosesm-spack.git

Now, assuming that was cloned into your home directory, you can add it to your repos.yaml file with:

spack repo add $HOME/geosesm-spack/spack_repo/geosesm

This should result in a repos.yaml file that looks like:

repos:
- /home/ubuntu/geosesm-spack/spack_repo/geosesm

where, of course, the path for your $HOME might be different.
It's possible changes might be made to the geosesm-spack repo as time goes on. If updates are needed, you can update the repo with:
cd $HOME/geosesm-spack
git pull

To install GCC 14.2.0 with Spack, run:
spack install [email protected]

You then also need to tell Spack where the compiler is with:

spack compiler find $(spack location -i [email protected])

When you do that, you'll see it in the ~/.spack/packages.yaml file.
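For reference, the entry Spack adds should look roughly like the sketch below; the exact spec, prefix, and compiler paths depend on your system and Spack version, so treat the paths here as placeholders:

packages:
  gcc:
    externals:
    - spec: [email protected] languages:='c,c++,fortran'
      prefix: /path/to/spack/opt/spack/.../gcc-14.2.0-<hash>
      extra_attributes:
        compilers:
          c: /path/to/.../bin/gcc
          cxx: /path/to/.../bin/g++
          fortran: /path/to/.../bin/gfortran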
At some point, [email protected] was marked as deprecated by Spack. As such, when you try to install it you might see something like:
==> Error: failed to concretize `[email protected]` for the following reasons:
1. Cannot satisfy '[email protected]'
required because [email protected] requested explicitly
2. Cannot satisfy '[email protected]'
3. Cannot satisfy 'gcc@=11.5.0' and '[email protected]
required because [email protected] requested explicitly
required because gcc available as external when satisfying gcc@=11.5.0 languages:='c,c++,fortran'
required because [email protected] requested explicitly
4. Cannot satisfy '[email protected]:' and '[email protected]
required because [email protected] requested explicitly
required because @11.3: is a requirement for package gcc
required because gcc available as external when satisfying gcc@=11.5.0 languages:='c,c++,fortran'
required because [email protected] requested explicitly
required because [email protected] requested explicitly
There are three "solutions" for this:

- You can install [email protected] as a deprecated package with:

  spack install --deprecated [email protected]

- You can tell Spack to allow deprecated packages by editing/creating a file called ~/.spack/config.yaml and adding:

  config:
    deprecated: true

- You can move to [email protected], as that is now the "preferred" GCC 14 version:

  spack install [email protected]
NOTE: If you choose the [email protected] path, then everywhere you see a reference to [email protected] in this document, use [email protected] instead.
If GCC 14.2.0 is available from your system package manager, you can install it that way instead of building it with Spack. Then running:

spack compiler find

might find it. You can check by looking at ~/.spack/packages.yaml, where there should be entries for gcc.
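For instance, on Ubuntu 24.04 that might look like the following (the gcc-14 package names are an assumption about your distribution's repositories; use whatever your OS provides):

sudo apt install gcc-14 g++-14 gfortran-14
spack compiler find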
You can now install GEOSfvdycore with:
spack install geosfvdycore %[email protected]

On a test install, this built about 90 packages.
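When it completes, an optional quick check that the package is present (and where it was installed) is:

spack find -lv geosfvdycore
spack location -i geosfvdycore %[email protected]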
Once GEOSfvdycore is built, we now need to make an experiment.
To do so, we need to go to the directory where the GEOSfvdycore package is installed:
cd $(spack location -i geosfvdycore %[email protected])/bin

Then, you can run the fv3_setup script. Below is an example run.
The first question is the Experiment ID:
$ ./fv3_setup
Enter the Experiment ID:
test-c48

This is used to name a directory for the experiment, the SLURM job name, etc. Nothing depends on what you put for this.
Next is the experiment description:
Enter a 1-line Experiment Description:
test-c48

This is mainly used in the History output. Again, anything will be fine.
Next up is the horizontal and vertical resolution.
First is the horizontal resolution of the experiment. The horizontal
resolution is the number of cells in the x and y directions of each
face of the cubed-sphere grid. So, a resolution of 48 (also called c48)
means there are 48 cells in the x and y directions of each face resulting
in a total of 48x48x6 = 13824 cells in the full grid.
Enter the Horizontal Resolution: IM (Default: 48)
48

Typical values for the horizontal (IM) resolution
with the GEOSfvdycore standalone are:
- IM: 48, 90, 180, 360, 720, 1440, 2880
Note: Technically, any value of IM can be used with the GEOSfvdycore standalone
as it needs no input files, but these are the values run by the full GEOSgcm model.
The horizontal resolution also determines the default FV_NX and FV_NY values set in fv3.j (where FV_NY = FV_NX * 6), and the total number of MPI processes needed will be FV_NX * FV_NY.
The defaults are:
| IM | FV_NX | FV_NY | Total Cores |
|---|---|---|---|
| 48 | 6 | 36 | 216 |
| 90 | 10 | 60 | 600 |
| 180 | 20 | 120 | 2400 |
| 360 | 30 | 180 | 5400 |
| 720 | 40 | 240 | 9600 |
| 1440 | 80 | 480 | 38400 |
In this table, the IM is the "up to" number, so if you asked for
IM=100 you'd land in the IM=180 row.
These numbers are not hard limits, and you can run with more or fewer cores. That said, if you run with fewer cores, you must make sure you have enough memory. If you run with too many cores, you might crash in the dynamical core, as the cubed-sphere dynamics requires at least 3 cells in the x and y directions per MPI process.
The next question is the vertical resolution of the experiment:
Enter the Vertical Resolution: LM (Default: 181)
181

Like the horizontal resolution, technically any value can be used, but if you use a value that is not one of the standard values, you would need to provide a set of inputs that are not simple to calculate. As such, we recommend using only:
- LM: 72, 91, 181
and the default of 181 is the standard value for the GEOSgcm.
Next up is a question about the IOSERVER; this only matters if you are outputting History files:
Do you wish to IOSERVER? (Default: NO or FALSE)
NO

If set to yes, the setup script will configure the model to run with additional nodes that handle writing the history output asynchronously. This is useful for large runs where writing the history files can be a bottleneck.
NOTE: This question has a different default depending on the resolution, so make sure you answer as you want.
Next up is the location of the experiment:
Enter Desired Location for the EXPERIMENT Directory
Hit ENTER to use Default Location:
----------------------------------
Default: /home/ubuntu/Experiments/test-c48
/home/ubuntu/Experiments/test-c48

You can put this anywhere, but the last element of the path you give MUST be the Experiment ID you gave at the start.
Finally, you will be asked for a group ID. This is used on NASA clusters as the SLURM/PBS account. If you do not need this, you can put anything.
Current GROUPS: ubuntu adm cdrom sudo dip lxd
Enter your GROUP ID for Current EXP: (Default: ubuntu)
-----------------------------------

Finally, the setup script will create the experiment directory and tell you where it is:
Creating fv3.j for Experiment: test-c48-spack
Done!
-----
You can find your experiment in the directory:
/home/ubuntu/Experiments/test-c48-spack
NOTE: fv3.j by default will run StandAlone_FV3_Dycore.x from the installation directory:
/home/ubuntu/spack/opt/spack/linux-ubuntu24.04-icelake/gcc-14.2.0/geosfvdycore-3.0.0-rc.1-unohaghdhlkzenpn6evxyntsxphlcde4/bin
However, if you copy an executable into the experiment directory, the script will run that executable instead.

On Azure or AWS, you will need to edit fv3_setup to tell it the appropriate number of cores per node on your system.
On NASA systems we know of and control, we know the number of cores per node on most node types and can figure out the right SLURM or PBS directives to use, e.g.:
#SBATCH --nodes=2 --ntasks-per-node=120
And on systems we don't know, we just default to the
number of cores we see on the node running fv3_setup:
#SBATCH --ntasks=216
But on AWS and Azure, we don't know this so there is a section like:
else if( $SITE == 'AWS' | $SITE == 'Azure' ) then
# Because we do not know the name of the model or the number of CPUs
# per node. We ask the user to set these variables in the script
# AWS and Azure users must set the MODEL and NCPUS_PER_NODE
set MODEL = USER_MUST_SET
set NCPUS_PER_NODE = USER_MUST_SET

You should set MODEL to whatever the node constraint name would be and NCPUS_PER_NODE to the number of cores per node.
Of course, you can always adjust these values later.
The fv3_setup script will create a fv3.j file. Some of the
important parts of this file are described below.
NOTE: You will need to edit this file because we do not yet automatically support Spack as a build method in our setup scripting. This is being worked on.
You will need to load the Spack modules at the top of the fv3.j script.
We recommend doing this near the top, for example:
limit stacksize unlimited
limit coredumpsize 0
source /path/to/spack/share/spack/setup-env.csh
spack load geosfvdycore %[email protected]

where the latter two lines are the ones you need to add. Note:
fv3.j is a csh script, so you need to source the setup-env.csh
file (even if your working shell is bash).
This will ensure that the geosfvdycore package is loaded and the
libraries, etc. are found.
Assuming SLURM, a c48 run might typically look like:
#SBATCH --job-name=test-c48
#SBATCH --time=01:00:00
#SBATCH --ntasks=216
#SBATCH --account=ubuntu
In this case, fv3_setup didn't know what the system was and so
just set --ntasks=216 which is the default for a c48 run.
If your system uses SLURM and has required constraints, etc., you can add them here.
Also, on NASA systems, we typically run GEOSfvdycore on, say, AMD Milan nodes at 120 cores per node, even if the node has 128 cores. This is done both for ease of math (since most of our runs use multiples of 120) and to leave some cores for the OS.
If you are running on a system with more or fewer cores per node, we recommend factors/multiples of 120, so 40, 60, 120, 240...
This is NOT required, but does more closely match how it is run on NASA systems.
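As a concrete sketch, on a hypothetical SLURM site with 120-core nodes and a required node constraint, the directives for a c48 run might become something like the following (the constraint and account names are placeholders, not values from this document):

#SBATCH --job-name=test-c48
#SBATCH --time=01:00:00
#SBATCH --nodes=2 --ntasks-per-node=120
#SBATCH --constraint=your_constraint
#SBATCH --account=your_account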
To change the number of processes used you will edit:
set FV_NX = 6
set FV_NY = 36
For example, this specifies an FV_NX x FV_NY decomposition of 6 x 36 = 216 MPI processes. To change this, our usual rules are:
- FV_NY MUST be divisible by 6
- FV_NX MUST be less than FV_NY
- FV_NY should roughly be 6 to 12 times FV_NX
So, for example, you could change this to:
set FV_NX = 10
set FV_NY = 60
or:
set FV_NX = 4
set FV_NY = 24
When you do this, you would want to change your SLURM/PBS directives as well.
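For example, changing to FV_NX = 10 and FV_NY = 60 means 10 x 60 = 600 MPI processes, so (assuming SLURM) the directives would become something like:

#SBATCH --ntasks=600

or, on a site with 120-core nodes:

#SBATCH --nodes=5 --ntasks-per-node=120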
By default, fv3.j will set up a 1-model-day run. You can change this by changing this setting:
set JOB_SGMT = '00000001 000000'
This setting is in the format YYYYMMDD HHMMSS so as you can see
this is set for 1 day, but if changed to 00000102 000000
it would run for 1 month and 2 days.
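For example, to run a 5-day segment instead, keeping the same YYYYMMDD HHMMSS format:

set JOB_SGMT = '00000005 000000'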
By default, fv3.j will output some history (i.e., diagnostic) output.
But if you do not care about this and wish to reduce the I/O load and
disk output (which can get large at very-high resolutions), you can
turn off history output by finding the heredoc for HISTORY.rc and changing:
COLLECTIONS: 'inst3_3d_diag'
'inst1_2d_diag'
::
to:
COLLECTIONS:
::
The GEOSfvdycore system uses a script called esma_mpirun which
tries to "hide" MPI run command complexity from users. However,
if you wish to change the MPI run command, you can do so by editing
the fv3.j file and looking for:
set RUN_CMD = "$GEOSBIN/esma_mpirun -np "
You can change this to whatever you want, though it must end in the option for the number of MPI processes to run. Later on this is used as:
$RUN_CMD $NPES $FV3EXE $IOSERVER_OPTIONS --logging_config logging.yaml |& tee ${SCRDIR}.log
so it expects the number of MPI processes to be the next argument.
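For example, a minimal sketch if you wanted to bypass esma_mpirun and call your MPI launcher directly (this assumes your launcher accepts -np for the process count, as Open MPI's and MPICH's mpirun do):

set RUN_CMD = "mpirun -np "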
The GEOSfvdycore system will try and set "default" MPI environment
variables depending on the MPI stack used. However, the defaults
it chooses are those we've found useful on NASA systems. If you
wish to change these you can do so by editing the fv3.j file and looking
for the section headed by:
#########################################
# Set MPI Environment Variables
#########################################
Change/delete this as desired.
If you want to change the number of tracers advected, you can change:
# Set number of tracers
set N_TRACERS = 2
If you increase this, the model will take longer as you are doing more work.
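For example, to advect four tracers instead of the default two:

set N_TRACERS = 4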
If you are running with IOSERVER, fv3.j needs to know the
number of cores per node. fv3_setup might get this wrong
on non-NASA systems. To check and possibly correct look for:
if ( $NCPUS != NULL ) then
if ( $USE_IOSERVER == 1 ) then
set NCPUS_PER_NODE = 16
and change the value to match your system.
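For instance, on nodes with 96 cores per node (a hypothetical value), you would set:

set NCPUS_PER_NODE = 96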
You can then run the script as either:
./fv3.j
if on an interactive allocation or with your job scheduler, e.g.:
sbatch fv3.j