# Scripts and advice for running Pangeo with dask-jobqueue on NCI's Gadi, Pawsey's Zeus and CSIRO's Pearcey
These are the scripts that I (Dougie) use. They may be useful to you.
(Note that the NCI-recommended approach for using Pangeo on Gadi is outlined here: https://nci-data-training.readthedocs.io/en/latest/_notebook/prep/pangeo.html. The scripts and instructions in this repo describe an alternative approach where users can manage their own conda environment and scale clusters using `dask-jobqueue`.)
Users will need to be able to log in to their system of interest. To use Gadi or Zeus, users will also need to be able to request resources under a project.
New users to Gadi can sign up here, but they will need to either join an existing project or propose a new project to be able to access NCI resources. Existing users can check their projects here.
New users to Pawsey can apply here.
Ideally, users will have a github account (it's free and easy to set up here), but this is not essential.
1. Log in to your system of choice:

    Gadi:
    ```
    ssh -Y <username>@gadi.nci.org.au
    ```
    Zeus:
    ```
    ssh -Y <username>@zeus.pawsey.org.au
    ```
    Pearcey:
    ```
    ssh -Y <username>@pearcey.hpc.csiro.au
    ```
2. If you don't have conda installed or access to conda (try `which conda`), install it:

    ```
    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    chmod +x Miniconda3-latest-Linux-x86_64.sh
    ./Miniconda3-latest-Linux-x86_64.sh
    ```

    You'll be prompted for where to install conda. The default is your home directory, which is quite limited for space. I recommend instead using a persistent location, e.g. `/g/data` on Gadi, `/group` on Zeus or Bowen storage on Pearcey.

    Note, to run the scripts in this repo `conda` will need to be initialised. When you first install conda you will be given the option to append some lines to your `.bashrc` that will initialise `conda` every time you log in. I recommend doing this. Otherwise, you'll have to initialise `conda` manually before progressing.
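    If you do need to initialise `conda` manually, a minimal sketch is below, assuming `<conda_install_path>` is wherever you installed Miniconda above (the path is a placeholder, not something this repo sets for you):

    ```bash
    # Initialise conda for the current shell session only (path is illustrative)
    source <conda_install_path>/etc/profile.d/conda.sh

    # Or, to write the initialisation into your .bashrc after the fact:
    <conda_install_path>/bin/conda init bash
    ```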
3. If there's any possibility you might edit the scripts in this repo and want to keep track of your edits using git, create a fork of this repo under your own github account by clicking on the `Fork` button on the top right of this page. Doing this will create a replica of this repo under your username at `https://github.com/<your_username>/pangeo_hpc.git`. If you don't have a github account and you don't want to create one, go to step 4.
4. Clone your fork of this repo to a location of your choice on Gadi, Zeus or Pearcey: go to the desired location and run `git clone https://github.com/<your_username>/pangeo_hpc.git` (or `git clone git@github.com:<your_username>/pangeo_hpc.git` if using ssh keys). If you didn't create a fork, clone this repo directly: `git clone https://github.com/csiro-dcfp/pangeo_hpc.git`.
5. If you don't already have a pangeo-like conda environment (containing `jupyter`, `xarray`, `dask`...), create one using the `pangeo_environment.yml` file in this repo: `conda env create -f pangeo_environment.yml`. This will create a new conda environment called `pangeo`. If you wish to use a different name: `conda env create --name <different_name> -f pangeo_environment.yml`.
6. Activate your new `pangeo` environment and install/enable the following Jupyter labextensions (you'll only need to do this once). Note, these are not essential, but they'll make some handy tools available from your JupyterLab environment:

    ```
    conda activate pangeo

    # For using the dask extension within JupyterLab (https://github.com/dask/dask-labextension)
    jupyter labextension install --no-build --clean dask-labextension
    jupyter serverextension enable --sys-prefix --py dask_labextension

    # For using widgets within JupyterLab (https://ipywidgets.readthedocs.io/en/latest/user_install.html)
    jupyter labextension install --no-build --clean @jupyter-widgets/jupyterlab-manager
    jupyter nbextension enable --sys-prefix --py widgetsnbextension

    # For simplifying setting up the dask dashboard (https://github.com/jupyterhub/jupyter-server-proxy)
    jupyter labextension install --no-build --clean @jupyterlab/server-proxy

    # For managing versions of your Jupyter notebooks in other languages (https://github.com/mwouts/jupytext)
    jupyter labextension install --no-build --clean jupyterlab-jupytext
    jupyter nbextension enable --sys-prefix --py jupytext

    # For visual debugging (https://github.com/jupyterlab/debugger)
    jupyter labextension install --no-build --clean @jupyterlab/debugger

    # Build JupyterLab
    jupyter lab build

    # Clean up unnecessary cache files to reduce inode footprint
    jupyter lab clean
    jlpm cache clean
    ```
7. Configure your Jupyter password:

    ```
    jupyter notebook --generate-config
    jupyter notebook password
    ```

    and follow the prompts.
8. At this point, you're ready to submit a job to run your JupyterLab and Python instances. Once this job is running and you've accessed JupyterLab via your web browser (see below), you'll be able to request additional resources as a dask cluster (using `dask-jobqueue`). We can submit a job to run our JupyterLab instance using the relevant `start_jupyter_<system>.sh` script, but it may require a little editing first:

    - Edit the PBS/SLURM header information (the `#PBS`/`#SBATCH` lines) to reflect your project (if relevant), required resources, etc. Remember, these do not need to represent the total resources you require for the job you have planned, because you will be able to request additional resources from within JupyterLab using `dask-jobqueue`. For interactive science work, I usually request few resources for a relatively long time, and then do compute-heavy reduction task(s) on shorter-term `dask-jobqueue` clusters. With this type of workflow, the resources you request in `start_jupyter_<system>.sh` need only reflect what is needed to handle the reduced data (see the header sketch after this list).

    You could now go ahead and submit your `start_jupyter_<system>.sh` script to the queue. However, for convenience I've also written a simple function for handling the submission of `start_jupyter_<system>.sh` and parsing instructions from the output file. This function receives some of the key job specifications as optional inputs, so you don't have to edit the header of `start_jupyter_<system>.sh` every time you want to change any of these. It also receives the name of your pangeo-like conda environment as an input. You can append this function to your `.bashrc` by running `./instantiate_pangeo_function.sh`. The `pangeo` function signature is:

    Gadi:
    ```
    pangeo walltime(02:00:00) ncpus(4) mem(16GB) project($PROJECT) pangeo_env_name(pangeo) notebook_directory(~)
    ```
    Zeus:
    ```
    pangeo time(02:00:00) cpus_per_task(4) mem-per-cpu(4GB) account($PAWSEY_PROJECT) pangeo_env_name(pangeo) notebook_directory(~)
    ```
    Pearcey:
    ```
    pangeo time(02:00:00) cpus_per_task(4) mem-per-cpu(6GB) pangeo_env_name(pangeo) notebook_directory(~)
    ```

    where the defaults are given in brackets. For example, to run with the default settings, one would simply enter into their terminal:
    ```
    pangeo
    ```
    To specify a 3 hour job with 6 cpus, one would enter:
    ```
    pangeo 03:00:00 6
    ```
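    As referenced above, here is a sketch of what the resource-request header in `start_jupyter_gadi.sh` might look like. This is illustrative only; the actual script in this repo is authoritative, and the project code `a00` is a placeholder:

    ```bash
    #!/bin/bash
    #PBS -N jupyter_lab            # job name (placeholder)
    #PBS -P a00                    # your NCI project code (placeholder)
    #PBS -q normal                 # queue to submit to
    #PBS -l walltime=02:00:00      # long-ish session with modest resources
    #PBS -l ncpus=4                # few cpus: heavy lifting goes to dask-jobqueue clusters
    #PBS -l mem=16GB
    #PBS -l storage=gdata/a00      # Gadi requires storage mounts to be requested explicitly (placeholder)
    ```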
9. Run the `pangeo` function or submit `start_jupyter_<system>.sh` to the queue. For the former, instructions for setting up port forwarding to view your JupyterLab session will be printed to your screen. For the latter, you'll have to read them from the `jupyter_instructions.txt` file that will appear in the current directory. In both cases, the instructions will only appear once your job leaves the queue, which may take a minute or so.
10. Follow the instructions to access your JupyterLab session via a web browser.
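    For orientation, the instructions amount to opening an ssh tunnel from your local machine to the compute node running JupyterLab. A hypothetical example (the node name and port are placeholders; always use the values from the printed instructions or `jupyter_instructions.txt`):

    ```bash
    # Run on your LOCAL machine; <node> and <port> come from the instructions
    ssh -N -L 8888:<node>:<port> <username>@gadi.nci.org.au
    ```

    You can then point your browser at `http://localhost:8888` and log in with the Jupyter password you set earlier.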
11. Do your science. As mentioned above, my typical workflow is to use `dask-jobqueue` to request and access resources for the "heavy-lifting" in my notebooks (e.g. reducing a large dataset down to a 1D or 2D field to plot). Examples of setting up a `dask-jobqueue` cluster are given in the notebooks directory of this repo.

    Note that getting `dask-jobqueue` running on Gadi requires the manipulation of the default jobscripts submitted by dask's `PBSCluster` into a format that Gadi expects. An example of this hack is given in `notebooks/run_dask-jobqueue_Gadi.ipynb`.
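    To give a flavour, a minimal `dask-jobqueue` cluster on a generic PBS system might look like the sketch below. This is not the Gadi-specific recipe (that needs the jobscript hack described above; see the notebook), and the queue name, project code and resource numbers are placeholders:

    ```python
    from dask.distributed import Client
    from dask_jobqueue import PBSCluster

    # Request workers in short-lived PBS jobs; values below are placeholders
    cluster = PBSCluster(
        queue="normal",        # placeholder queue name
        project="a00",         # placeholder project code
        walltime="01:00:00",   # short-lived cluster for the heavy lifting
        cores=24,              # cores per PBS job
        memory="96GB",         # memory per PBS job
    )
    cluster.scale(jobs=2)      # ask for two PBS jobs' worth of workers

    client = Client(cluster)   # point the notebook's dask client at the cluster
    # ... compute-heavy reductions go here ...

    client.close()
    cluster.close()            # release the resources when finished
    ```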