The images defined in this repository capture reproducible computing environments used by Pangeo Cloud. They build on top of the Ubuntu operating system and include conda environments with a curated set of Python packages for geospatial analysis. While intended for Pangeo Cloud, they can be used outside of Pangeo infrastructure too!
Images are hosted on DockerHub: https://hub.docker.com/u/pangeo and on Quay.io: https://quay.io/organization/pangeo
| Image | Description |
|---|---|
| base-image | Foundational Dockerfile for builds |
| base-notebook | Minimally functional image for Pangeo hubs |
| pangeo-notebook | base-notebook + core earth science analysis packages |
| pytorch-notebook | pangeo-notebook + GPU-enabled pytorch |
| ml-notebook | pangeo-notebook + GPU-enabled tensorflow2 |
| forge | pangeo-notebook + Apache Beam support |
Click on the image name in the table above for a current list of installed packages and versions.
graph TD;
base-image-->base-notebook;
base-notebook-->pangeo-notebook;
pangeo-notebook-->pytorch-notebook;
pangeo-notebook-->ml-notebook;
pangeo-notebook-->forge;
click base-image "https://hub.docker.com/r/pangeo/base-image" "Open this in a new tab" _blank
click base-notebook "https://hub.docker.com/r/pangeo/base-notebook" "Open this in a new tab" _blank
click pangeo-notebook "https://hub.docker.com/r/pangeo/pangeo-notebook" "Open this in a new tab" _blank
click pytorch-notebook "https://hub.docker.com/r/pangeo/pytorch-notebook" "Open this in a new tab" _blank
click ml-notebook "https://hub.docker.com/r/pangeo/ml-notebook" "Open this in a new tab" _blank
click forge "https://hub.docker.com/r/pangeo/forge" "Open this in a new tab" _blank
A major use-case for these images is running an ephemeral server on the Cloud with BinderHub. Anyone can launch a server running the latest-and-greatest pangeo-notebook image with the following URL
Users who need the special features offered by Pangeo binder can use the following links for running in GCP us-central1 or AWS us-west-2 respectively:
- https://binder.pangeo.io/v2/gh/pangeo-data/pangeo-docker-images/HEAD?urlpath=lab
- https://aws-uswest2-binder.pangeo.io/v2/gh/pangeo-data/pangeo-docker-images/HEAD?urlpath=lab
NOTE: the links above resolve to the pangeo-notebook image and not to the base-notebook, ml-notebook, or pytorch-notebook images that are also defined in this repository. Currently, BinderHubs map to a single image definition per repository.
The links above will launch JupyterLab without any notebooks or other content. From JupyterLab you can then upload notebooks or run git pull commands to retrieve content from another GitHub repository. However, it can be very useful to pre-load content when a server launches. The nbgitpuller link generator is very useful for this!
Below is a link to illustrate launching pangeo-notebook/2021.09.30 and automatically pulling the notebooks housed in https://github.com/pangeo-data/cog-best-practices.
Those links get a bit long and complicated to look at, so it's common to use a markdown button to hide them:
| AWS | GCP |
|---|---|
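The nbgitpuller links described above follow a regular pattern, so they can be generated instead of assembled by hand. The sketch below is only an illustration: the helper names `urlencode` and `binder_gitpull_url` are hypothetical, the `main` branch is an assumption about the content repository, and the query layout reflects our understanding of what the nbgitpuller link generator produces for a BinderHub. Compare the result against the generator before sharing a link.

```shell
# Percent-encode every character except the RFC 3986 unreserved set.
# Intended for ASCII input such as repository URLs.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s=$rest
    done
    printf '%s' "$out"
}

# Build a Binder link that launches an image from this repository and pulls
# a content repository on startup.
#   $1: BinderHub base URL      $2: tag, branch, or commit of this repository
#   $3: content repository URL  $4: branch of the content repository (assumed)
binder_gitpull_url() {
    hub=$1
    image_ref=$2
    repo=$3
    branch=$4
    name=${repo##*/}         # repository name, used as the folder to open
    inner="git-pull?repo=$(urlencode "$repo")&urlpath=$(urlencode "lab/tree/$name/")&branch=$(urlencode "$branch")"
    # The inner query is encoded a second time because it is itself the
    # value of Binder's urlpath parameter.
    printf '%s/v2/gh/pangeo-data/pangeo-docker-images/%s?urlpath=%s\n' \
        "$hub" "$image_ref" "$(urlencode "$inner")"
}

binder_gitpull_url https://binder.pangeo.io 2021.09.30 \
    https://github.com/pangeo-data/cog-best-practices main
```

Swapping the first argument for https://aws-uswest2-binder.pangeo.io gives the equivalent link for the AWS hub.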
Advanced users may want a highly customized environment that still works on Pangeo BinderHubs. You can do that by building off the pangeo base-image following our template repository example. Further documentation on the configuration files in the binder subfolder can be found in the repo2docker documentation.
docker run -it --rm -p 8888:8888 pangeo/pangeo-notebook:latest jupyter lab --ip 0.0.0.0
NOTE: images are mirrored on quay.io so you can also pull quay.io/pangeo/pangeo-notebook:latest
To access files on your local hard drive from within JupyterLab running in Docker, you need to use a Docker volume mount. The following command mounts your home directory in the Docker container and launches JupyterLab from there.
docker run -it --rm --volume "$HOME:$HOME" -p 8888:8888 pangeo/pangeo-notebook:latest jupyter lab --ip 0.0.0.0 "$HOME"
You can also run commands other than jupyter when starting a Docker container:
docker run -it --rm pangeo/base-notebook:2021.09.30 /bin/bash
If you're doing machine learning and want to use NVIDIA GPUs,
follow the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
to install the NVIDIA Container Toolkit (nvidia-docker), and then start the Docker container like so:
docker run -it --rm --gpus all -p 8888:8888 pangeo/pytorch-notebook:master jupyter lab --ip 0.0.0.0
Many Cloud providers offer services to run Docker containers in their data centers. Instructions will vary, so we don't provide specifics here, but as an example, check out these docs for running containers on the cloud via Docker Compose:
If you're used to managing conda environments on your personal computer, or running a hosted JupyterLab service like Google Colab or AWS SageMaker Studio Lab, you can exactly match a tagged pangeo-notebook conda environment. For example, below we install the pangeo-notebook environment tagged on 2021.12.02:
%conda create -n pangeo-notebook --file https://raw.githubusercontent.com/pangeo-data/pangeo-docker-images/2021.12.02/pangeo-notebook/conda-linux-64.lock
Note that this only works on 64-bit Linux (linux-64) environments, since the conda lock file is platform-specific.
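Outside of a notebook, the same idea can be scripted. The sketch below is a hedged illustration with hypothetical helper names (`lock_url`, `is_linux_64`); the URL layout is copied from the 2021.12.02 example above, and we assume other tags and image folders follow the same layout. It only prints the conda command instead of running it, so you can inspect it first.

```shell
# Compose the raw GitHub URL of a lock file.
#   $1: release tag of this repository   $2: image folder name
lock_url() {
    printf 'https://raw.githubusercontent.com/pangeo-data/pangeo-docker-images/%s/%s/conda-linux-64.lock\n' "$1" "$2"
}

# Succeed only for x86-64 Linux, the platform conda calls linux-64.
#   $1: kernel name (uname -s)   $2: machine type (uname -m)
is_linux_64() {
    [ "$1" = "Linux" ] && [ "$2" = "x86_64" ]
}

if is_linux_64 "$(uname -s)" "$(uname -m)"; then
    # Print the command rather than executing it.
    echo "conda create -n pangeo-notebook --file $(lock_url 2021.12.02 pangeo-notebook)"
else
    echo "conda-linux-64.lock does not apply to $(uname -s)/$(uname -m)" >&2
fi
```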
This repository uses GitHub Actions to build images, run tests, and push images to DockerHub.
- Pull requests from forks trigger rebuilding all images.
- `pangeo/base-notebook:master` corresponds to the current "staging" image in sync with the master branch. It is built with every commit to master and is also tagged with the short GitHub SHA, for example `pangeo/base-notebook:2639bd3`.
- Tags pushed to GitHub manually represent "production" releases with corresponding tags on DockerHub, for example `pangeo/pangeo-notebook:2020.03.11`. The `latest` tag also corresponds to the most recent GitHub tag.
A common need is to update conda package versions in these images. To do so: 1) fork this repo, 2) edit pangeo-notebook/environment.yml on your fork, 3) create a PR. Compatible package versions are then solved with conda-lock, and the resulting lock file is automatically added as a commit in your PR.
You'll need at least Conda installed, and Docker if you want to build and test locally.
# create a fork of this repo and clone it locally
git clone https://github.com/mygithub/pangeo-docker-images
cd pangeo-docker-images
# Install conda-lock
conda env create -f environment-condalock.yml
git checkout -b change-pangeo-notebook
Edit pangeo-notebook/environment.yml to change packages! Note that make pangeo-notebook is a convenient shortcut to build and test; see the Makefile for the specific commands that are run. For example, you can run only conda-lock, so you don't have to run Docker to build and test locally.
make pangeo-notebook
git commit -a -m "added x packages, changed x version"
git push
# go to github to create PR, or use github cli https://cli.github.com
- compatible with Pangeo BinderHubs and JupyterHubs
- compatible with Repo2Docker Python configuration files
- reproducible build process and explicit conda package lists
- small size, fast build
- easy to customize
Everything stems from the Dockerfile in the base-image folder. The base-image configures default settings for Conda and Dask with condarc.yml and dask_config.yml files. The base-image is not meant to run on its own; it is the common foundation for the -notebook images, which install Python packages including JupyterLab and lab extensions. Lists of Conda packages for each image are specified in an environment.yml in each -notebook folder, and compatible Dask and Jupyter packages are guaranteed by specifying the pangeo-notebook conda metapackage.
You can pre-solve for compatible environments locally with conda-lock to convert the environment.yml file to a conda-linux-64.lock file which is an explicit list of compatible packages solved by Conda. The major advantage of doing this is that if you rebuild at a later date the resulting Conda environment is identical, which improves reproducibility. For this reason, when building off of the base-image, any existing conda-linux-64.lock file takes precedence over the environment.yml file.
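The precedence rule can be pictured with a few lines of shell. This is a minimal sketch of the rule as stated above, not the logic in the base-image Dockerfile itself; `choose_env_spec` is a hypothetical helper name.

```shell
# Print the environment specification a build would use for a folder:
# an existing conda-linux-64.lock wins over environment.yml.
#   $1: folder containing the image definition
choose_env_spec() {
    dir=$1
    if [ -f "$dir/conda-linux-64.lock" ]; then
        printf '%s\n' "$dir/conda-linux-64.lock"
    elif [ -f "$dir/environment.yml" ]; then
        printf '%s\n' "$dir/environment.yml"
    else
        echo "no environment specification found in $dir" >&2
        return 1
    fi
}

# Demonstration in a throwaway folder that contains both files.
demo=$(mktemp -d)
touch "$demo/environment.yml" "$demo/conda-linux-64.lock"
choose_env_spec "$demo"    # prints the path of the lock file
rm -rf "$demo"
```

A practical consequence is that editing environment.yml alone has no effect on a build until the lock file is regenerated or removed.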
The runtime environment sets two variables by default:
- `$PANGEO_ENV`: name of the conda environment.
- `$PANGEO_SCRATCH`: a URL like `gcs://pangeo-scratch/username/` that points to a cloud storage bucket for temporary storage. This is set if the variables `$PANGEO_SCRATCH_PREFIX` and `JUPYTERHUB_USER` are detected. The prefix should be like `s3://pangeo-scratch`. This is not present in the `forge/` image.
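To make the relationship between the prefix, the user name, and the scratch URL concrete, here is a small sketch. It is an assumption based on the example URLs above, not a copy of the startup code in the images; `scratch_url` is a hypothetical helper name.

```shell
# Join a scratch prefix and a user name into a scratch URL.
# Fails when either part is missing, mirroring the rule that the variable
# is only set when both inputs are detected.
#   $1: prefix such as s3://pangeo-scratch   $2: JupyterHub user name
scratch_url() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        return 1
    fi
    # Drop one trailing slash from the prefix so the result has no "//".
    printf '%s/%s/\n' "${1%/}" "$2"
}

if scratch=$(scratch_url "${PANGEO_SCRATCH_PREFIX:-}" "${JUPYTERHUB_USER:-}"); then
    echo "scratch space: $scratch"
else
    echo "no scratch space: prefix or user name is missing" >&2
fi
```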
- Since 2020.10.16, mamba is installed into the base-image and conda-lock environment and is used by default to solve for a compatible environment (see #146)
- For a simple list of packages for a given image, you can use a link like this: https://github.com/pangeo-data/pangeo-docker-images/blob/2020.10.08/pangeo-notebook/packages.txt
- To compare changes between two images, you can use a link like this: https://github.com/pangeo-data/pangeo-docker-images/compare/2020.10.03..2020.10.08
- Our `ml-notebook` image now contains JAX and TensorFlow with XLA enabled. Due to licensing issues, conda-forge does not have `ptxas`, but `ptxas` is needed for XLA to work correctly. Should you like to use JAX and/or TensorFlow with XLA optimization, please install `ptxas` on your own, for example, by `conda install -c nvidia cuda-nvcc`. At the time of writing (October 2022), JAX throws a compilation error if the `ptxas` version is higher than the driver version. There does not exist an easy solution for K80 GPUs, but in the case of T4 GPUs, you should run `conda install -c nvidia cuda-nvcc==11.6.*` to be safe. Alternatively, for any GPU, you could set an environment variable to resolve the error caused by JAX: `XLA_FLAGS="--xla_gpu_force_compilation_parallelism=1"`. The aforementioned error will be removed (and likely turned into a warning) in a future version of JAX. See jax-ml/jax#12776 (comment).
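The package-list and comparison links mentioned in the notes above differ only in the tags and image name, so they are easy to generate. The helpers below are a hypothetical convenience (the names `packages_url` and `compare_url` are ours); the URL layouts are taken directly from the two example links, and we assume they hold for other tags.

```shell
# Link to the plain package list of one image at one release tag.
#   $1: release tag   $2: image folder name
packages_url() {
    printf 'https://github.com/pangeo-data/pangeo-docker-images/blob/%s/%s/packages.txt\n' "$1" "$2"
}

# Link to the GitHub comparison between two release tags.
#   $1: older tag   $2: newer tag
compare_url() {
    printf 'https://github.com/pangeo-data/pangeo-docker-images/compare/%s..%s\n' "$1" "$2"
}

packages_url 2020.10.08 pangeo-notebook
compare_url 2020.10.03 2020.10.08
```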
The primary use of these Docker images is running on Pangeo Cloud deployments with dask-gateway. Generally, the dask-gateway library version built into the image must match the dask-gateway version deployed in the cloud environment. The following table keeps track of the first time a new dask-gateway version appears in a tagged image:
| dask-gateway | Image tag |
|---|---|
| 0.9 | 2020.11.06 |
| 0.8 | 2020.07.28 |
| 0.7 | 2020.04.22 |
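Because release tags are dates, the table can be read mechanically: an image carries the dask-gateway version of the newest row whose tag is not later than the image tag. The sketch below encodes only the three rows shown above, so for tags after 2020.11.06 it reports 0.9 even if a newer dask-gateway has since been introduced; treat its answer as a lower bound. `dask_gateway_for_tag` is a hypothetical helper name.

```shell
# Look up the dask-gateway version for a YYYY.MM.DD image tag, using only
# the rows of the table above.
dask_gateway_for_tag() {
    case $1 in
        [0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]) ;;
        *) echo "expected a YYYY.MM.DD tag, got: $1" >&2; return 2 ;;
    esac
    # Remove the dots so tags compare as plain integers (YYYYMMDD).
    n=$(printf '%s' "$1" | tr -d '.')
    if [ "$n" -ge 20201106 ]; then
        echo 0.9    # newest row in the table; later releases are not covered
    elif [ "$n" -ge 20200728 ]; then
        echo 0.8
    elif [ "$n" -ge 20200422 ]; then
        echo 0.7
    else
        echo "tag $1 predates the table" >&2
        return 1
    fi
}

dask_gateway_for_tag 2020.10.16
```

Non-date tags such as master or a short SHA are rejected, since their position in the table cannot be inferred from the tag alone.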