From 4ec176f7b268995553bcc92b3e6eda38ad53eb84 Mon Sep 17 00:00:00 2001
From: Norman Fomferra
Date: Thu, 3 Mar 2022 10:02:02 +0100
Subject: [PATCH 1/3] New spec for xcube Multi-Resolution Datasets

---
 docs/source/mldatasets.md | 87 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 docs/source/mldatasets.md

diff --git a/docs/source/mldatasets.md b/docs/source/mldatasets.md
new file mode 100644
index 000000000..6246003e3
--- /dev/null
+++ b/docs/source/mldatasets.md
@@ -0,0 +1,87 @@
+xcube Multi-Resolution Datasets
+===============================
+
+Definition
+----------
+
+A xcube _multi-resolution dataset_ refers to an N-D [image
+pyramid](https://en.wikipedia.org/wiki/Pyramid_(image_processing))
+where an _image_ refers to a 2-D dataset with two spatial dimensions
+in some horizontal coordinate system.
+
+A multi-resolution dataset comprises a fixed number of
+_levels_, which are regular datasets at different spatial resolutions.
+Level zero represents the original resolution `res(L=0)`, higher level
+resolutions decrease by a factor of two: `res(L) = res(0) / 2^L`.
+
+
+Implementation in xcube
+-----------------------
+
+In xcube, multi-resolution datasets are represented by the abstract class
+`xcube.core.mldataset.MultiLevelDataset`. The xcube data store framework
+refers to this datatype using the alias `mldataset`. The corresponding
+default data format is the xcube `levels` format. Later xcube will also
+support Cloud Optimized GeoTIFF (COG) as format for multi-resolution
+datasets.
+
+The xcube Levels Format
+-----------------------
+
+The xcube Levels format is basically a single top-level directory
+The filename extension of that directory should be `.levels`
+by convention. The directory entries are Zarr datasets
+
+1. that are representations of regular xarray datasets named after
+   their zero-based level index, `{level}.zarr`;
+2. that comply with the xcube dataset convention.
+
+ TODO (forman): link to xcube dataset convention +
+
+
+The following is a multi-resolution dataset with three levels:
+
+    - test_pyramid.levels/
+        - 0.zarr/
+        - 1.zarr/
+        - 2.zarr/
+
+An important use case is generating image pyramids from existing large
+datasets without the need to create a copy of level zero.
+
+To support this, the level zero dataset may be a link to an existing
+Zarr dataset. The filename is then `0.link` rather than `0.zarr`.
+The link file contains the path to the actual Zarr dataset
+to be used as level zero as a plain text string. It may be an absolute
+path or a path relative to the top-level dataset.
+
+    - test_pyramid.levels/
+        - 0.link/    # --> link to actual level zero dataset
+        - 1.zarr/
+        - 2.zarr/
+
+Related reads
+-------------
+
+* [WIP: Multiscale use-case](https://github.com/zarr-developers/zarr-specs/issues/23)
+  in zarr-developers / zarr-specs on GitHub
+
+
+To be discussed
+---------------
+
+* Allow links for all levels?
+* Add top-level metadata such as `num_levels` and links for each
+  level?
+* Make top-level directory a Zarr group (`.zgroup`)
+  and encode level metadata in `.zattrs` (e.g. `num_levels`)?
+* Link relative to link file?
+
+To do
+-----
+
+* Currently, the FS data stores treat relative link paths as relative
+  to the data store's `root`.
+

From 664d0b9ccc8064d53176c99ec9942bcadc669162 Mon Sep 17 00:00:00 2001
From: Norman Fomferra
Date: Thu, 3 Mar 2022 10:58:41 +0100
Subject: [PATCH 2/3] Update docs/source/mldatasets.md

Co-authored-by: Tonio Fincke
---
 docs/source/mldatasets.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/mldatasets.md b/docs/source/mldatasets.md
index 6246003e3..ef34fb3dd 100644
--- a/docs/source/mldatasets.md
+++ b/docs/source/mldatasets.md
@@ -10,7 +10,7 @@ where an _image_ refers to a 2-D dataset with two spatial dimensions
 in some horizontal coordinate system.
 
 A multi-resolution dataset comprises a fixed number of
-_levels_, which are regular datasets at different spatial resolutions.
+_levels_, which are regular datasets covering the same spatial area at different resolutions.
 Level zero represents the original resolution `res(L=0)`, higher level
 resolutions decrease by a factor of two: `res(L) = res(0) / 2^L`.
 

From edb15acee14fbe4b017dc94568d28e6e81a9d1b4 Mon Sep 17 00:00:00 2001
From: Norman Fomferra
Date: Thu, 3 Mar 2022 10:58:48 +0100
Subject: [PATCH 3/3] Update docs/source/mldatasets.md

Co-authored-by: Tonio Fincke
---
 docs/source/mldatasets.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/mldatasets.md b/docs/source/mldatasets.md
index ef34fb3dd..360153727 100644
--- a/docs/source/mldatasets.md
+++ b/docs/source/mldatasets.md
@@ -28,7 +28,7 @@ datasets.
 The xcube Levels Format
 -----------------------
 
-The xcube Levels format is basically a single top-level directory
+The xcube Levels format is basically a single top-level directory.
 The filename extension of that directory should be `.levels`
 by convention. The directory entries are Zarr datasets
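
Note on the levels layout described in this series: the directory convention (`{level}.zarr` entries, with an optional `0.link` text file for level zero) can be illustrated with a minimal sketch. This is not xcube's implementation and does not use the xcube API. The function name `resolve_level_paths` is hypothetical, and the sketch assumes that relative link targets are resolved against the top-level `.levels` directory, as the spec text says. The "To do" item notes that the FS data stores currently resolve them against the data store's `root` instead.

```python
from pathlib import Path
from typing import List


def resolve_level_paths(levels_dir: Path) -> List[Path]:
    """Return one Zarr path per level found in a `.levels` directory.

    Hypothetical helper for illustration only; not part of xcube.
    """
    paths: List[Path] = []
    level = 0
    while True:
        zarr_path = levels_dir / f"{level}.zarr"
        link_path = levels_dir / f"{level}.link"
        if zarr_path.is_dir():
            # Regular level: a Zarr dataset named after its level index.
            paths.append(zarr_path)
        elif level == 0 and link_path.is_file():
            # Level zero may be a plain-text file holding the path of
            # an existing Zarr dataset.
            target = Path(link_path.read_text().strip())
            # Assumption: relative targets are relative to the
            # top-level `.levels` directory, as stated in the spec text.
            if not target.is_absolute():
                target = levels_dir / target
            paths.append(target)
        else:
            # First missing index ends the pyramid.
            break
        level += 1
    return paths
```

For a directory holding `0.link`, `1.zarr` and `2.zarr`, the function returns three paths, the first one pointing at the linked dataset. The number of levels is inferred by probing consecutive indices, since the spec as proposed defines no top-level `num_levels` metadata.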