Merged
87 changes: 87 additions & 0 deletions docs/source/mldatasets.md
@@ -0,0 +1,87 @@
xcube Multi-Resolution Datasets
===============================

Definition
----------

A xcube _multi-resolution dataset_ refers to an N-D [image
Contributor

Suggested change
A xcube _multi-resolution dataset_ refers to an N-D [image
A xcube _multi-resolution dataset_ uses an N-D [image

Member Author

That is not correct.

pyramid](https://en.wikipedia.org/wiki/Pyramid_(image_processing))
where an _image_ refers to a 2-D dataset with two spatial dimensions
in some horizontal coordinate system.

A multi-resolution dataset comprises a fixed number of
_levels_, which are regular datasets covering the same spatial area at different resolutions.
Level zero represents the original resolution `res(L=0)`; each higher level
halves the resolution of the previous one: `res(L) = res(0) / 2^L`.
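
As a rough illustration of the formula above, the following sketch lists the resolution of each level. The function name `level_resolutions` is hypothetical and not part of xcube; `res0` is treated as an abstract number in whatever sense the formula intends.

```python
def level_resolutions(res0: float, num_levels: int) -> list[float]:
    """Return res(L) = res(0) / 2**L for L = 0 .. num_levels - 1.

    Hypothetical helper for illustration only, not an xcube function.
    """
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    return [res0 / 2 ** level for level in range(num_levels)]


# Three levels, starting from an arbitrary level-zero value of 1024.
print(level_resolutions(1024.0, 3))
```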


Implementation in xcube
-----------------------

In xcube, multi-resolution datasets are represented by the abstract class
`xcube.core.mldataset.MultiLevelDataset`. The xcube data store framework
refers to this datatype using the alias `mldataset`. The corresponding
default data format is the xcube `levels` format. In the future, xcube will also
support Cloud Optimized GeoTIFF (COG) as a format for multi-resolution
datasets.

The xcube Levels Format
-----------------------

In the xcube Levels format, a multi-resolution dataset is stored as a single
top-level directory. By convention, the name of that directory should end
with `.levels`. The directory entries are Zarr datasets

1. that are representations of regular xarray datasets named after
their zero-based level index, `{level}.zarr`;
2. that comply with the xcube dataset convention.

<div style="color: red;">
TODO (forman): link to xcube dataset convention
</div>


The following is a multi-resolution dataset with three levels:

- test_pyramid.levels/
  - 0.zarr/
  - 1.zarr/
  - 2.zarr/
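
The layout above can be inspected with a few lines of standard-library Python. This is only a sketch: `list_levels` is a hypothetical helper, not an xcube function, and it assumes that every level is stored as a `{level}.zarr` directory with zero-based, gap-free indices.

```python
import re
from pathlib import Path

# Matches entry names such as "0.zarr" or "12.zarr".
LEVEL_DIR_PATTERN = re.compile(r"(\d+)\.zarr")


def list_levels(levels_dir):
    """Map level index -> path of the `{level}.zarr` entry.

    Hypothetical helper for illustration only. Entries that do not
    follow the `{level}.zarr` naming scheme are ignored.
    """
    found = {}
    for entry in Path(levels_dir).iterdir():
        match = LEVEL_DIR_PATTERN.fullmatch(entry.name)
        if match and entry.is_dir():
            found[int(match.group(1))] = entry
    levels = dict(sorted(found.items()))
    # Level indices are expected to be 0, 1, 2, ... without gaps.
    if list(levels) != list(range(len(levels))):
        raise ValueError(f"level indices are not contiguous: {list(levels)}")
    return levels
```

Called on `test_pyramid.levels/`, the helper would return the entries for levels 0, 1 and 2 in ascending order.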

An important use case is generating image pyramids from existing large
datasets without the need to create a copy of level zero.

To support this, the level zero dataset may be a link to an existing
Zarr dataset. The filename is then `0.link` rather than `0.zarr`.
The link file contains the path to the actual Zarr dataset
to be used as level zero as a plain text string. It may be an absolute
path or a path relative to the top-level dataset.

- test_pyramid.levels/
  - 0.link    # --> link to actual level zero dataset
  - 1.zarr/
  - 2.zarr/
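
A reader of this format has to decide where level zero actually lives. The following sketch shows one way to do that with the standard library. The name `resolve_level_zero` is hypothetical and not part of xcube, and resolving a relative link against the `.levels` directory itself is an assumption: the text only says "relative to the top-level dataset", and the exact base path is still under discussion.

```python
from pathlib import Path


def resolve_level_zero(levels_dir):
    """Return the path of the Zarr dataset to be used as level zero.

    Hypothetical helper for illustration only, not an xcube function.
    """
    levels_dir = Path(levels_dir)

    # Regular case: level zero is stored inside the top-level directory.
    zarr_path = levels_dir / "0.zarr"
    if zarr_path.is_dir():
        return zarr_path

    # Link case: `0.link` is a text file containing the dataset path.
    link_path = levels_dir / "0.link"
    if link_path.is_file():
        target = Path(link_path.read_text().strip())
        if not target.is_absolute():
            # Assumption: relative links are resolved against the
            # top-level `.levels` directory.
            target = levels_dir / target
        return target

    raise FileNotFoundError(f"no 0.zarr or 0.link found in {levels_dir}")
```

The returned path could then be opened like any other Zarr dataset, for example with `xarray.open_zarr`.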

Related reads
-------------

* [WIP: Multiscale use-case](https://github.com/zarr-developers/zarr-specs/issues/23)
in zarr-developers / zarr-specs on GitHub


To be discussed
---------------

* Allow links for all levels?
* Add top-level metadata such as `num_levels` and links for each
level?
* Make top-level directory a Zarr group (`.zgroup`)
and encode level metadata in `.zattrs` (e.g. `num_levels`)?
* Link relative to link file?

To do
-----

* Currently, the FS data stores treat relative link paths as relative
to the data store's `root`, not to the top-level dataset as described above.