Contributor

@konstntokas konstntokas commented Dec 18, 2025

  • Use chunks="auto" by default in rioxarray.open_rasterio within
    xcube.core.store.fs.impl.rasterio when reading GeoTIFF and JPEG2000 files.
    This enables efficient, storage-aware data access without forcing explicit
    rechunking.
  • Added the optional argument band_as_variable, which makes it possible to preserve
    the original dataset structure as returned by rioxarray, rather than splitting
    raster bands into separate data variables. This improves data access patterns and
    avoids unnecessary transformations. The default is set to False.
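
The two options described above can be sketched roughly as follows. This is a minimal illustration and not the xcube implementation: `build_open_kwargs` and `open_geotiff` are hypothetical helpers, and it is an assumption that a `tile_size` of `None` maps to `chunks="auto"` while an explicit `(width, height)` tuple maps to fixed chunk sizes. Only `rioxarray.open_rasterio` with its `chunks` and `band_as_variable` parameters is taken from rioxarray itself.

```python
from __future__ import annotations

from typing import Any


def build_open_kwargs(
    tile_size: tuple[int, int] | None = None,
    *,
    band_as_variable: bool,
) -> dict[str, Any]:
    """Build keyword arguments for rioxarray.open_rasterio (hypothetical helper)."""
    if tile_size is None:
        # Let dask choose chunk sizes instead of forcing an explicit rechunking.
        chunks: Any = "auto"
    else:
        # Assumption: tile_size is given in (width, height) order.
        width, height = tile_size
        chunks = {"x": width, "y": height}
    return {"chunks": chunks, "band_as_variable": band_as_variable}


def open_geotiff(
    path: str,
    tile_size: tuple[int, int] | None = None,
    *,
    band_as_variable: bool,
):
    """Open a raster file lazily with the keyword arguments built above."""
    # Imported lazily so that build_open_kwargs can be used without rioxarray.
    import rioxarray

    kwargs = build_open_kwargs(tile_size, band_as_variable=band_as_variable)
    return rioxarray.open_rasterio(path, **kwargs)
```

With `band_as_variable=False`, rioxarray returns a single `DataArray` with a `band` dimension; with `band_as_variable=True`, it returns a `Dataset` with one data variable per band.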

Note: I needed to update the STAC JSON files because the htmlRepr of the datasets has changed (probably due to an update in xarray).

Here is a small example of GeoTIFF access as it was implemented before:
Screenshot from 2025-12-18 10-12-03

Here is a small example of the same GeoTIFF access with chunks="auto":
Screenshot from 2025-12-18 10-11-54

Checklist:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/source/*
  • Changes documented in CHANGES.md
  • GitHub CI passes
  • AppVeyor CI passes
  • Test coverage remains or increases (target 100%)

@konstntokas
Contributor Author

Example notebooks showing local data access with different opening parameters:

access_tif_local0.ipynb
access_tif_local1.ipynb
access_tif_local2.ipynb

codecov bot commented Dec 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.55%. Comparing base (3029575) to head (fac81aa).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1192   +/-   ##
=======================================
  Coverage   89.55%   89.55%           
=======================================
  Files         280      280           
  Lines       21646    21653    +7     
=======================================
+ Hits        19384    19391    +7     
  Misses       2262     2262           

Member

@b-yogesh b-yogesh left a comment

LGTM. I did not test it, but I trust the screenshots regarding the speedup. Should we add tests that show that using chunks="auto" is faster than an explicit tile size? Also, please have a look at my comments.

"variables. If `False`, the original data structure returned by "
"rioxarray is preserved."
),
default=True,
Member

Here the default is set to True, but the PR description says it is set to False.

"variables. If `False`, the original data structure returned by "
"rioxarray is preserved."
),
default=True,
Member

Here as well

file_path: str,
overview_level: int,
tile_size: tuple[int, int] | None = None,
band_as_variable: bool = True,
Member

Here as well.

self._file_url,
overview_level=index - 1 if index > 0 else None,
tile_size=self._open_params.get("tile_size"),
band_as_variable=self._open_params.get("band_as_variable", True),
Member

Here and in other places too. If True is correct, please change the PR description.
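
One way to keep the schema default, the function signature, and the `_open_params.get(...)` fallback from drifting apart is to define the default once and reference it everywhere. The sketch below is only a suggestion: `DEFAULT_BAND_AS_VARIABLE` and `resolve_band_as_variable` are hypothetical names that do not exist in the code under review, and the value `True` simply mirrors the `default=True` shown in the diff.

```python
from __future__ import annotations

from typing import Any, Mapping

# Hypothetical single source of truth for the default. The value mirrors the
# `default=True` shown in the diff; whether True or False is intended is
# exactly the open question in this review.
DEFAULT_BAND_AS_VARIABLE = True


def resolve_band_as_variable(open_params: Mapping[str, Any]) -> bool:
    """Return the user-supplied flag, falling back to the shared default."""
    return bool(open_params.get("band_as_variable", DEFAULT_BAND_AS_VARIABLE))
```

The schema definition and the function signatures could then use the same constant, so that changing the default requires editing only one line plus the PR description and CHANGES.md.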

* Added the optional argument `band_as_variable`, which allows to preserve the
original dataset structure as returned by `rioxarray`, rather than splitting
raster bands into separate data variables. This improves data access patterns and
avoids unnecessary transformations. Defaults are set to `False`.
Member

Here it is set to False.
