map_over_datasets: skip empty nodes #10042
Conversation
An interpolation use case that doesn't crash with this PR:

```python
import numpy as np
import xarray as xr

number_of_files = 700
number_of_groups = 5
number_of_variables = 10

datasets = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        # Create random data
        time = np.linspace(0, 50 + f, 1 + 1000 * g)
        y = f * time + g
        # Create dataset:
        ds = xr.Dataset(
            data_vars={
                f"temperature_{g}{i}": ("time", y)
                for i in range(number_of_variables // number_of_groups)
            },
            coords={"time": ("time", time)},
        ).chunk()
        # Prepare for xr.DataTree:
        name = f"file_{f}/group_{g}"
        datasets[name] = ds

dt = xr.DataTree.from_dict(datasets)

# %% Interpolate to same time coordinate
def ds_interp(ds, *args, **kwargs):
    return ds.interp(*args, **kwargs)

new_time = np.linspace(0, 100, 50)
dt_interp = dt.map_over_datasets(
    ds_interp, kwargs=dict(time=new_time, assume_sorted=True)
)
```
Thanks for the example. This PR would also close #10013, which would be a huge plus for me. Not being able to subtract a Dataset from a DataTree is extremely cumbersome. However, this implies that the binary ops are implemented using …
@TomNicholas do you see any chance this PR might get merged (after adding tests etc., obviously)? Are there discussions besides #9693 that I am missing?
Hey @mathause - sorry for forgetting about this - I've been busy. I think something like this should get merged, but there are various small and fairly arbitrary choices to quibble over. They are basically all already mentioned in #9693 though.

I don't understand this statement though - aren't binary ops already implemented using the code at `xarray/core/datatree.py` line 1590 (commit 5ea1e81)?

We're changing the behaviour, but changing it to be closer to the old datatree, which is what a lot of users expect anyway.
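For intuition only, here is a toy sketch (plain Python dictionaries, not the actual xarray implementation) of how a tree binary op reduces to mapping the operator over every node's dataset, which is why empty nodes end up being handed to the operator in the first place:

```python
from operator import sub

# Toy stand-in for a DataTree: a {path: dataset-like} mapping.
def map_over_datasets(func, tree):
    # Apply func to every node's "dataset", preserving the tree layout.
    return {path: func(ds) for path, ds in tree.items()}

def tree_sub(tree, other):
    # `dt - other` becomes a per-node call of the operator, so every
    # node (including empty ones) is passed to `sub`.
    return map_over_datasets(lambda ds: sub(ds, other), tree)

print(tree_sub({"a": 5, "b": 7}, 2))  # {'a': 3, 'b': 5}
```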
I think this is ready for review.

No worries! Thanks for considering this PR!

The one unclear choice from #9693 was the comment by @shoyer in #9693 (comment):

> I currently use …

Yes, sorry, that was not clear. I just wanted to say that binary ops are also affected.
Gentle ping @TomNicholas (apologies for bothering you again - I am unfortunately currently blocked by this in another project). Or is there someone else who could potentially review this?
No worries, replied here: #9693 (comment)
Any updates on this? #10013 makes binary operations on datatrees quite impractical, and a fix would be very welcome!
`xarray/core/datatree_mapping.py` (outdated diff)
```python
node_dataset_args.insert(i, arg)
with add_path_context_to_errors(path):
    results = func(*node_dataset_args, **kwargs)
if node_tree_args[0].has_data:
```
I would check every argument here, instead of just the first one.
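A minimal sketch of this suggestion, using stand-in objects (`has_data` mirrors the `DataTree` attribute; the class and function names here are hypothetical):

```python
# Check every tree argument for data, not just node_tree_args[0].
class FakeNode:
    """Stand-in for a DataTree node with only the has_data flag."""
    def __init__(self, has_data: bool):
        self.has_data = has_data

def any_node_has_data(node_tree_args):
    # Call the mapped function if *any* of the passed nodes carries data.
    return any(node.has_data for node in node_tree_args)

print(any_node_has_data([FakeNode(False), FakeNode(True)]))   # True
print(any_node_has_data([FakeNode(False), FakeNode(False)]))  # False
```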
```python
elif node_tree_args[0].has_attrs:
    # propagate attrs
    results = node_tree_args[0].dataset
    func_called[path] = False
else:
    # use Dataset instead of None to ensure it has copy method
    results = Dataset()
    func_called[path] = False
```
Assigning a single dataset as the result implicitly assumes that the result of `map_over_datasets` is a single `Dataset`, which may not be the case.

In cases with at least one non-empty node we can infer the number of outputs, but this isn't possible in general; consider, e.g., `map_over_datasets(np.divmod, DataTree(), DataTree())`.

I think we now need an explicit `num_return_values` argument which can be supplied to disambiguate this.
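To illustrate the ambiguity with plain NumPy (outside any tree): a mapped function can return several values, which `map_over_datasets` unpacks into that many result trees. When every node is empty the function is never called, so the output arity cannot be inferred from the data:

```python
import numpy as np

# divmod-style functions return two arrays; mapping one over a tree
# would therefore produce two result trees, not one.
q, r = np.divmod(np.array([7, 9]), 2)
print(q.tolist(), r.tolist())  # [3, 4] [1, 1]
```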
Thanks a lot for the review! I updated the function to check all passed datatree nodes and added …
Files changed: `whats-new.rst`, `api.rst`