114 changes: 72 additions & 42 deletions satpy/scene.py
@@ -867,45 +867,68 @@

         If data reduction is enabled, some local caching is perfomed in order to
         avoid recomputation of area intersections.
         """
-        from satpy.resample.base import resample_dataset
-
         new_datasets = {}
         datasets = list(new_scn._datasets.values())
         destination_area = self._get_finalized_destination_area(destination_area, new_scn)

         resamplers = {}
         reductions = {}
         for dataset, parent_dataset in dataset_walker(datasets):
             ds_id = DataID.from_dataarray(dataset)
-            pres = None
-            if parent_dataset is not None:
-                pres = new_datasets[DataID.from_dataarray(parent_dataset)]
-            if ds_id in new_datasets:
-                replace_anc(new_datasets[ds_id], pres)
-                if ds_id in new_scn._datasets:
-                    new_scn._datasets[ds_id] = new_datasets[ds_id]
+            pres = self._get_new_datasets_from_parent(new_datasets, parent_dataset)
+            if self._replace_anc_for_new_datasets(new_datasets, ds_id, pres, new_scn):
                 continue
-            if dataset.attrs.get("area") is None:
-                if parent_dataset is None:
-                    new_scn._datasets[ds_id] = dataset
-                else:
-                    replace_anc(dataset, pres)
+            if self._update_area(dataset, parent_dataset, new_scn, ds_id, pres):
                 continue
-            LOG.debug("Resampling %s", ds_id)
-            source_area = dataset.attrs["area"]
-            dataset, source_area = self._reduce_data(dataset, source_area, destination_area,
+
+            dataset, source_area = self._reduce_data(dataset, destination_area,
                                                      reduce_data, reductions, resample_kwargs)
-            self._prepare_resampler(source_area, destination_area, resamplers, resample_kwargs)
-            kwargs = resample_kwargs.copy()
-            kwargs["resampler"] = resamplers[source_area]
-            res = resample_dataset(dataset, destination_area, **kwargs)
+
+            LOG.debug("Resampling %s", ds_id)
+            res = self._resample_dataset(source_area, destination_area, dataset, resamplers, resample_kwargs)

[CodeScene Delta Analysis / CodeScene Cloud Delta Analysis (main): check notices on line 890 in satpy/scene.py]

✅ No longer an issue: Complex Method. Scene._resampled_scene is no longer above the threshold for cyclomatic complexity. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.

✅ No longer an issue: Bumpy Road Ahead. Scene._resampled_scene is no longer above the threshold for logical blocks with deeply nested code. The Bumpy Road code smell is a function that contains multiple chunks of nested conditional logic. The deeper the nesting and the more bumps, the lower the code health.
             new_datasets[ds_id] = res
             if ds_id in new_scn._datasets:
                 new_scn._datasets[ds_id] = res
             if parent_dataset is not None:
                 replace_anc(res, pres)

+    @classmethod
+    def _get_new_datasets_from_parent(self, new_datasets, parent_dataset):
[Review comment, Member]
The self should be cls since these are classmethods now, but given that they are classmethods or could be staticmethods, how about we move these to outside of the Scene? And could they (or should they) be moved to the satpy.resample subpackage?
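To illustrate the point, here is a minimal sketch of the two alternatives under discussion: renaming the implicit first parameter to cls, or dropping the class entirely in favor of a module-level function. Scene and the helper names here are toy stand-ins, not satpy's real API.

```python
def get_resampled_parent(new_datasets, parent_id):
    """Module-level variant: the helper touches neither a class nor an instance."""
    return new_datasets.get(parent_id)


class Scene:
    @classmethod
    def _get_resampled_parent(cls, new_datasets, parent_id):
        # A classmethod receives the class implicitly, so the conventional
        # name for the first parameter is "cls", not "self".
        return new_datasets.get(parent_id)


print(Scene._get_resampled_parent({"p": "resampled-p"}, "p"))  # resampled-p
print(get_resampled_parent({}, "p"))                           # None
```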

+        if parent_dataset is not None:
+            return new_datasets[DataID.from_dataarray(parent_dataset)]
[Review comment, Member]
DataID.from_dataarray returns a single DataID, right? I think I'd prefer a different name for this method. I think the purpose of this chunk of code is to say "if we've resampled the parent already, use the resampled parent", right? Or rather, if the current dataset has a parent, it should have been resampled already, so we should use the resampled version of the parent. I think?

In addition to renaming, it seems that parent_dataset is only used in the later steps to check for is None. I'm wondering if we can remove the use of parent_dataset in favor of pres and "bundle" this methods operation with the dataset_walker to be something like:

for ds_id, dataset, resampled_parent in resampled_dataset_walker(datasets, new_datasets):

Or something like that.

...and if that is done, then there might be an argument for putting _replace_anc_for_new_datasets and _update_area into the for loop generator too. This changes the purpose of the for loop to be "what datasets do we need to resample" and then the inside of the for loop logic is just "reduce data", "resample data", "store result".

I'll admit the code was ugly and the logic of new_scn._datasets and new_datasets really isn't helping that.
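A rough sketch of the walker proposed above, assuming the generator bundles the parent lookup. dataset_walker, the dict-based datasets, and resampled_dataset_walker are all simplified stand-ins here, not satpy's real implementations.

```python
def dataset_walker(datasets):
    # Stand-in for satpy's walker: yields (dataset, parent) pairs, where
    # top-level datasets have parent None and ancillary ones carry a parent.
    for ds in datasets:
        yield ds, None
        for anc in ds.get("ancillary", []):
            yield anc, ds


def resampled_dataset_walker(datasets, new_datasets):
    """Yield (ds_id, dataset, resampled_parent) so the loop body only resamples."""
    for dataset, parent_dataset in dataset_walker(datasets):
        resampled_parent = None
        if parent_dataset is not None:
            # The parent was walked first, so its resampled version exists.
            resampled_parent = new_datasets[parent_dataset["id"]]
        yield dataset["id"], dataset, resampled_parent


parent = {"id": "vis", "ancillary": [{"id": "angles"}]}
new_datasets = {"vis": "resampled-vis"}
for ds_id, dataset, resampled_parent in resampled_dataset_walker([parent], new_datasets):
    print(ds_id, resampled_parent)  # vis None, then angles resampled-vis
```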

[Review comment, Member]
Sorry for all the comments and no regular review. I just keep brainstorming.

One other idea, what if only new_datasets gets modified in the for loop and then assigning to new_scn._datasets is left for a second for loop (ex. for ds_id, new_data_arr in new_datasets.items():)? Maybe that would clean up the code inside the loop. I feel like a lot of this ugliness is caused by new_scn (or rather the DatasetDict inside) copying the DataArray and/or making modifications to it.
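The two-loop idea could look roughly like this. Plain dicts stand in for the Scene's DatasetDict, and the "resampling" step is a placeholder.

```python
scene_datasets = {"ds1": "ds1-data", "ds2": "ds2-data"}  # stand-in for new_scn._datasets
new_datasets = {}

# First loop: only new_datasets is modified; nothing touches the scene yet.
for ds_id, data in scene_datasets.items():
    new_datasets[ds_id] = "resampled-" + data  # placeholder for real resampling

# Second loop: assignment into the scene happens in one obvious place, so any
# copying done by the container is isolated from the resampling logic.
for ds_id, new_data_arr in new_datasets.items():
    if ds_id in scene_datasets:
        scene_datasets[ds_id] = new_data_arr

print(scene_datasets)  # {'ds1': 'resampled-ds1-data', 'ds2': 'resampled-ds2-data'}
```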

+        return None

+    @classmethod
+    def _replace_anc_for_new_datasets(self, new_datasets, ds_id, pres, new_scn):
+        if ds_id in new_datasets:
+            replace_anc(new_datasets[ds_id], pres)
+            if ds_id in new_scn._datasets:
+                new_scn._datasets[ds_id] = new_datasets[ds_id]
+            return True
+        return False

+    @classmethod
+    def _update_area(self, dataset, parent_dataset, new_scn, ds_id, pres):
+        if dataset.attrs.get("area") is None:
+            if parent_dataset is None:
+                new_scn._datasets[ds_id] = dataset
+            else:
+                replace_anc(dataset, pres)
+            return True
+        return False

+    def _resample_dataset(self, source_area, destination_area, dataset, resamplers, resample_kwargs):
+        from satpy.resample.base import resample_dataset
+
+        self._prepare_resampler(source_area, destination_area, resamplers, resample_kwargs)
+        kwargs = resample_kwargs.copy()
+        kwargs["resampler"] = resamplers[source_area]
+        res = resample_dataset(dataset, destination_area, **kwargs)
+
+        return res

     def _get_finalized_destination_area(self, destination_area, new_scn):
         if isinstance(destination_area, str):
             destination_area = get_area_def(destination_area)
@@ -927,32 +927,39 @@
             resamplers[source_area] = resampler
             self._resamplers[key] = resampler

-    def _reduce_data(self, dataset, source_area, destination_area, reduce_data, reductions, resample_kwargs):
+    def _reduce_data(self, dataset, destination_area, reduce_data, reductions, resample_kwargs):
+        source_area = dataset.attrs["area"]
+        if not reduce_data:
+            LOG.debug("Data reduction disabled by the user")
+            return dataset, source_area
+
         try:
-            if reduce_data:
-                key = source_area
-                try:
-                    (slice_x, slice_y), source_area = reductions[key]
-                except KeyError:
-                    if resample_kwargs.get("resampler") == "gradient_search":
-                        factor = resample_kwargs.get("shape_divisible_by", 2)
-                    else:
-                        factor = None
-                    try:
-                        slice_x, slice_y = source_area.get_area_slices(
-                            destination_area, shape_divisible_by=factor)
-                    except TypeError:
-                        slice_x, slice_y = source_area.get_area_slices(
-                            destination_area)
-                    source_area = source_area[slice_y, slice_x]
-                    reductions[key] = (slice_x, slice_y), source_area
-                dataset = self._slice_data(source_area, (slice_x, slice_y), dataset)
-            else:
-                LOG.debug("Data reduction disabled by the user")
+            slice_x, slice_y = self._get_source_dest_slices(source_area, destination_area, reductions, resample_kwargs)
+            source_area = source_area[slice_y, slice_x]
+            reductions[source_area] = (slice_x, slice_y), source_area
+            dataset = self._slice_data(source_area, (slice_x, slice_y), dataset)
[Review comment, Member, on lines 959 to +963]
I'm curious if the _get_source_dest_slices operation is the only step that raises NotImplementedError or if _slice_data also does it? If the former, then maybe _slice_data should be moved outside of the try/except. Thoughts?
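If only the slice computation can raise NotImplementedError, the narrower guard suggested above would look roughly like this. All names are illustrative stand-ins for the satpy methods, not the real implementations.

```python
def get_source_dest_slices(source_area, destination_area):
    # Stand-in for the slice computation; some area types cannot be reduced.
    if source_area == "swath":
        raise NotImplementedError("area reduction unsupported for this area type")
    return slice(0, 2), slice(0, 2)


def slice_data(dataset, slices):
    # Stand-in for Scene._slice_data; failures here should NOT be swallowed.
    return dataset[slices[0]]


def reduce_data(dataset, source_area, destination_area):
    try:
        slices = get_source_dest_slices(source_area, destination_area)
    except NotImplementedError:
        return dataset  # fall back to resampling the full dataset
    # Slicing happens outside the try/except, so its own errors surface.
    return slice_data(dataset, slices)


print(reduce_data([1, 2, 3, 4], "grid", "dest"))   # [1, 2]
print(reduce_data([1, 2, 3, 4], "swath", "dest"))  # [1, 2, 3, 4]
```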

         except NotImplementedError:
             LOG.info("Not reducing data before resampling.")

[CodeScene Delta Analysis / CodeScene Cloud Delta Analysis (main): check notice on line 966 in satpy/scene.py]

✅ Getting better: Excess Number of Function Arguments. Scene._reduce_data decreases from 6 to 5 arguments, threshold = 4. This function has too many arguments, indicating a lack of encapsulation. Avoid adding more arguments.
         return dataset, source_area

+    @classmethod
+    def _get_source_dest_slices(self, source_area, destination_area, reductions, resample_kwargs):
+        try:
+            (slice_x, slice_y), source_area = reductions[source_area]
+        except KeyError:
+            if resample_kwargs.get("resampler") == "gradient_search":
+                factor = resample_kwargs.get("shape_divisible_by", 2)
+            else:
+                factor = None
+            try:
+                slice_x, slice_y = source_area.get_area_slices(
+                    destination_area, shape_divisible_by=factor)
+            except TypeError:
+                slice_x, slice_y = source_area.get_area_slices(
+                    destination_area)
+        return slice_x, slice_y

     def resample(
             self,
             destination: AreaDefinition | CoordinateDefinition | None = None,