-
Notifications
You must be signed in to change notification settings - Fork 4
Factor out common HRRR RegionJob and TemplateConfig #315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request refactors the HRRR (High-Resolution Rapid Refresh) codebase by extracting common functionality into base classes to support future HRRR analysis datasets. The refactoring creates shared NoaaHrrrTemplateConfig and NoaaHrrrRegionJob base classes that contain spatial configuration, data variable definitions, and file processing logic previously specific to the 48-hour forecast implementation.
Key Changes:
- Introduced base classes
NoaaHrrrTemplateConfigandNoaaHrrrRegionJobcontaining common HRRR functionality - Refactored
NoaaHrrrForecast48HourTemplateConfigto inherit from the new base class, removing ~400 lines of duplicated code - Created
NoaaHrrrForecast48HourSourceFileCoordsubclass to handle forecast-specific output location mapping
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/reformatters/noaa/hrrr/template_config.py |
New base template config with shared spatial coordinates, data variables, and coordinate calculation methods |
src/reformatters/noaa/hrrr/region_job.py |
New base region job with common file download, data reading, and transformation logic |
src/reformatters/noaa/hrrr/forecast_48_hour/template_config.py |
Refactored to extend base class, removing duplicated coordinate and data variable definitions |
src/reformatters/noaa/hrrr/forecast_48_hour/region_job.py |
Simplified to extend base class, retaining only forecast-specific coordinate generation |
src/reformatters/noaa/hrrr/forecast_48_hour/dynamical_dataset.py |
Updated imports to use NoaaHrrrSourceFileCoord from base module |
tests/noaa/hrrr/template_config_test.py |
New test file for base template config functionality |
tests/noaa/hrrr/region_job_test.py |
New test file for base region job functionality |
tests/noaa/hrrr/forecast_48_hour/template_config_test.py |
Updated import to use base NoaaHrrrSourceFileCoord |
tests/noaa/hrrr/forecast_48_hour/region_job_test.py |
Removed tests moved to base, updated remaining tests to use NoaaHrrrForecast48HourSourceFileCoord |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
I weighed having hrrr analysis classes just subclass their hrrr forecast counterparts.. that spreads logic over fewer places, this approach is probably more obvious to new readers. I could go either way |
| @computed_field # type: ignore[prop-decorator] | ||
| @property | ||
| def data_vars(self) -> Sequence[NoaaHrrrDataVar]: | ||
| # ~15.6MB uncompressed, ~3.1MB compressed | ||
| var_chunks: dict[Dim, int] = { | ||
| "init_time": 1, | ||
| "lead_time": 49, # all lead times | ||
| "x": 300, # 6 chunks (1799 pixels) | ||
| "y": 265, # 4 chunks (1059 pixels) | ||
| } | ||
|
|
||
| # Single shard for each init time | ||
| # ~374MB uncompressed, ~75MB compressed | ||
| var_shards: dict[Dim, int] = { | ||
| "init_time": 1, | ||
| "lead_time": 49, | ||
| "x": var_chunks["x"] * 6, # single shard | ||
| "y": var_chunks["y"] * 4, # single shard | ||
| } | ||
|
|
||
| encoding_float32_default = Encoding( | ||
| dtype="float32", | ||
| fill_value=0.0, | ||
| chunks=tuple(var_chunks[d] for d in self.dims), | ||
| shards=tuple(var_shards[d] for d in self.dims), | ||
| compressors=[BLOSC_4BYTE_ZSTD_LEVEL3_SHUFFLE], | ||
| ) | ||
|
|
||
| default_window_reset_frequency = pd.Timedelta("1h") | ||
| default_keep_mantissa_bits = 7 | ||
|
|
||
| return [ | ||
| NoaaHrrrDataVar( | ||
| name="composite_reflectivity", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="refc", | ||
| long_name="Composite reflectivity", | ||
| units="dBZ", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="REFC", | ||
| grib_description='0[-] EATM="Entire Atmosphere"', | ||
| index_position=1, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| grib_index_level="entire atmosphere", | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="temperature_2m", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="t2m", | ||
| long_name="2 metre temperature", | ||
| units="C", | ||
| step_type="instant", | ||
| standard_name="air_temperature", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="TMP", | ||
| grib_description='2[m] HTGL="Specified height level above ground"', | ||
| grib_index_level="2 m above ground", | ||
| index_position=71, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="wind_u_10m", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="u10", | ||
| long_name="10 metre U wind component", | ||
| units="m/s", | ||
| step_type="instant", | ||
| standard_name="eastward_wind", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="UGRD", | ||
| grib_description='10[m] HTGL="Specified height level above ground"', | ||
| grib_index_level="10 m above ground", | ||
| index_position=77, | ||
| keep_mantissa_bits=6, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="wind_v_10m", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="v10", | ||
| long_name="10 metre V wind component", | ||
| units="m/s", | ||
| step_type="instant", | ||
| standard_name="northward_wind", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="VGRD", | ||
| grib_description='10[m] HTGL="Specified height level above ground"', | ||
| grib_index_level="10 m above ground", | ||
| index_position=78, | ||
| keep_mantissa_bits=6, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="precipitation_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="tp", | ||
| long_name="Total Precipitation", | ||
| units="mm/s", | ||
| comment="Average precipitation rate since the previous forecast step.", | ||
| step_type="avg", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="APCP", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=84, | ||
| deaccumulate_to_rate=True, | ||
| window_reset_frequency=default_window_reset_frequency, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="precipitable_water_atmosphere", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="pwat", | ||
| long_name="Precipitable water", | ||
| units="kg/(m^2)", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="PWAT", | ||
| grib_description='0[-] EATM="Entire atmosphere (considered as a single layer)"', | ||
| grib_index_level="entire atmosphere (considered as a single layer)", | ||
| index_position=107, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="total_cloud_cover_atmosphere", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="tcc", | ||
| long_name="Total Cloud Cover", | ||
| units="%", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="TCDC", | ||
| grib_description='0[-] EATM="Entire Atmosphere"', | ||
| grib_index_level="entire atmosphere", | ||
| index_position=116, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="downward_short_wave_radiation_flux_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="sdswrf", | ||
| long_name="Surface downward short-wave radiation flux", | ||
| units="W/(m^2)", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="DSWRF", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=123, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="downward_long_wave_radiation_flux_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="sdlwrf", | ||
| long_name="Surface downward long-wave radiation flux", | ||
| units="W/(m^2)", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="DLWRF", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=124, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="pressure_reduced_to_mean_sea_level", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="prmsl", | ||
| long_name="Pressure reduced to MSL", | ||
| units="Pa", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="MSLMA", | ||
| grib_description='0[-] MSL="Mean sea level"', | ||
| grib_index_level="mean sea level", | ||
| index_position=41, | ||
| keep_mantissa_bits=13, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="percent_frozen_precipitation_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="cpofp", | ||
| long_name="Percent frozen precipitation", | ||
| units="%", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="CPOFP", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=82, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="pressure_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="sp", | ||
| long_name="Surface pressure", | ||
| units="Pa", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="PRES", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=62, | ||
| keep_mantissa_bits=10, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="categorical_ice_pellets_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="cicep", | ||
| long_name="Categorical ice pellets", | ||
| units="0=no; 1=yes", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="CICEP", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=91, | ||
| keep_mantissa_bits="no-rounding", | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="categorical_snow_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="csnow", | ||
| long_name="Categorical snow", | ||
| units="0=no; 1=yes", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="CSNOW", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=90, | ||
| keep_mantissa_bits="no-rounding", | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="categorical_freezing_rain_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="cfrzr", | ||
| long_name="Categorical freezing rain", | ||
| units="0=no; 1=yes", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="CFRZR", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=92, | ||
| keep_mantissa_bits="no-rounding", | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="categorical_rain_surface", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="crain", | ||
| long_name="Categorical rain", | ||
| units="0=no; 1=yes", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="CRAIN", | ||
| grib_description='0[-] SFC="Ground or water surface"', | ||
| grib_index_level="surface", | ||
| index_position=93, | ||
| keep_mantissa_bits="no-rounding", | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="relative_humidity_2m", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="r2", | ||
| long_name="2 metre relative humidity", | ||
| units="%", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="RH", | ||
| grib_description='2[m] HTGL="Specified height level above ground"', | ||
| grib_index_level="2 m above ground", | ||
| index_position=75, | ||
| keep_mantissa_bits=default_keep_mantissa_bits, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="geopotential_height_cloud_ceiling", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="gh", | ||
| long_name="Geopotential height", | ||
| units="gpm", | ||
| step_type="instant", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="HGT", | ||
| grib_description='0[-] CEIL="Cloud ceiling"', | ||
| grib_index_level="cloud ceiling", | ||
| index_position=117, | ||
| keep_mantissa_bits=8, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| # HRRR provides 80m but not 100m winds | ||
| NoaaHrrrDataVar( | ||
| name="wind_u_80m", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="u80", | ||
| long_name="U-component of wind (80 m above ground)", | ||
| units="m/s", | ||
| step_type="instant", | ||
| standard_name="eastward_wind", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="UGRD", | ||
| grib_description='80[m] HTGL="Specified height level above ground"', | ||
| grib_index_level="80 m above ground", | ||
| index_position=60, | ||
| keep_mantissa_bits=6, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| NoaaHrrrDataVar( | ||
| name="wind_v_80m", | ||
| encoding=encoding_float32_default, | ||
| attrs=DataVarAttrs( | ||
| short_name="v80", | ||
| long_name="V-component of wind (80 m above ground)", | ||
| units="m/s", | ||
| step_type="instant", | ||
| standard_name="northward_wind", | ||
| ), | ||
| internal_attrs=NoaaHrrrInternalAttrs( | ||
| grib_element="VGRD", | ||
| grib_description='80[m] HTGL="Specified height level above ground"', | ||
| grib_index_level="80 m above ground", | ||
| index_position=61, | ||
| keep_mantissa_bits=6, | ||
| hrrr_file_type="sfc", | ||
| ), | ||
| ), | ||
| ] | ||
|
|
||
| def _spatial_info( | ||
| self, | ||
| ) -> tuple[ | ||
| tuple[int, int], tuple[float, float, float, float], tuple[float, float], str | ||
| ]: | ||
| """ | ||
| Returns (shape, bounds, resolution, crs proj4 string). | ||
| Useful for deriving x, y and latitude, longitude coordinates. | ||
| See tests/noaa/hrrr/forecast_48_hour/template_config_test.py::test_spatial_info_matches_file | ||
| """ | ||
| return ( | ||
| (1059, 1799), | ||
| ( | ||
| -2699020.142521929, | ||
| -1588806.152556665, | ||
| 2697979.857478071, | ||
| 1588193.847443335, | ||
| ), | ||
| (3000.0, -3000.0), | ||
| "+proj=lcc +lat_0=38.5 +lon_0=-97.5 +lat_1=38.5 +lat_2=38.5 +x_0=0 +y_0=0 +R=6371229 +units=m +no_defs=True", | ||
| ) | ||
| return self.get_data_vars(encoding_float32_default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if var_chunks, var_shards and var_encoding should be defined as properties or methods on the TemplateConfig. That would allow you to avoid needing to know to call this self.get_data_vars and pass things through, since the parent data_vars could reference self.var_encoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm i do like how that has a more clear place to define encodings and avoids this get_data_vars function. the downside is that makes it awkwards to have different encodings per variable (which is a pretty reasonable thing to do, eg. categorical vars, or larger chunking for really compressible variables.) Maybe there's some way around that..
| ) | ||
|
|
||
|
|
||
| class NoaaHrrrTemplateConfig(TemplateConfig[NoaaHrrrDataVar]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of nit since it's just a naming thing, but I wonder if it would be clearer to name this "base" class NoaaHrrrTemplateConfigMixin. I think that probably more accurately captures its use as a utility class that cannot be used standalone (required fields are not fully defined), inheritance still works, etc.
| dim_coords = self.dimension_coordinates() | ||
| append_dim_coordinate_chunk_size = self.append_dim_coordinate_chunk_size() | ||
|
|
||
| spatial_coords = super().coords |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were going the Mixin route, maybe I'd call self.spatial_coords rather than trying to reason about why super().coords only has spatial coordinates.
I think I generally prefer the approach you ended up with as being more intuitive (both inheriting from some shared common HRRR configuration), though maybe if there is some conceptually sensible relationship (where HRRR Analysis is seen as a variation on Forecast) directly inheriting from Forecast might make sense. I do agree with the potential upside (logic remains consolidated) and that one doesn't really have to reason about what the common/shared/base thing(s) represent. |
Lay the groundwork for HRRR analysis.
No new functionality introduced in this PR.