Skip to content

Conversation

@aldenks
Copy link
Member

@aldenks aldenks commented Nov 14, 2025

Lay the groundwork for HRRR analysis.

No new functionality introduced in this PR.

  • template for hrrr 48 hour has no changes as expected
  • hrrr 48 hour dataset integration tests continue to pass with no changes as expected

@aldenks aldenks requested a review from Copilot November 14, 2025 05:30
Copilot finished reviewing on behalf of aldenks November 14, 2025 05:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request refactors the HRRR (High-Resolution Rapid Refresh) codebase by extracting common functionality into base classes to support future HRRR analysis datasets. The refactoring creates shared NoaaHrrrTemplateConfig and NoaaHrrrRegionJob base classes that contain spatial configuration, data variable definitions, and file processing logic previously specific to the 48-hour forecast implementation.

Key Changes:

  • Introduced base classes NoaaHrrrTemplateConfig and NoaaHrrrRegionJob containing common HRRR functionality
  • Refactored NoaaHrrrForecast48HourTemplateConfig to inherit from the new base class, removing ~400 lines of duplicated code
  • Created NoaaHrrrForecast48HourSourceFileCoord subclass to handle forecast-specific output location mapping

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/reformatters/noaa/hrrr/template_config.py New base template config with shared spatial coordinates, data variables, and coordinate calculation methods
src/reformatters/noaa/hrrr/region_job.py New base region job with common file download, data reading, and transformation logic
src/reformatters/noaa/hrrr/forecast_48_hour/template_config.py Refactored to extend base class, removing duplicated coordinate and data variable definitions
src/reformatters/noaa/hrrr/forecast_48_hour/region_job.py Simplified to extend base class, retaining only forecast-specific coordinate generation
src/reformatters/noaa/hrrr/forecast_48_hour/dynamical_dataset.py Updated imports to use NoaaHrrrSourceFileCoord from base module
tests/noaa/hrrr/template_config_test.py New test file for base template config functionality
tests/noaa/hrrr/region_job_test.py New test file for base region job functionality
tests/noaa/hrrr/forecast_48_hour/template_config_test.py Updated import to use base NoaaHrrrSourceFileCoord
tests/noaa/hrrr/forecast_48_hour/region_job_test.py Removed tests moved to base, updated remaining tests to use NoaaHrrrForecast48HourSourceFileCoord

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@aldenks aldenks marked this pull request as ready for review November 14, 2025 12:55
@aldenks aldenks requested a review from mosegontar November 14, 2025 12:55
@aldenks
Copy link
Member Author

aldenks commented Nov 14, 2025

I weighed having hrrr analysis classes just subclass their hrrr forecast counterparts.. that spreads logic over fewer places, this approach is probably more obvious to new readers. I could go either way

Comment on lines 204 to 232
@computed_field # type: ignore[prop-decorator]
@property
def data_vars(self) -> Sequence[NoaaHrrrDataVar]:
# ~15.6MB uncompressed, ~3.1MB compressed
var_chunks: dict[Dim, int] = {
"init_time": 1,
"lead_time": 49, # all lead times
"x": 300, # 6 chunks (1799 pixels)
"y": 265, # 4 chunks (1059 pixels)
}

# Single shard for each init time
# ~374MB uncompressed, ~75MB compressed
var_shards: dict[Dim, int] = {
"init_time": 1,
"lead_time": 49,
"x": var_chunks["x"] * 6, # single shard
"y": var_chunks["y"] * 4, # single shard
}

encoding_float32_default = Encoding(
dtype="float32",
fill_value=0.0,
chunks=tuple(var_chunks[d] for d in self.dims),
shards=tuple(var_shards[d] for d in self.dims),
compressors=[BLOSC_4BYTE_ZSTD_LEVEL3_SHUFFLE],
)

default_window_reset_frequency = pd.Timedelta("1h")
default_keep_mantissa_bits = 7

return [
NoaaHrrrDataVar(
name="composite_reflectivity",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="refc",
long_name="Composite reflectivity",
units="dBZ",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="REFC",
grib_description='0[-] EATM="Entire Atmosphere"',
index_position=1,
keep_mantissa_bits=default_keep_mantissa_bits,
grib_index_level="entire atmosphere",
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="temperature_2m",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="t2m",
long_name="2 metre temperature",
units="C",
step_type="instant",
standard_name="air_temperature",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="TMP",
grib_description='2[m] HTGL="Specified height level above ground"',
grib_index_level="2 m above ground",
index_position=71,
keep_mantissa_bits=default_keep_mantissa_bits,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="wind_u_10m",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="u10",
long_name="10 metre U wind component",
units="m/s",
step_type="instant",
standard_name="eastward_wind",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="UGRD",
grib_description='10[m] HTGL="Specified height level above ground"',
grib_index_level="10 m above ground",
index_position=77,
keep_mantissa_bits=6,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="wind_v_10m",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="v10",
long_name="10 metre V wind component",
units="m/s",
step_type="instant",
standard_name="northward_wind",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="VGRD",
grib_description='10[m] HTGL="Specified height level above ground"',
grib_index_level="10 m above ground",
index_position=78,
keep_mantissa_bits=6,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="precipitation_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="tp",
long_name="Total Precipitation",
units="mm/s",
comment="Average precipitation rate since the previous forecast step.",
step_type="avg",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="APCP",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=84,
deaccumulate_to_rate=True,
window_reset_frequency=default_window_reset_frequency,
keep_mantissa_bits=default_keep_mantissa_bits,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="precipitable_water_atmosphere",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="pwat",
long_name="Precipitable water",
units="kg/(m^2)",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="PWAT",
grib_description='0[-] EATM="Entire atmosphere (considered as a single layer)"',
grib_index_level="entire atmosphere (considered as a single layer)",
index_position=107,
keep_mantissa_bits=default_keep_mantissa_bits,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="total_cloud_cover_atmosphere",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="tcc",
long_name="Total Cloud Cover",
units="%",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="TCDC",
grib_description='0[-] EATM="Entire Atmosphere"',
grib_index_level="entire atmosphere",
index_position=116,
keep_mantissa_bits=default_keep_mantissa_bits,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="downward_short_wave_radiation_flux_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="sdswrf",
long_name="Surface downward short-wave radiation flux",
units="W/(m^2)",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="DSWRF",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=123,
keep_mantissa_bits=default_keep_mantissa_bits,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="downward_long_wave_radiation_flux_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="sdlwrf",
long_name="Surface downward long-wave radiation flux",
units="W/(m^2)",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="DLWRF",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=124,
keep_mantissa_bits=default_keep_mantissa_bits,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="pressure_reduced_to_mean_sea_level",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="prmsl",
long_name="Pressure reduced to MSL",
units="Pa",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="MSLMA",
grib_description='0[-] MSL="Mean sea level"',
grib_index_level="mean sea level",
index_position=41,
keep_mantissa_bits=13,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="percent_frozen_precipitation_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="cpofp",
long_name="Percent frozen precipitation",
units="%",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="CPOFP",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=82,
keep_mantissa_bits=default_keep_mantissa_bits,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="pressure_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="sp",
long_name="Surface pressure",
units="Pa",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="PRES",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=62,
keep_mantissa_bits=10,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="categorical_ice_pellets_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="cicep",
long_name="Categorical ice pellets",
units="0=no; 1=yes",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="CICEP",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=91,
keep_mantissa_bits="no-rounding",
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="categorical_snow_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="csnow",
long_name="Categorical snow",
units="0=no; 1=yes",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="CSNOW",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=90,
keep_mantissa_bits="no-rounding",
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="categorical_freezing_rain_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="cfrzr",
long_name="Categorical freezing rain",
units="0=no; 1=yes",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="CFRZR",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=92,
keep_mantissa_bits="no-rounding",
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="categorical_rain_surface",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="crain",
long_name="Categorical rain",
units="0=no; 1=yes",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="CRAIN",
grib_description='0[-] SFC="Ground or water surface"',
grib_index_level="surface",
index_position=93,
keep_mantissa_bits="no-rounding",
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="relative_humidity_2m",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="r2",
long_name="2 metre relative humidity",
units="%",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="RH",
grib_description='2[m] HTGL="Specified height level above ground"',
grib_index_level="2 m above ground",
index_position=75,
keep_mantissa_bits=default_keep_mantissa_bits,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="geopotential_height_cloud_ceiling",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="gh",
long_name="Geopotential height",
units="gpm",
step_type="instant",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="HGT",
grib_description='0[-] CEIL="Cloud ceiling"',
grib_index_level="cloud ceiling",
index_position=117,
keep_mantissa_bits=8,
hrrr_file_type="sfc",
),
),
# HRRR provides 80m but not 100m winds
NoaaHrrrDataVar(
name="wind_u_80m",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="u80",
long_name="U-component of wind (80 m above ground)",
units="m/s",
step_type="instant",
standard_name="eastward_wind",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="UGRD",
grib_description='80[m] HTGL="Specified height level above ground"',
grib_index_level="80 m above ground",
index_position=60,
keep_mantissa_bits=6,
hrrr_file_type="sfc",
),
),
NoaaHrrrDataVar(
name="wind_v_80m",
encoding=encoding_float32_default,
attrs=DataVarAttrs(
short_name="v80",
long_name="V-component of wind (80 m above ground)",
units="m/s",
step_type="instant",
standard_name="northward_wind",
),
internal_attrs=NoaaHrrrInternalAttrs(
grib_element="VGRD",
grib_description='80[m] HTGL="Specified height level above ground"',
grib_index_level="80 m above ground",
index_position=61,
keep_mantissa_bits=6,
hrrr_file_type="sfc",
),
),
]

def _spatial_info(
self,
) -> tuple[
tuple[int, int], tuple[float, float, float, float], tuple[float, float], str
]:
"""
Returns (shape, bounds, resolution, crs proj4 string).
Useful for deriving x, y and latitude, longitude coordinates.
See tests/noaa/hrrr/forecast_48_hour/template_config_test.py::test_spatial_info_matches_file
"""
return (
(1059, 1799),
(
-2699020.142521929,
-1588806.152556665,
2697979.857478071,
1588193.847443335,
),
(3000.0, -3000.0),
"+proj=lcc +lat_0=38.5 +lon_0=-97.5 +lat_1=38.5 +lat_2=38.5 +x_0=0 +y_0=0 +R=6371229 +units=m +no_defs=True",
)
return self.get_data_vars(encoding_float32_default)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if var_chunks, var_shards and var_encoding should be defined as properties or methods on the TemplateConfig. That would allow you to avoid needing to know to call this self.get_data_vars and pass things through, since the parent data_vars could reference self.var_encoding

Copy link
Member Author

@aldenks aldenks Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm i do like how that has a more clear place to define encodings and avoids this get_data_vars function. the downside is that makes it awkwards to have different encodings per variable (which is a pretty reasonable thing to do, eg. categorical vars, or larger chunking for really compressible variables.) Maybe there's some way around that..

)


class NoaaHrrrTemplateConfig(TemplateConfig[NoaaHrrrDataVar]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of nit since it's just a naming thing, but I wonder if it would be clearer to name this "base" class NoaaHrrrTemplateConfigMixin. I think that probably more accurately captures its use as a utility class that cannot be used standalone (required fields are not fully defined), inheritance still works, etc.

dim_coords = self.dimension_coordinates()
append_dim_coordinate_chunk_size = self.append_dim_coordinate_chunk_size()

spatial_coords = super().coords
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were going the Mixin route, maybe I'd call self.spatial_coords rather than trying to reason about why super().coords only has spatial coordinates.

@mosegontar
Copy link
Contributor

I weighed having hrrr analysis classes just subclass their hrrr forecast counterparts.. that spreads logic over fewer places, this approach is probably more obvious to new readers. I could go either way

I think I generally prefer the approach you ended up with as being more intuitive (both inheriting from some shared common HRRR configuration), though maybe if there is some conceptually sensible relationship (where HRRR Analysis is seen as a variation on Forecast) directly inheriting from Forecast might make sense. I do agree with the potential upside (logic remains consolidated) and that one doesn't really have to reason about what the common/shared/base thing(s) represent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants