
NLDAS 2


NLDAS-2 is a NASA data product containing gridded assimilated data over the continental United States. It provides a set of forcing variables that can be used to produce AnnAGNPS-compatible climate data:

  • U wind component (m/s) at 10 meters above the surface
  • V wind component (m/s) at 10 meters above the surface
  • Air temperature (K) at 2 meters above the surface
  • Specific humidity (kg/kg) at 2 meters above the surface
  • Surface pressure (Pa)
  • Surface downward longwave radiation (W/m^2)
  • Surface downward shortwave radiation (W/m^2), bias-corrected (see Appendix B of the NLDAS-2 documentation)
  • Precipitation hourly total (kg/m^2)
  • Fraction of total precipitation that is convective (unitless), from NARR
  • CAPE: Convective Available Potential Energy (J/kg), from NARR
  • Potential evaporation (kg/m^2), from NARR

Additionally, NLDAS-2 offers outputs from land surface models such as NOAH, VIC, or MOSAIC that could be used to complement the climate data.

Downloading data

With login information for a NASA EarthData account (username and password), one can use a command line interface (CLI) tool bundled with pyagnps to easily download the hourly netCDF files from GES DISC. Example bash script:

# Define the date range to download
FROM_DATE="1999-12-31"
TO_DATE="2023-01-01"
PRODUCT="NLDAS_FORA0125_H.2.0"

# EarthData Login information
USERNAME="myusername" 
PASSWORD="mypassword"

# Output dir
OUTPUT_DIR="/datasets/CLIMATE/NLDAS2/FORA2.0"

# Activate the virtual environment
source venv/bin/activate 

# Run the download command with the date range
download-nldas2 \
    --product $PRODUCT \
    --username $USERNAME \
    --password $PASSWORD \
    --output_dir $OUTPUT_DIR \
    --from_date $FROM_DATE \
    --to_date $TO_DATE || { echo "Failed to run NLDAS-2 climate population script"; exit 1; }
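
Once downloaded, an hourly file can be inspected with xarray to confirm that the forcing variables listed above are present. A minimal sketch, assuming xarray is installed; the file name below is an assumption and should be replaced with an actual downloaded file:

import xarray as xr

# File name is an assumption; point this at one downloaded hourly file
ds = xr.open_dataset("NLDAS_FORA0125_H.A20000101.0000.020.nc")
print(ds.data_vars)  # forcing variables
print(ds.coords)     # lat, lon, time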

Aggregating

Those netCDF files can then be aggregated into 24-hour daily files (according to either the local timezone or UTC). Run aggregate-nldas2 --help for more options.

# Define the date range for this node
FROM_DATE="2009-12-17"
TO_DATE="2022-12-31"
PRODUCT="NLDAS_FORA0125_H.2.0"

# Fraction of the entire dataset to process in one go
# (i.e. that can be held in memory)
CHUNK=0.005

# Input and output directories
INPUT_DIR="/datasets/CLIMATE/NLDAS2/FORA2.0"
OUTPUT_DIR="/datasets/CLIMATE/NLDAS2/FORA2.0_DAILY"

# Aggregate the downloaded files

aggregate-nldas2 \
    --product $PRODUCT \
    --files_dir $INPUT_DIR \
    --chunk_frac_tot $CHUNK \
    --output_dir_agg_netcdf $OUTPUT_DIR \
    --from_date $FROM_DATE \
    --to_date $TO_DATE || { echo "Failed to run NLDAS-2 daily aggregation population script"; exit 1; }
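
For orientation, the sketch below shows the kind of hourly-to-daily reduction the aggregation performs, using xarray directly. The variable names (Tair, Rainf) and file pattern are assumptions about the netCDF contents; aggregate-nldas2 itself additionally handles timezone shifting and chunking:

import xarray as xr

# Lazily open a batch of hourly files (file pattern is an assumption)
ds = xr.open_mfdataset("NLDAS_FORA0125_H.A2010*.nc", combine="by_coords")

# Daily reductions in UTC; the CLI tool can also shift to a local timezone first
daily_tmax = ds["Tair"].resample(time="1D").max()
daily_tmin = ds["Tair"].resample(time="1D").min()
daily_precip = ds["Rainf"].resample(time="1D").sum()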

Saving as local data rods

The netCDF files can then be transformed into data rods (a layout more appropriate for working with time series), with one or multiple rods per climate station.

# Path containing NLDAS-2 pre-aggregated files
PATH_NLDAS2_DAILY_FILES="/aims-nas/data/datasets/CLIMATE/NLDAS2/FORA2.0_DAILY"

# Path to write data rods
PATH_NLDAS2_DAILY_RODS_FILES="/aims-nas/data/datasets/CLIMATE/NLDAS2/FORA2.0_DAILY_DATA_RODS"

# Latitudes and Longitudes to process
LATS="all"
LONS="all"

# Dates to process
start="2009-12-17"
end="2022-12-31"

# Maximum Number of Global Iterations to try
MAXITER_GLOBAL=2

# Create directory for data rods
mkdir --parents "$PATH_NLDAS2_DAILY_RODS_FILES"

# Run the Python script with the arguments
generate-nldas2-rod \
        --start_date "$start" \
        --end_date "$end" \
        --path_nldas_daily_files "$PATH_NLDAS2_DAILY_FILES" \
        --output_dir "$PATH_NLDAS2_DAILY_RODS_FILES/data_rods_$start-$end" \
        --partition_size 500MB \
        --lats "$LATS" \
        --lons "$LONS" \
        --saveformat parquet \
        --maxiter_global "$MAXITER_GLOBAL"
        # --delete_existing_chunks 
        # --assemble_chunks_in_single_rods
        # --delete_existing_data_rods
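
Once generated, an individual data rod can be read back with pandas. The file path below is an assumption about how the rods are laid out on disk; adjust it to an actual rod file:

import pandas as pd

# Path is an assumption; rods are written as parquet by the command above
rod = pd.read_parquet(
    "/aims-nas/data/datasets/CLIMATE/NLDAS2/FORA2.0_DAILY_DATA_RODS/"
    "data_rods_2009-12-17-2022-12-31/example_rod.parquet"
)
print(rod.head())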

Populating to a custom database

Those data rods can be used as-is for general-purpose NLDAS-2 data usage. Within the AIMS project, they were formatted in the AnnAGNPS format and uploaded to a PostgreSQL database with the PostGIS and TimescaleDB extensions. This documentation assumes that this database is properly set up. It also assumes that you have a separate table nldas2_stations with the following schema representing the ID and location of the climate stations.

attribute    type
station_id   text
geom         geometry(Point, 4326)
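
How nldas2_stations gets populated is up to you. Below is a minimal sketch, assuming one station per grid cell on the standard 0.125° NLDAS-2 CONUS grid and a purely hypothetical station_id naming scheme (it must match whatever the data rods use), written to PostGIS with geopandas:

import numpy as np
import geopandas as gpd
from sqlalchemy import create_engine

# Standard NLDAS-2 0.125-degree grid over CONUS (cell-center coordinates)
lons = np.arange(-124.9375, -67.0625 + 0.001, 0.125)
lats = np.arange(25.0625, 52.9375 + 0.001, 0.125)
lon2d, lat2d = np.meshgrid(lons, lats)
lon_flat, lat_flat = lon2d.ravel(), lat2d.ravel()

# Hypothetical station_id scheme; adapt to the IDs used by your data rods
station_ids = [f"nldas2_{lon:.4f}_{lat:.4f}" for lon, lat in zip(lon_flat, lat_flat)]

gdf = gpd.GeoDataFrame(
    {"station_id": station_ids},
    geometry=gpd.points_from_xy(lon_flat, lat_flat),
    crs="EPSG:4326",
)

# Connection string is a placeholder; requires psycopg2 and geoalchemy2
engine = create_engine("postgresql://user:password@host:5432/dbname")
gdf.to_postgis("nldas2_stations", engine, if_exists="replace", index=False)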

You will need to define a custom function on your database that finds the nearest station for any pair of coordinates you pass to it:

Defining `find_nearest_nldas2_station`
CREATE OR REPLACE FUNCTION find_nearest_nldas2_station(lon FLOAT, lat FLOAT)
RETURNS TABLE (station_id TEXT, x FLOAT, y FLOAT)
AS $$
BEGIN
  RETURN QUERY
  SELECT nldas2_stations.station_id, ST_X(geom) AS x, ST_Y(geom) AS y
  FROM nldas2_stations
  ORDER BY ST_Distance(geom, ST_GeomFromText('POINT(' || lon || ' ' || lat || ')', 4326)) ASC
  LIMIT 1;
END;
$$ LANGUAGE plpgsql;
Initializing the `climate_nldas2` table
CREATE TABLE climate_nldas2 (
    station_id TEXT,
    date DATE,
    month INT2,
    day INT2,
    year INT2,
    max_air_temperature FLOAT4,
    min_air_temperature FLOAT4,
    precip FLOAT4,
    dew_point FLOAT4,
    sky_cover FLOAT4,
    wind_speed FLOAT4,
    wind_direction FLOAT4,
    solar_radiation FLOAT4,
    storm_type_id TEXT,
    potential_et FLOAT4,
    actual_et FLOAT4,
    actual_ei FLOAT4,
    input_units_code INT2,
    geom GEOMETRY(Point, 4326),
    PRIMARY KEY (station_id, date)
);

-- Convert the table into a hypertable
SELECT create_hypertable('climate_nldas2', 'date');

-- Enable compression
ALTER TABLE climate_nldas2 SET (timescaledb.compress, timescaledb.compress_orderby = 'date', timescaledb.compress_segmentby = 'station_id');

-- Set up chunking
SELECT set_chunk_time_interval('climate_nldas2', INTERVAL '1 month');

Populating the data rods to the database:

# Path to write data rods
PATH_NLDAS2_DAILY_RODS_FILES="/aims-nas/data/datasets/CLIMATE/NLDAS2/FORA2.0_DAILY_DATA_RODS"

# Path to json files containing credentials for connecting to the PostgreSQL database
PATH_DB_CREDS="/batch_processes/db_credentials.json"

export PYTHONUNBUFFERED=TRUE
# Run the Python script with the arguments
populate-nldas2-daily-parquet-db \
        --rods_dir "$PATH_NLDAS2_DAILY_RODS_FILES" \
        --path_to_creds "$PATH_DB_CREDS" \
        --db_table_name climate_nldas2
        # --pattern "*/temp/climate_daily_*1_chunk*.parquet"
        # --delete_chunks_on_sucess

# The --pattern flag is optional and restricts the upload to parquet files matching a specific pattern, here those containing 1_chunk in their filename. If omitted, all data rods are processed.

Querying the database in AnnAGNPS format

The following query will generate AnnAGNPS-ready data:

SELECT
    month, day, year,
    max_air_temperature, min_air_temperature,
    precip,
    dew_point,
    sky_cover,
    wind_speed,
    wind_direction,
    solar_radiation,
    storm_type_id,
    potential_et,
    actual_et,
    input_units_code
FROM climate_nldas2
WHERE 
    station_id = (SELECT station_id FROM find_nearest_nldas2_station(-89.5193315, 34.3662867))
AND date >= '2000-01-01' AND date <= '2022-12-31';

Or using a properly initialized pyagnps.climate.ClimateAnnAGNPSCoords object clm:

# Assuming an engine database connection object exists with the proper credentials:
df = clm.generate_annagnps_daily_climate_data_from_db(engine)  # station_id can also be passed directly as a kwarg if known
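
For reference, a minimal sketch of how such an engine might be built from the db_credentials.json used earlier; the JSON key names here are assumptions and should be adapted to the actual credentials file:

import json
from sqlalchemy import create_engine

with open("/batch_processes/db_credentials.json") as f:
    creds = json.load(f)  # key names below are assumptions

engine = create_engine(
    f"postgresql://{creds['user']}:{creds['password']}"
    f"@{creds['host']}:{creds['port']}/{creds['dbname']}"
)
# engine can then be passed to clm.generate_annagnps_daily_climate_data_from_db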
