Skip to content

hobuinc/stacem

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

StacEm

A STAC Directory created using WESM JSON. This document will walk you through the usage of this python project, stacem.

STAC Structure

In this project, we're creating a very straight forward STAC structure that's very similar to a normal directory structure you may use on your personal computer.

At the coarsest layer, we have the Catalog, which is written at catalog.json in your base directory. This Catalog organizes all of our projects underneath it, and houses the links to each one.

Each of these projects and the coarse metadata associated with it will be stored in a Collection within the project directory in a file called collection.json. This Collection will tell us about the project itself with key information like the date ranges for its creation and the bounding box for the data within it.

Each Collection will have a list of links to the individual STAC Items, which are the lowest level we'll get in this structure. Here, an Item references a specific laz file and accompanying xml document. Using the metadata from the WESM file, as well as information gathered from a PDAL info that's run over the pointcloud, we create a much more granular metadata file, with much more accurate bounding box information and information about the pointcloud itself.

The directory structure will look like this:

./
    - catalog.json
    - AK_ANCHORAGE_2015
        - collection.json
        - USGS_LPC_Anchorage_Lidar_63236780
            - USGS_LPC_Anchorage_Lidar_63236780.json
        - USGS_LPC_Anchorage_Lidar_63716807
            - USGS_LPC_Anchorage_Lidar_63716807.json
        ...
        - USGS_LPC_Anchorage_Lidar_63716806
            - USGS_LPC_Anchorage_Lidar_63716806.json

Usage

In order to create this structure, we'll use 2 different python modules: stacem_create and stacem_finalize.

stacem_create

The Create module will build our structure from the ground up, starting with leaf nodes. Create will iterate through each project and process all of the pointclouds that are found in the lpc link key from the WESM JSON object. It will pluck out the necessary information from the PDAL info call, including stats about the point schema, the coordinate systems, and bounding box. If something goes wrong with this part, we will write out any errors to the errors key in the STAC Item and continue to publishing it. This will allow us to track any problems with pointcloud files much easier.

Once each Item has been created for a specific project, a Collection will be made using those Items as children, and a collection.json file will be written to the local project directory.

CLI Usage:

stacem_create [-h] [--wesm_url WESM_URL] [--local_dst LOCAL_DST] [--s3_bucket S3_BUCKET] [--s3_path S3_PATH] [--update UPDATE]

options:
  -h, --help            show this help message and exit
  --wesm_url WESM_URL   Url for WESM JSON
  --local_dst LOCAL_DST
                        Destination directory
  --s3_bucket S3_BUCKET
                        S3 destination bucket
  --s3_path S3_PATH     S3 path within destination bucket
  --update UPDATE       Update flag. If set, will update JSON objects that already exist with new values.

Example usage:

WESM_URL="https://apps.nationalmap.gov/lidar-explorer/lidar_ndx.json"

stacem_create --wesm_url $WESM_URL --local_dst 'wesm_stac/' --s3_bucket usgs-lidar-public --s3_path 'wesm_stac/'
stacem_finalize

The Finalize command will troll through the directory structure that we created in the Create step, and will collect all of the Collections into a list, to be added as children to our overarching Catalog. This Catalog will then be written out to the base directory as a catalog.json file.

CLI Usage:

usage: stacem_finalize [-h] [--local_path LOCAL_PATH] [--s3_bucket S3_BUCKET] [--s3_path S3_PATH] [--out_dst OUT_DST]

options:
  -h, --help            show this help message and exit
  --local_path LOCAL_PATH
                        Finalize will use this argument for location by default if no s3 bucket is supplied.
  --s3_bucket S3_BUCKET
                        S3 destination bucket. Finalize will use this argument if it's supplied.
  --s3_path S3_PATH     S3 path within destination bucket
  --out_dst OUT_DST     Catalog destination path.

Example Usage:

# If running this on a local implementation
stacem_finalize --local_dst './wesm_stac/'

# If running against an s3 implementation
stacem_finalize --s3_bucket 'usgs-lidar-public' --s3_path '/wesm_stac/'
Updating

This project was created with the goal of continuously updating one structure. On the schedule selected by the user, this module can be rerun over a previously made structure, updating any key-value pairs that have been added, as well as looking at ETag values of pointcloud files to determine whether or not a new run of pdal info is needed.

If there is nothing new being added, and no reprocessessing occurs, then the Create module will effectily be copying any data from s3 to your local machine.

Pushing to S3

Once you've created your local structure, all you need to do is perform something akin to a s3 sync call. s3 sync is a good function to have on deck, because it will not perform any unnecessary upload operations.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages