Develop python script to query EarthExplorer API and dump results to Shapefile #20

@Wiscmapper

General Overview

The overarching goal is to use the USGS "Machine to Machine" (M2M) API to search the EarthExplorer archives for Wisconsin aerial photography. A Python script will connect to the API and send the search parameters defined below. The search results (returned as JSON) will be dumped into a single shapefile. I expect a total of 125,000+ points to be returned, with dates ranging from 1930 to the mid-2000s.

Output

A single shapefile named EarthExplorer_WI_Imagery_[date].shp, where [date] has the format YYYY_MM_DD and reflects the date the script was run. The shapefile should use spatial reference 4326 (EPSG:4326, WGS 84).
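A minimal sketch of the naming rule above. The function name `output_shapefile_name` is made up for illustration; only the name template comes from this issue.

```python
from datetime import date


def output_shapefile_name(run_date=None):
    """Return the output shapefile name for a run date (defaults to today)."""
    run_date = run_date or date.today()
    # date objects accept strftime codes in format specs, giving YYYY_MM_DD.
    return f"EarthExplorer_WI_Imagery_{run_date:%Y_%m_%d}.shp"


print(output_shapefile_name(date(2020, 8, 25)))
# EarthExplorer_WI_Imagery_2020_08_25.shp
```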

Output Attributes and Mapping to EarthExplorer Attributes

Borrowing inspiration from https://github.com/geoblacklight/geoblacklight/wiki/GeoBlacklight-1.0-Metadata-Elements
Reminder: 10 char limit in shapefile field names
Each "datasetName" will return different fields! The attributes listed below are the most common and useful.
The following is very fluid... you've been warned!

| EarthExplorer Attribute | Shapefile Attribute | Shapefile Format | Example |
|---|---|---|---|
| N/A | FID | Object ID | [inserted by Arc] |
| N/A | Shape | Geometry | [inserted by Arc] |
| entityId | identifier | text | |
| startDate | date | text | 19550619 |
| startDate | year | short | 1955 |
| | creatorlon | text | United States Geological Survey |
| | creator | text | USGS |
| browsePath | browse_url | text | |
| | fullres_url | text | |
| | more_url | text | https://search.library.wisc.edu/digital/AKCSHCFNL642UW8M |
| | photosize | text | |
| | roll_exp | text | 1-46 |
| | mapscale | text | 1:20,000 |
| film_type | film_type | text | |
| center_lon | center_lon | double | -91.285938 |
| center_lat | center_lat | double | 44.956207 |
| N/A | provenance | text | United States Geological Survey |
| N/A | filename | text | AerialWisconsin_1930s\Washburn\aerial24719.tif |
| | film_sensor | | |
| | resolution | text | 1 |
| | res_units | text | meter |

Domains

provenance: United States Geological Survey, United States Department of Agriculture, Robinson Map Library, University of Wisconsin Digital Collections Center
filmtype = B&W, CIR,

Notes

  1. Convert all EarthExplorer attribute codes into text strings. This will require some lookup tables to make the conversion. Why: this will make the shapefile more useable without the need for related tables.
  2. Year (YYYY) is derived from the startDate field.
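The two notes above could be sketched roughly as follows. The codes in `FILM_TYPE_LOOKUP` are placeholders, not real EarthExplorer codes; the real values would have to come from the USGS data dictionaries. The year helper assumes startDate begins with the four-digit year, as in the 19550619 example.

```python
# Placeholder lookup table: these codes are hypothetical and only show the
# shape of the conversion. Real codes come from the USGS data dictionaries.
FILM_TYPE_LOOKUP = {
    "1": "B&W",
    "2": "CIR",
}


def decode(code, lookup):
    """Return the text label for a code, or the raw code if it is unknown."""
    key = str(code).strip()
    return lookup.get(key, key)


def year_from_start_date(start_date):
    """Derive the year from a startDate such as '19550619' or '1955-06-19'.

    Assumes the value starts with the four-digit year.
    """
    digits = "".join(ch for ch in str(start_date) if ch.isdigit())
    if len(digits) < 4:
        raise ValueError(f"cannot derive a year from {start_date!r}")
    return int(digits[:4])
```

Falling back to the raw code for unknown values keeps unexpected codes visible in the shapefile instead of silently dropping them.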

USGS Data Dictionaries

https://lta.cr.usgs.gov/DD/napp.html
https://lta.cr.usgs.gov/DD/nhap.html
https://lta.cr.usgs.gov/DD/high_res_ortho.html
https://lta.cr.usgs.gov/DD/aerial_single_frame.html
https://lta.cr.usgs.gov/DD/doq_qquad.html

Considerations

  • M2M docs are here: https://m2m.cr.usgs.gov/api/docs/json/
  • This script will be run periodically to update the shapefile. How often is TBD, but likely several times per year. ArcPy environment should be configured to overwrite existing files. (arcpy.env.overwriteOutput = True)
  • There are five different datasets ("datasetName") to be searched: aerial_combin, nhap, napp, doq_qq, high_res_ortho. How should the script loop through the different datasets? Can they be stacked in one search?
  • The API will return at most 50,000 records per query, so the queries will need to happen in batches. Logically, I think it makes the most sense to query by decade. As noted above, all of the results can still be output to a single shapefile.
  • The geodata-processing repo is public, so be mindful of usernames/passwords stored in script(s). Store user/pass in a file that is referenced in .gitignore.
  • While it's theoretically possible, I have not been successful in sending a GeoJSON outline of Wisconsin to the API to limit the results to Wisconsin only. I've only been able to get a large bounding box to work, which will also include many results from Minnesota, Iowa, Michigan, and Illinois. Therefore, one of the last steps in the process is to clip the results to Wisconsin, with a slight buffer.
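One possible way to handle the dataset loop and the decade batching is to build one search request per (dataset, decade) pair. This is only a sketch: the payload keys (`sceneFilter`, `acquisitionFilter`, `spatialFilter`, `startingNumber`, and so on) are assumptions that should be checked against the M2M docs, and the bounding box is a rough placeholder, not a surveyed extent. It does not answer whether datasets can be stacked in one search; it simply issues one request per dataset.

```python
from itertools import product

DATASETS = ["aerial_combin", "nhap", "napp", "doq_qq", "high_res_ortho"]
MAX_RESULTS = 50000  # per-query cap noted in the considerations above

# Rough placeholder box around Wisconsin (assumption, for illustration only).
BBOX = {"west": -93.0, "south": 42.4, "east": -86.2, "north": 47.4}


def decade_ranges(first_year=1930, last_year=2009):
    """Yield (start, end) date strings, one pair per decade."""
    for start in range(first_year - first_year % 10, last_year + 1, 10):
        yield f"{start}-01-01", f"{start + 9}-12-31"


def build_search_payloads():
    """Build one scene-search request body per (dataset, decade) pair."""
    payloads = []
    for dataset, (start, end) in product(DATASETS, decade_ranges()):
        payloads.append({
            "datasetName": dataset,
            "maxResults": MAX_RESULTS,
            "startingNumber": 1,
            "metadataType": "full",
            "sceneFilter": {
                "acquisitionFilter": {"start": start, "end": end},
                "spatialFilter": {
                    "filterType": "mbr",
                    "lowerLeft": {"latitude": BBOX["south"],
                                  "longitude": BBOX["west"]},
                    "upperRight": {"latitude": BBOX["north"],
                                   "longitude": BBOX["east"]},
                },
            },
        })
    return payloads


print(len(build_search_payloads()))
# 40
```

If a single dataset and decade ever holds more than 50,000 records, the same payload could be resubmitted with a higher `startingNumber`, assuming the API pages results that way.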

General Workflow

See starter script provided by Jim for more info on ArcPy syntax

  • Connect to API with username/password, retrieve API Key. The API key must be supplied with future API requests.
  • Create the shapefile using ArcPy with the naming template noted above. It's uncertain whether it's easier/better to create it from a template shapefile or to insert new attributes into an empty shapefile. The latter seems much more flexible if you can figure it out.
  • Begin looping through each decade
    • Submit a search to the "/scene-search" endpoint to retrieve a list of available images for that decade. The critical attribute is entityId, which is a unique ID for each image. Be sure to set the metadataType to full in order to get the extended set of attributes that we need. (Amended 8/25)
      • Loop through each entityId
      • Submit a request to the /download-options endpoint. (Amended 8/25)
      • Submit a new request to the /download-request endpoint with the entityId to get a download URL. (Amended 8/25)
      • Insert a new point into the shapefile using attributes from the dictionary object via ArcPy. Hint: use cursors. The x,y point geometry should come from the center lat/lon returned by the query.
      • Repeat with the next entityId.
  • Repeat search with next decade
  • Close cursor in ArcPy
  • Using ArcPy, clip results to a buffered Wisconsin boundary. Let's start with one mile buffer. No need to get fancy with buffering on the fly, just locate a state boundary layer, buffer it in ArcGIS Pro, and save that for the clipping.
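Parts of the workflow above might look roughly like the sketch below. It covers login, mapping a scene to a cursor row, inserting points, and clipping. It leaves out the /download-options and /download-request steps. Several things here are assumptions to verify against the M2M docs: the base URL, the `login` endpoint name, the `X-Auth-Token` header, and the "data" member of the response. `credentials.json` is a hypothetical git-ignored file, and `scene_to_row` assumes each scene has already been flattened into a dict keyed by the EarthExplorer attribute names from the attribute table above.

```python
import json
import os
import urllib.request

API_BASE = "https://m2m.cr.usgs.gov/api/api/json/stable/"  # assumed base URL

# (name, ArcPy field type): a subset of the attribute table, for illustration.
FIELDS = [
    ("identifier", "TEXT"),
    ("year", "SHORT"),
    ("browse_url", "TEXT"),
    ("center_lon", "DOUBLE"),
    ("center_lat", "DOUBLE"),
]


def load_credentials(path="credentials.json"):
    """Read {"username": ..., "password": ...} from a git-ignored file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def api_post(endpoint, payload, api_key=None):
    """POST a JSON payload to an endpoint and return its "data" member."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-Auth-Token"] = api_key  # assumed header name
    request = urllib.request.Request(
        API_BASE + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]


def scene_to_row(scene):
    """Flatten one scene dict into a row: geometry first, then FIELDS order."""
    lon, lat = float(scene["center_lon"]), float(scene["center_lat"])
    year = int(str(scene["startDate"])[:4])  # assumes a year-first startDate
    return ((lon, lat), scene["entityId"], year,
            scene.get("browsePath", ""), lon, lat)


def write_shapefile(folder, name, scenes, clip_features):
    """Create the point shapefile, insert one row per scene, then clip it."""
    import arcpy  # imported here so the helpers above run without ArcGIS

    arcpy.env.overwriteOutput = True
    unclipped = arcpy.management.CreateFeatureclass(
        folder, "unclipped_" + name, "POINT",
        spatial_reference=arcpy.SpatialReference(4326))[0]
    for field_name, field_type in FIELDS:
        if field_type == "TEXT":
            arcpy.management.AddField(unclipped, field_name, field_type,
                                      field_length=254)
        else:
            arcpy.management.AddField(unclipped, field_name, field_type)
    names = ["SHAPE@XY"] + [field_name for field_name, _ in FIELDS]
    # The with-block releases the cursor when the loop finishes.
    with arcpy.da.InsertCursor(unclipped, names) as cursor:
        for scene in scenes:
            cursor.insertRow(scene_to_row(scene))
    arcpy.analysis.Clip(unclipped, clip_features, os.path.join(folder, name))
```

A run would then start with something like `api_key = api_post("login", load_credentials())`, pass that key to each later `api_post` call, and finish with `write_shapefile(...)` using the pre-buffered Wisconsin boundary as `clip_features`. This version adds fields to an empty shapefile rather than copying a template, which is the more flexible of the two options mentioned above.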

Potential Gotchas and Showstoppers

At this point it's unclear to me whether the URLs generated by /download-request are persistent. That is, when we generate a URL, will that same URL work three weeks from now, or is it just ephemeral? If the latter, that will be a major problem for this workflow!

Future

  • A Python script that can be used to push the shapefile directly into Carto in order to quickly update the existing data. David has done this with SurveyControlFinder points, so it's definitely possible.
  • A Python script that outputs GeoJSON files for GeoData@Wisconsin. Uncertain (for now) if we should output GeoJSON by county, decade, or some other scheme. Our priority for now is the single shapefile described above, which will be used by WHAIFinder. More discussion with Jaime M. needed.
