## General Overview
The overarching goal is to use the USGS "Machine to Machine" (M2M) API to search the EarthExplorer archives for Wisconsin aerial photography. A Python script will connect to the API and send the search parameters defined below. The search results (returned as JSON) will be dumped into a single shapefile. I expect a total of 125,000+ points, ranging from 1930 to the mid-2000s.
## Output
A single shapefile named EarthExplorer_WI_Imagery_[date].shp, where [date] is in YYYY_MM_DD format and reflects the date the script was run. The shapefile should use the coordinate system with spatial reference 4326 (WGS 84).
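The date-stamped name can be generated at run time; a minimal sketch using only the standard library:

```python
from datetime import date

# Build the output name from today's date, e.g. EarthExplorer_WI_Imagery_2024_01_15.shp
out_name = f"EarthExplorer_WI_Imagery_{date.today():%Y_%m_%d}.shp"
```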
## Output Attributes and Mapping to EarthExplorer Attributes
Borrowing inspiration from https://github.com/geoblacklight/geoblacklight/wiki/GeoBlacklight-1.0-Metadata-Elements
Reminder: shapefile field names are limited to 10 characters.
Each "datasetName" returns different fields! The attributes listed below are the most common and useful.
The following is very fluid... you've been warned!
| EarthExplorer Attribute | Shapefile Attribute | Shapefile Format | Example |
|---|---|---|---|
| N/A | FID | Object ID | [inserted by Arc] |
| N/A | Shape | Geometry | [inserted by Arc] |
| entityId | identifier | text | |
| startDate | date | text | 19550619 |
| startDate | year | short | 1955 |
| | creatorlon | text | United States Geological Survey |
| | creator | text | USGS |
| browsePath | browse_url | text | |
| | fullres_url | text | |
| | more_url | text | https://search.library.wisc.edu/digital/AKCSHCFNL642UW8M |
| | photosize | text | |
| | roll_exp | text | 1-46 |
| | mapscale | text | 1:20,000 |
| film_type | film_type | text | |
| center_lon | center_lon | double | -91.285938 |
| center_lat | center_lat | double | 44.956207 |
| N/A | provenance | text | United States Geological Survey |
| N/A | filename | text | AerialWisconsin_1930s\Washburn\aerial24719.tif |
| film_sensor | | | |
| | resolution | text | 1 |
| | res_units | text | meter |
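As a sanity check, the target schema can be written out as (name, type) pairs and validated against the 10-character limit before any fields are created. The type keywords below follow the style of arcpy.management.AddField; the list itself is just a sketch of the table above, and the check flags one real problem: fullres_url is 11 characters and will need a shorter name.

```python
# (field name, field type) pairs mirroring the mapping table above
FIELDS = [
    ("identifier", "TEXT"),
    ("date", "TEXT"),
    ("year", "SHORT"),
    ("creatorlon", "TEXT"),
    ("creator", "TEXT"),
    ("browse_url", "TEXT"),
    ("fullres_url", "TEXT"),
    ("more_url", "TEXT"),
    ("photosize", "TEXT"),
    ("roll_exp", "TEXT"),
    ("mapscale", "TEXT"),
    ("film_type", "TEXT"),
    ("center_lon", "DOUBLE"),
    ("center_lat", "DOUBLE"),
    ("provenance", "TEXT"),
    ("filename", "TEXT"),
    ("resolution", "TEXT"),
    ("res_units", "TEXT"),
]

# Flag any names over the shapefile 10-character limit
too_long = [name for name, _ in FIELDS if len(name) > 10]
print(too_long)  # → ['fullres_url']
```

In the real script each pair would be passed to arcpy.management.AddField when building the empty shapefile.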
## Domains
provenance: United States Geological Survey, United States Department of Agriculture, Robinson Map Library, University of Wisconsin Digital Collections Center
film_type: B&W, CIR
## Notes
- Convert all EarthExplorer attribute codes into text strings. This will require some lookup tables to make the conversion. Why: this makes the shapefile more usable without the need for related tables.
- Year (YYYY) is derived from the startDate field.
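Both notes reduce to simple dictionary lookups and string slicing. The codes and text values below are placeholders, not actual EarthExplorer codes; the real values would come from the USGS data dictionaries linked in the next section.

```python
# Placeholder lookup table: real code/value pairs must come from the USGS data dictionaries
FILM_TYPE_LOOKUP = {
    "1": "B&W",   # hypothetical code
    "2": "CIR",   # hypothetical code
}

def film_type_text(code: str) -> str:
    """Convert an attribute code to its text string; pass unknown values through."""
    return FILM_TYPE_LOOKUP.get(code, code)

def year_from_start_date(start_date: str) -> int:
    """Derive the YYYY year from a startDate string like '19550619'."""
    return int(start_date[:4])
```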
## USGS Data Dictionaries
https://lta.cr.usgs.gov/DD/napp.html
https://lta.cr.usgs.gov/DD/nhap.html
https://lta.cr.usgs.gov/DD/high_res_ortho.html
https://lta.cr.usgs.gov/DD/aerial_single_frame.html
https://lta.cr.usgs.gov/DD/doq_qquad.html
## Considerations
- M2M docs are here: https://m2m.cr.usgs.gov/api/docs/json/
- This script will be run periodically to update the shapefile. How often is TBD, but likely several times per year. ArcPy environment should be configured to overwrite existing files. (arcpy.env.overwriteOutput = True)
- There are five different datasets ("datasetName") to be searched: aerial_combin, nhap, napp, doq_qq, high_res_ortho. How should the script loop through the different datasets? Can they be stacked in one search?
- The API will return at most 50,000 records per query, so the queries will need to happen in batches. Logically, I think it makes the most sense to query by decade, but as noted above, all of the results can be output to a single shapefile.
- The geodata-processing repo is public, so be mindful of usernames/passwords stored in script(s). Store user/pass in a file that is referenced in .gitignore.
- While it's theoretically possible, I have not been successful in sending a GeoJSON outline of Wisconsin to the API to limit the results to Wisconsin only. I've only been able to get a large bounding box to work, which will include many results from Minnesota, Iowa, Michigan, and Illinois as well. Therefore, one of the last steps in the process is to clip the results to Wisconsin, with a slight buffer.
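To stay under the 50,000-record cap, the decade batches can be generated as (start, end) date pairs and combined with the bounding box. The coordinates below are rough placeholder values for a Wisconsin-ish extent, not a vetted boundary; they deliberately overshoot, which is why the final clip step exists.

```python
def decade_ranges(first=1930, last=2009):
    """Yield (start, end) ISO date strings for each decade, e.g. ('1930-01-01', '1939-12-31')."""
    for start_year in range(first, last + 1, 10):
        yield f"{start_year}-01-01", f"{start_year + 9}-12-31"

# Rough bounding box around Wisconsin (placeholder lon/lat values); it will also
# catch neighboring states, hence the clip-to-Wisconsin step at the end
WI_BBOX = {"lower_left": (-93.0, 42.4), "upper_right": (-86.7, 47.3)}

batches = list(decade_ranges())
```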
## General Workflow
See the starter script provided by Jim for more info on ArcPy syntax.
- Connect to API with username/password, retrieve API Key. The API key must be supplied with future API requests.
- Create the shapefile using ArcPy with the naming template noted above. Uncertain whether it's easier/better to create it from a template shapefile or to insert new attributes into an empty shapefile. The latter seems much more flexible if you can figure it out.
- Begin looping through each decade
- Submit a search to the "/scene-search" endpoint to retrieve a list of available images for that decade. The critical attribute is entityId, which is a unique ID for each image. Be sure to set the metadataType to full in order to get an extended set of attributes that we need. (Amended 8/25)
- Loop through each entityId
- Submit a request to the /download-options endpoint (Amended 8/25)
- Submit a new request to the /download-request endpoint with the entityId to get a download URL. (Amended 8/25)
- Insert a new point into the shapefile using attributes from the dictionary object via ArcPy. Hint: use cursors. The x,y point geometry should come from the center lat/lon returned by the query.
- Repeat the search with the next entityId.
- Repeat search with next decade
- Close cursor in ArcPy
- Using ArcPy, clip the results to a buffered Wisconsin boundary. Let's start with a one-mile buffer. No need to get fancy with buffering on the fly; just locate a state boundary layer, buffer it in ArcGIS Pro, and save that for the clipping.
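The login and per-decade search steps above might look roughly like the sketch below. The payload field names follow my reading of the M2M JSON docs and should be verified against the current spec; credentials.json is the gitignored file mentioned under Considerations.

```python
import json
import urllib.request

M2M = "https://m2m.cr.usgs.gov/api/api/json/stable/"

def m2m_post(endpoint, payload, api_key=None):
    """POST a JSON payload to an M2M endpoint and return its 'data' element."""
    req = urllib.request.Request(
        M2M + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Auth-Token": api_key} if api_key else {},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]

def build_scene_search(dataset, start, end):
    """Payload for /scene-search; field names per the M2M docs (verify against the spec)."""
    return {
        "datasetName": dataset,   # e.g. "aerial_combin"
        "metadataType": "full",   # extended attribute set, per the workflow note
        "maxResults": 50000,      # API cap per query
        "sceneFilter": {
            "acquisitionFilter": {"start": start, "end": end},
        },
    }

# Usage sketch (not executed here; requires network access and a USGS account):
# creds = json.load(open("credentials.json"))   # gitignored credentials file
# api_key = m2m_post("login", creds)
# scenes = m2m_post("scene-search",
#                   build_scene_search("aerial_combin", "1930-01-01", "1939-12-31"),
#                   api_key)
```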
## Potential Gotchas and Showstoppers
At this point it's unclear to me whether the URLs generated by /download-request are persistent. Meaning: when we generate a URL, will that same URL work three weeks from now, or is it just ephemeral? If the latter, that will be a major problem for this workflow!
## Future
- A Python script that can be used to push the shapefile directly into Carto in order to quickly update the existing data. David has done this with SurveyControlFinder points, so it's definitely possible.
- A Python script that outputs GeoJSON files for GeoData@Wisconsin. Uncertain (for now) if we should output GeoJSON by county, decade, or some other scheme. Our priority for now is the single shapefile described above, which will be used by WHAIFinder. More discussion with Jaime M. needed.