This repository contains code and data necessary to replicate the findings of Carleton, Crews, and Nath (AEA Papers and Proceedings, 2024).
Scripts in this repository are written in Python, Stata, and R. Throughout this document, it is assumed that the replicator operates from a working directory containing all necessary files and folders, which are detailed in the structure below.
├── code
│ ├── run.sh
│ ├── setup.R
│ ├── 1_cleaning
│ │ ├── 01_country_crosswalk.py
│ │ ├── 02_crop_crosswalk.py
│ │ ├── 03_build_tradedata.py
│ │ ├── 04_build_proddata.py
│ │ ├── 05_fill_tradedata.do
│ │ ├── 06_virtualwater.py
│ │ ├── 07_crop_water_use.do
│ │ ├── 08_merge_grace_gaez.R
│ │ ├── 09_merge_grace_agemp.R
│ │ ├── 10_merge_grace_virtwflows.R
│ └── 2_analysis
│ ├── 01_make_maps.R
│ ├── maps
│ │ ├── fig1.R
│ │ ├── fig2a.R
│ │ ├── fig2c.R
│ │ ├── fig2e.R
│ │ ├── fig3a.R
│ │ ├── figA1a.R
│ │ ├── figA1c.R
│ │ ├── figA4a.R
│ │ ├── figA4c.R
│ ├── 02_make_scatterplots.do
│ ├── scatterplots
│ │ ├── fig2b.do
│ │ ├── fig2d.do
│ │ ├── fig2f.do
│ │ ├── fig3b.do
│ │ ├── figA1b.do
│ │ ├── figA1d.do
│ │ ├── figA2.do
│ │ ├── figA3.do
│ │ ├── figA4b.do
│ │ ├── figA5.do
│ │ ├── figA6.do
│ │ ├── figA7.do
│ │ ├── figA8.do
│ └── 03_intext_stats.do
├── data
│ ├── input
│ │ ├── ag_emp_share
│ │ │ └── P_Data_Extract_From_World_Development_Indicators.xlsx
│ │ ├── baci_comtrade
│ │ │ ├── BACI_HS07_Y2009_V202102.csv
│ │ │ ├── country_codes_V202102.csv
│ │ │ └── product_codes_HS07_V202102.csv
│ │ ├── cds_2016
│ │ │ ├── CountryNamesNumbers.csv
│ │ │ └── crop_names_numbers_codes.csv
│ │ ├── crop_water_footprint
│ │ │ ├── Report47Appendix-II.xlsx
│ │ │ └── water_intensity_per_tonne.csv
│ │ ├── faostat
│ │ │ ├── fmrsudan_qcl.csv
│ │ │ ├── fmrsudan_qv.csv
│ │ │ ├── Prices_E_All_Data_(Normalized).csv
│ │ │ ├── Production_Crops_Livestock_E_All_Data_(Normalized).csv
│ │ │ └── Value_of_Production_E_All_Data_(Normalized).csv
│ │ ├── gaez
│ │ │ └── gaez.csv
│ │ ├── grace
│ │ │ └── grace.dta
│ │ ├── hand
│ │ │ ├── caf_gaezpy_crosswalk.csv
│ │ │ ├── fao_country_crosswalk.csv
│ │ │ └── FCL_HS_mappings_2020-01-07.csv
│ │ ├── nra
│ │ │ ├── AgIncentivesNRP.csv
│ │ │ └── UpdatedDistortions_to_AgriculturalIncentives_database_0613.xls
│ └── intermediate
│ ├── crosswalks
│ ├── grace_merged
│ ├── production
│ └── trade
└── results
The code was last run on a Linux cluster with Stata (version 17.0), R (version 4.2.3), and Python (version 3.9.18). Note that the scripts in this repository do not install any of the packages needed to replicate this project. Thus, the user must install the packages listed below before attempting to run any script. The user should also be aware that installing these packages may take some time (could take as long as 30 mins if none of the R packages were previously installed).
- Stata
- egenmore
- R
- tidyverse=2.0.0
- haven=2.5.2
- readxl=1.4.2
- broom=1.0.4
- sf=1.0-12
- rnaturalearthdata=1.0.0
- rnaturalearth=1.0.1
- cowplot=1.1.1
- scales=1.2.1
- biscale=1.0.0
- dichromat=2.0-0.1
- geosphere=1.5-18
- Python
- numpy=1.13.5
- pandas=1.5.3
- fuzzywuzzy=0.18.0
- regex=2023.10.3
| File | Source |
|---|---|
data/input/ag_emp_share/P_Data_Extract_From_World_Development_Indicators.xlsx |
Agricultural employment share in 2009. Obtained from the World Bank's World Development Indicators Database. |
data/input/baci_comtrade/BACI_HS07_Y2009_V202102.csv data/input/baci_comtrade/country_codes_V202102.csv data/input/baci_comtrade/product_codes_HS07_V202102.csv |
Bilateral trade flows (version HS07) from UN Comtrade obtained through the BACI portal hosted by CEPII. |
data/input/cds_2016/CountryNamesNumbers.csv data/input/cds_2016/crop_names_numbers_codes.csv |
Lists of unique country/crop names, numbers, and codes. The names and numbers match those used by Costinot, Donaldson, & Smith (2016), which got extracted from their replication package. |
data/input/crop_water_footprint/Report47Appendix-II.xlsx |
Crop water intensity data from Mekonnen and Hoekstra (2011) hosted by the Water Footprint Network under "Value of Water Report: 47: The green, blue and grey water footprint of farm crops and derived crop products." |
data/input/crop_water_footprint/water_intensity_per_tonne.csv |
Table 4 of Mekonnen and Hoekstra (2011). |
data/input/faostat/fmrsudan_qcl.csv data/input/faostat/fmrsudan_qv.csv |
FAOSTAT Data Download interface to download the QCL and QV data series for the former Sudan before the secession of South Sudan in 2011. |
data/input/faostat/Prices_E_All_Data_(Normalized).csv data/input/faostat/Production_Crops_Livestock_E_All_Data_(Normalized).csv data/input/faostat/Value_of_Production_E_All_Data_(Normalized).csv |
FAOSTAT Data Download interface to download the QCL and QV data series for all countries. |
data/input/gaez/gaez.csv |
Crosswalk between GAEZ (v4) Agro-climatic Potential Yield fields and GRACE cells from Carleton, Crews and Nath (2024). |
data/input/grace/grace.dta |
GRACE data from Carleton, Crews and Nath (2024). GRACE cells are merged with groundwater levels estimates from Fan, Li and Miguez-Macho (2013), gridded cropped area fraction data from Monfreda et al. (2008) and from Ramankutty et al. (2008), and cumulative precipitation from the Global Metorological Forcing Dataset . The croppped area fraction data was accessed through the SAGE portal. |
data/input/hand/caf_gaezpy_crosswalk.csv |
Crosswalk of crop names between GAEZ v4 and cropped area fraction data. |
data/input/hand/fao_country_crosswalk.csv |
Obtained from deprecated FAO link. |
data/input/hand/FCL_HS_mappings_2020-01-07.csv |
Downloaded from deprecated FAO link. |
data/input/nra/AgIncentivesNRP.csv data/input/nra/UpdatedDistortions_to_AgriculturalIncentives_database_0613.xls |
Obtained through the AgIncentives Database. |
The analysis for this project can be fully replicated (from start to end) using the bash script code/run.sh. To do so, the replicator must install the software requirements detailed above and place the path of their working directory on line 11 of the bash script before executing it.
However, if the replicator wishes to only run the code partially or script by script, the programs should be executed in the following order (after setting the correct working directory):
code/1_cleaning/01_country_crosswalk.pycreates a country name crosswalk used in later scripts.code/1_cleaning/02_crop_crosswalk.pycreates a crop crosswalk used in later scripts.code/1_cleaning/03_build_tradedata.pycleans trade data from BACI/COMTRADE.code/1_cleaning/04_build_proddata.pycleans production data from the FAO.code/1_cleaning/05_fill_tradedata.dofills trade data with auto-consumption.code/1_cleaning/06_virtualwater.pycreates the virtual water flows estimates.code/1_cleaning/07_crop_water_use.doproduces global water use estimates per crop.code/1_cleaning/08_merge_grace_gaez.Rgenerates analysis-ready data at the GRACE grid cell level by merging them with productivity data from GAEZ v4.code/1_cleaning/09_merge_grace_agemp.Rgenerates analysis-ready data at the GRACE grid cell level by merging them with World Bank agricultural employment share data.code/1_cleaning/10_merge_grace_virtwflows.Rgenerates analysis-ready data at the GRACE grid cell level by merging them with virtual water flows estimates.code/2_analysis/01_make_maps.Rproduces all maps (Figures 1, 2a, 2c, 2e, 3a, and Appendix Figures A1a, A1c, A4a and A4c) in the paper.code/2_analysis/02_make_scatterplots.doproduces all scatterplots (Figures 2b, 2d, 2f, 3b, and Appendix Figures A1b, A1d, A2, A3, A4b, A4d, A5, A6, A7, A8) in the paper.code/2_analysis/03_intext_stats.doproduces all in-text statistics in the paper.