The processed data files live in the src/phl_budget_data/data/processed folder.
There are three folders:
- collections/: This folder includes:
  - city-collections.csv: The city's monthly collections, parsed from public Revenue Dept. reports; includes tax, non-tax, other govt. collections.
  - city-tax-collections.csv: The city's monthly tax collections, parsed from public Revenue Dept. reports; includes only tax collections.
  - school-collections.csv: The school district's monthly collections, parsed from public Revenue Dept. reports
  - rtt-collections-by-sector.csv: A breakdown of Realty Transfer Tax collections by sector, parsed from public Revenue Dept. reports
  - sales-collections-by-sector.csv: A breakdown of Sales Tax collections by sector, parsed from public Revenue Dept. reports
  - wage-collection-by-sector.csv: A breakdown of Wage Tax collections by sector, parsed from public Revenue Dept. reports
- qcmr/: This folder includes data parsed from the Quarterly City Manager's Report (QCMR):
  - cash-reports-*.csv: Data parsed from different parts of the Cash Report in the back of the QCMR
  - department-obligations.csv: Data parsed from the Departmental Obligations table in the QCMR
  - fulltime-positions.csv: Data parsed from the Fulltime Positions Report table in the QCMR
  - personal-services-summary.csv: Data parsed from the Personal Services Summary table in the QCMR
- spending/: This folder includes data parsed from City Budget-in-Brief documents:
  - actual-department-spending.csv: Historical actual spending by department
  - budgeted-department-spending-adopted.csv: Budgeted spending by department from the adopted budget
  - budgeted-department-spending-proposed.csv: Budgeted spending by department from the proposed budget
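As a quick way to inspect one of these files, the following is a minimal sketch using only the Python standard library. The helper names `processed_file` and `read_rows` are hypothetical (they are not part of the phl-budget-data package), and the sketch makes no assumption about column names: it simply uses whatever header row the CSV contains.

```python
import csv
from pathlib import Path

# Folder named in the text above, relative to the repository root.
PROCESSED_DIR = Path("src/phl_budget_data/data/processed")


def processed_file(folder, filename, root=PROCESSED_DIR):
    """Build the path to one processed CSV (hypothetical helper)."""
    return Path(root) / folder / filename


def read_rows(path):
    """Read a CSV into a list of dicts keyed by the file's own header row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For instance, `read_rows(processed_file("collections", "city-collections.csv"))` would return one dict per row when run from the repository root.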
First, clone the repository:
git clone https://github.com/PhiladelphiaController/phl-budget-data.git
Then, install the Python dependencies with poetry:
cd phl-budget-data
poetry install
Then display the help message for the main command:
poetry run phl-budget-data --help
You will need AWS credentials to run the parsing scripts. Create a .env file in the root of the project,
modeled on .env.example, and fill in the values. To get the AWS
credentials, go to the "Credentials/" folder on the FPD SharePoint.
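To confirm that every value in .env has been filled in, something like the following sketch may help. It assumes both files use plain KEY=VALUE lines; the functions `parse_env` and `unfilled_keys` are hypothetical helpers, not part of this project, and the sketch does not assume any particular variable names.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def unfilled_keys(example_text, env_text):
    """Return keys listed in .env.example that are missing or empty in .env."""
    env = parse_env(env_text)
    return sorted(key for key in parse_env(example_text) if not env.get(key))
```

From the project root, `unfilled_keys(open(".env.example").read(), open(".env").read())` would list any keys that still need a value; an empty list means nothing is missing.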
In general, the process for adding new data is:
- Add the raw PDF files to the appropriate folder in src/phl_budget_data/data/etl/raw. Look at past PDF files to make sure you are adding the correct table to the correct folder. Make sure the PDF you add contains only the pages with the table information.
- Run the appropriate ETL command for the data you are parsing. To see the available commands, run:
  poetry run phl-budget-data etl --help
  For example, to parse the cash report data, run:
  poetry run phl-budget-data etl CashReport
  This will create a new CSV file in the appropriate folder in src/phl_budget_data/etl/data/processed.
- Update the files in the processed data folder src/phl_budget_data/data/processed by saving new versions:
  poetry run phl-budget-data save
To add the cash report from the latest QCMR, the steps are:
- Extract the two-page cash report PDF from the latest QCMR and save it to src/phl_budget_data/data/etl/raw/qcmr/cash/.
- Run the ETL parsing command. For example, for FY23 Q4 you would run:
  poetry run phl-budget-data etl CashReport --fiscal-year 2023 --quarter 4
- Update the main processed data files:
  poetry run phl-budget-data save
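If you run these steps often, the parse-then-save sequence could be scripted. The sketch below is one possible wrapper, not something shipped with the repository: `etl_command` and `run_update` are hypothetical names, and the only CLI arguments used are the ones shown above (`etl`, `save`, `--fiscal-year`, `--quarter`).

```python
import subprocess

# The save step exactly as written in the instructions above.
SAVE_COMMAND = ["poetry", "run", "phl-budget-data", "save"]


def etl_command(name, fiscal_year=None, quarter=None):
    """Build the argument list for one ETL run, e.g. name="CashReport"."""
    cmd = ["poetry", "run", "phl-budget-data", "etl", name]
    if fiscal_year is not None:
        cmd += ["--fiscal-year", str(fiscal_year)]
    if quarter is not None:
        cmd += ["--quarter", str(quarter)]
    return cmd


def run_update(name, fiscal_year=None, quarter=None, runner=subprocess.run):
    """Run the ETL step, then the save step; check=True stops on the first failure."""
    for cmd in (etl_command(name, fiscal_year, quarter), SAVE_COMMAND):
        runner(cmd, check=True)
```

Calling `run_update("CashReport", fiscal_year=2023, quarter=4)` from the repository root would issue the same two commands as the FY23 Q4 example. The `runner` parameter exists only so the sequence can be checked without actually invoking poetry.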
A GitHub Action in this repository runs daily and checks the City's website for newly uploaded monthly collection reports. The City posts these reports with a delay of about a month. When the script finds a new report, it parses the data and saves it to the repository.
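The "check for new data" step can be pictured as a set difference between the report months available on the website and the months already parsed. The sketch below illustrates that idea only; it is not the Action's actual code, and both function names are hypothetical. The one-month lag in `previous_month` is an assumption based on the approximate delay described above.

```python
from datetime import date


def previous_month(today):
    """Return (year, month) for the month before `today`.

    Assumes reports lag by roughly one month, so this is the newest
    report one might expect to find on a given day.
    """
    if today.month == 1:
        return (today.year - 1, 12)
    return (today.year, today.month - 1)


def new_report_months(published, already_parsed):
    """Return (year, month) pairs that are published but not yet parsed, oldest first."""
    return sorted(set(published) - set(already_parsed))
```

A daily job built this way would do nothing on most days, since `new_report_months` returns an empty list until a new report appears.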
