At the center of Recidiviz is our platform for tracking granular criminal justice metrics in real time. It includes a system for the ingest of corrections records from different source data systems, and for calculation of various metrics from the ingested records.
This project is licensed under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
The Recidiviz data system is provided as open source software - for transparency and collaborative development, to help jump-start similar projects in other spaces, and to ensure continuity if Recidiviz itself ever becomes inactive.
If you plan to fork the project for work in the criminal justice space (to ingest from the same systems we are, or similar), we ask that you first contact us for a quick consultation. We work carefully to ensure that our ingest activities don't disrupt other users' experiences with the public data services we read, but if multiple ingest processes are running against the same systems, without knowing about one another, it may place excessive strain on them and impact the services those systems provide.
If you have ideas or new work for the same data we're collecting, let us know and we'll work with you to find the best way to get it done.
If you are contributing to this repository regularly for an extended period of time, request GitHub collaborator access to commit directly to the main repository.
If you can install python3.9 locally, do so. For local Python development, you
will also need to install the libpq PostgreSQL client library and openssl.
On a Mac with Homebrew, you can install python3.9 by first
installing pyenv with:
brew install pyenv
brew install xz
mkdir ~/.pyenv
Then, add the following to your ~/.zshrc (or equivalent):
export PATH="$HOME/.local/bin:$PATH"
if command -v pyenv 1>/dev/null 2>&1; then
  eval "$(pyenv init -)"
fi
Then run:
pyenv install 3.9.12
pyenv global 3.9.12
Verify that you have the correct version of python across contexts by opening a new terminal window and running:
python -V
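The same check can be made from inside the interpreter, which may be handy in a setup script. This is a minimal sketch, assuming the project targets Python 3.9 as described above; the helper name `is_expected_python` is hypothetical and not part of this repository.

```python
import sys
from typing import Tuple

# Assumption: the project targets Python 3.9, per the instructions above.
EXPECTED_MAJOR_MINOR: Tuple[int, int] = (3, 9)


def is_expected_python(version: Tuple[int, ...] = sys.version_info[:2]) -> bool:
    """Return True if the major/minor version matches EXPECTED_MAJOR_MINOR."""
    return tuple(version[:2]) == EXPECTED_MAJOR_MINOR


if __name__ == "__main__":
    # Report the running interpreter's version and whether it matches.
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_expected_python() else "unexpected"
    print(f"Python {found}: {status}")
```

Only the major and minor components are compared, so any 3.9.x patch release is accepted.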
Once python is installed, you can install libpq and openssl with:
$ brew install postgresql@13 openssl
and add the following to your ~/.zshrc (or equivalent):
export PATH="/opt/homebrew/opt/postgresql@13/bin:$PATH"
On Ubuntu 18.04, openssl is installed by default. You can install python3.9
and libpq with:
$ apt update -y && apt install -y python3.9-dev python3-pip libpq-dev
You do not need to change your default python version, as pipenv will look for
3.9.
Upgrade your pip to the latest version:
$ pip install -U pip
NOTE: if you get ImportError: cannot import name 'main' after upgrading
pip, follow the suggestions in
this issue.
If you do not already have pip installed, you can install it on a Mac with
these commands:
$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
$ python get-pip.py --user
On Ubuntu 18.04, you can install pip with:
$ sudo apt-get install python-pip
Install pipenv:
$ pip install pipenv --user
Fork this repository, clone it locally, and enter its directory:
$ git clone git@github.com:your_github_username/pulse-data.git
$ cd pulse-data
To create a new pipenv environment and install all project and development
dependencies on Mac and Debian machines, run the initial_pipenv_setup script.
NOTE: Installation of one of our dependencies (psycopg2) requires OpenSSL,
and as OpenSSL is not linked on Macs by default, this script temporarily sets
the necessary compiler flags and then runs pipenv sync --dev. After this
initial installation all pipenv sync/installs should work without this script.
$ ./initial_pipenv_setup.sh
On a Linux machine, run the following:
$ pipenv sync --dev
NOTE: if you get pipenv: command not found, add the binary directory to
your PATH as described
here.
To activate your pipenv environment, run:
$ pipenv shell
On a Mac with Homebrew, you can install the JRE with:
$ brew install java
On Ubuntu 18.04, you can install the JRE with:
$ apt update -y && apt install -y default-jre
On a Mac with Homebrew, you can install jq (needed to deploy calculation pipelines) with:
$ brew install jq
On Ubuntu 18.04, you can install jq with:
$ apt update -y && apt install -y jq
Finally, run pytest. As of Feb 2022, one might expect ~200 tests to fail
locally, with errors mainly falling into one of two categories:
Receiver() takes no arguments and
Already initialized database/ValueError: Accessing SQLite in-memory database on multiple threads.
The former error is due to an incompatibility with Cython that may be due to
newer Mac models or python versions, and the latter is due to tests not properly
cleaning up after themselves. All of these tests pass in CI. You can ignore any
failing tests with (for example):
$ pytest --ignore=recidiviz/tests/path/to/tests
If you can't install python3.9 locally, you can use Docker instead.
See below for installation instructions. Once Docker is installed, fork this repository, clone it locally, and enter its directory:
$ git clone git@github.com:your_github_username/pulse-data.git
$ cd pulse-data
Build the image:
$ docker build -t recidiviz-image . --build-arg DEV_MODE=True
Stop and delete previous instances of the image if they exist:
$ docker stop recidiviz && docker rm recidiviz
Run a new instance, mounting the local working directory within the image:
$ docker run --name recidiviz -d -t -v $(pwd):/app recidiviz-image
Open a bash shell within the instance:
$ docker exec -it recidiviz bash
Once in the instance's bash shell, update your pipenv environment:
$ pipenv sync --dev
To activate your pipenv environment, run:
$ pipenv shell
Finally, run pytest. If no tests fail, you are ready to develop!
Using this Docker container, you can edit your local repository files and use
git as usual within your local shell environment, but execute code and run
tests within the Docker container's shell environment. Depending on your IDE,
you may need to install additional plugins to allow running tests in the
container from the IDE.
Recidiviz interacts with Google Cloud services using
google-cloud-* Python client libraries.
During development, you may find it useful to verify the integration with these
services. First,
install the Google Cloud SDK, then
log in to the SDK:
gcloud auth login --enable-gdrive-access --update-adc # Gets credentials to interact with services via the CLI
gcloud auth application-default login # Gets credentials which will be automatically read by our client libraries
Lastly, in a test script, use the
local_project_id_override helper
to override configuration used by our client library wrappers:
from recidiviz.utils.metadata import local_project_id_override
from recidiviz.utils.environment import GCP_PROJECT_STAGING

# Override configuration used by our client libraries
with local_project_id_override(GCP_PROJECT_STAGING):
    # Google Cloud Client libraries will use `recidiviz-staging` in this context
    pass  # replace with the code that should run against staging
Code run inside this context will interact directly with our staging services. Use it conservatively and exercise caution!
Run the following to install Terraform:
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
To test your installation, run:
terraform -chdir=recidiviz/tools/deploy/terraform init -backend-config "bucket=recidiviz-staging-tf-state"
recidiviz/tools/deploy/terraform_plan.sh recidiviz-staging
If the above commands succeed, the installation was successful. For employees, see more information on running Terraform at go/terraform.
Docker (🐳 go/docker)
Docker is needed for deploying new versions of our applications.
Follow these instructions to install Docker on Linux:
Go to this page to download Docker Desktop for Mac and Windows.
Once installed, increase the memory available to Docker to ensure it has enough resources to build the container. On Docker Desktop, you can do this by going to Settings > Resources and increasing Memory to 4GB.
Recidiviz depends on sensitive information to run. This data is stored in Cloud
Datastore, which should be added manually to your production environment (see
utils/secrets for more information on the Datastore kind used).
Individual tests can be run via pytest filename.py. To run all tests, go to
the root directory and run pytest recidiviz.
The configuration in setup.cfg and .coveragerc will ensure the right code is
tested and the proper code coverage metrics are displayed.
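For readers new to pytest, here is a minimal sketch of what a single test file might look like. The file name `example_test.py` and the toy function `normalize_names` are hypothetical and exist only for illustration; they are not part of this repository.

```python
# example_test.py (hypothetical file name, for illustration only)
from typing import List


def normalize_names(names: List[str]) -> List[str]:
    """Strip whitespace and upper-case each name (toy function under test)."""
    return [name.strip().upper() for name in names]


def test_normalize_names_strips_and_uppercases() -> None:
    # pytest collects functions whose names start with `test_`.
    assert normalize_names([" doe ", "Smith"]) == ["DOE", "SMITH"]


def test_normalize_names_empty_list() -> None:
    assert normalize_names([]) == []
```

Saved under that name, the file could be run on its own with `pytest example_test.py`.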
A bug in the google client
requires that you have default application credentials. This should not be
necessary in the future. For now, make sure that you have done both
gcloud config set project recidiviz and
gcloud auth application-default login.
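One way to confirm that the second command took effect is to look for the credentials file it writes. This is a minimal sketch, assuming the default gcloud configuration directory on Linux or macOS (`~/.config/gcloud`). It does not account for a custom `CLOUDSDK_CONFIG` location or for Windows, and `find_adc_file` is a hypothetical helper, not part of this repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(
    env: Mapping[str, str] = os.environ, home: Optional[Path] = None
) -> Optional[Path]:
    """Return the path of an application default credentials file, if one exists.

    Checks GOOGLE_APPLICATION_CREDENTIALS first, then the default location
    used by `gcloud auth application-default login` on Linux and macOS.
    """
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    # Assumption: default gcloud config directory under the user's home.
    home = home if home is not None else Path.home()
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    return well_known if well_known.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    print(f"Credentials file: {found}" if found else "No credentials file found")
```

The function only checks that a file is present; it does not validate the credentials themselves.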
Run Pylint across the main body of code, in particular: pylint recidiviz.
The output will include individual lines for all style violations, followed by a handful of reports, and finally a general code score out of 10. Fix any new violations in your commit. If you believe there is cause for a rule change, e.g. if you believe a particular rule is inappropriate in the codebase, then submit that change as part of your inbound pull request.
We use black to ensure consistent formatting across the code base and isort
to sort imports. There is a pre-commit hook that will format all of your files
automatically. It is defined in githooks/pre-commit and is installed by
./initial_pipenv_setup.sh.
You can also set up your editor to run black and isort on save. See
the black docs
for how to configure external tools (both black and isort) to run in PyCharm
(more info in PyCQA/isort#258).
In VSCode, just add the following to your .vscode/settings.json:
"editor.formatOnSave": true,
"python.formatting.provider": "black",
"[python.editor.codeActionsOnSave]": {
    "source.organizeImports": true
},
Run Mypy across all code to check for static type errors: mypy recidiviz.
We use bandit to check for static security errors within the recidiviz
folder. This is run in CI. Adding # nosec to the affected line will ignore
false positive issues.
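As an illustration, bandit generally reports calls to the standard pseudo-random generators such as `random.choice` (its B311 check), since they are unsuitable for security purposes. The sketch below assumes a case where the randomness has no security role; the helper `pick_spot_check_row` is hypothetical and not taken from this codebase.

```python
import random
from typing import Sequence


def pick_spot_check_row(rows: Sequence[str]) -> str:
    """Choose one row for a manual spot check (hypothetical helper).

    The choice has no security role here, so the pseudo-random generator
    warning is a false positive and the line is marked for bandit to skip.
    """
    return random.choice(rows)  # nosec


if __name__ == "__main__":
    print(pick_spot_check_row(["row-1", "row-2", "row-3"]))
```

Keeping the marker on the single flagged line, rather than disabling a check globally, leaves bandit active for the rest of the file.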
Install the GCloud SDK using the interactive installer.
Note: make sure the installer did not add
google-cloud-sdk/platform/google_appengine or subdirectories thereof to your
$PYTHONPATH, e.g. in your bash profile. This could break attempts to run tests
within the pipenv shell by hijacking certain dependencies.
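A quick way to check for this is to scan `PYTHONPATH` for entries that point into the App Engine directory. This is a minimal sketch; the helper `appengine_pythonpath_entries` is hypothetical, and matching on the substring `google_appengine` is an assumption based on the directory name mentioned above.

```python
import os
from typing import List

# Assumption: offending entries contain this directory name somewhere in the path.
SUSPECT_FRAGMENT = "google_appengine"


def appengine_pythonpath_entries(pythonpath: str) -> List[str]:
    """Return PYTHONPATH entries that point into the App Engine SDK directory."""
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return [entry for entry in entries if SUSPECT_FRAGMENT in entry]


if __name__ == "__main__":
    offenders = appengine_pythonpath_entries(os.environ.get("PYTHONPATH", ""))
    for entry in offenders:
        print(f"Consider removing from PYTHONPATH: {entry}")
    if not offenders:
        print("No App Engine entries found in PYTHONPATH")
```

Run it from the same shell in which you plan to start `pipenv shell`, so it sees the same environment.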
Make sure you have docker installed (see instructions above), then configure docker authentication:
$ gcloud auth login
$ gcloud auth configure-docker
If you see a pipenv error (either during install or sync) with the following:
An error occurred while installing psycopg2==...
On a Mac:
- Ensure postgresql and openssl are installed with: brew install postgresql openssl
- Run the initial pipenv setup script: ./initial_pipenv_setup.sh
On Linux: Ensure libpq is installed with:
apt update -y && apt install -y libpq-dev