Warning
This repository is a one-off artifact for public inspection only. All electronic health record intellectual property has been removed to allow for public release. It is expected that the tests and code here will not function as-is.
For access to the unredacted full pipeline, please contact [email protected].
We plan to release a scalable, open-source framework in the future.
omop_es has been developed to provide OMOP extracts of NHS data to researchers. Initially focused on UCLH, it is growing to support multiple trusts.
omop_es aims to:
- be configurable to a variety of OMOP projects
- be reproducible and automated
- automate the end-to-end transformation of data to OMOP
- integrate and link across data sources
- automate filtering of sensitive data via configuration, with safe defaults and the ability for Information Governance (IG) to inspect the configuration
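As an illustration of configuration-driven filtering with safe defaults, the sketch below drops sensitive columns unless a project configuration explicitly allows them. It is a minimal sketch in base R, not the omop_es implementation: `default_config`, `make_config`, `apply_sensitivity_filter`, the `allow` field and the choice of sensitive columns are all hypothetical names introduced here for illustration.

```r
# Hypothetical safe defaults: sensitive columns are excluded unless allowed.
default_config <- list(
  sensitive_columns = c("person_source_value", "birth_datetime"),
  allow = character(0)
)

# Merge a project's overrides onto the defaults (project values win).
make_config <- function(project_config = list()) {
  utils::modifyList(default_config, project_config)
}

# Drop sensitive columns that the configuration does not explicitly allow.
apply_sensitivity_filter <- function(df, config) {
  to_drop <- setdiff(config$sensitive_columns, config$allow)
  df[, !(names(df) %in% to_drop), drop = FALSE]
}

# Toy person table with illustrative values only.
person <- data.frame(
  person_id = 1:2,
  year_of_birth = c(1980L, 1975L),
  birth_datetime = c("1980-03-01", "1975-11-20"),
  person_source_value = c("A1", "B2"),
  stringsAsFactors = FALSE
)

# With no project overrides, both sensitive columns are removed.
strict <- apply_sensitivity_filter(person, make_config())

# A project approved to receive birth_datetime lists it explicitly.
relaxed <- apply_sensitivity_filter(
  person,
  make_config(list(allow = "birth_datetime"))
)
```

Keeping the exclusions in the defaults and the exceptions in the project configuration means that anything a project receives beyond the defaults is visible in one place for review.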
In addition, the following non-functional requirements are significant:
- The system should allow testing and reporting of mapping quality
- The OMOP code should not contain core logic or mappings relevant to other use-cases. This should be in the source systems (data warehouse).
- The system should be capable of being extended both by the 'core' team and by consumers in an inner source fashion.
The system is composed of multiple modules:
- Settings: Defines per-project configuration and project object for each site
- Mapping: Maps underlying data source(s) to the latest OMOP at full fidelity
- Linking: Links surrogate Ids to OMOP Ids generically with the ability to customise
- Projection: Applies rules to filter the combined data for sensitivity. Also downgrades OMOP versions as needed
- Output: Writes the projected output to either CSV, Parquet or SQLite formats for direct use / import in a TRE
- Metadata:
  - Vocabulary: Downloads vocabularies from Athena
  - CDM metadata: Provides metadata about OMOP tables. This is used to drive generic data processing.
- Tests: Tests end-to-end pipeline plus sub-modules on mock data
- Checks: Checks the resulting OMOP. Note that this should be portable to other ETL implementations. (Not a priority.)
- Units: A (loosely coupled) service to convert local units to UCUM and OMOP (UCUM). Drug quantity conversion is also implemented here
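The way the Mapping, Linking, Projection and Output modules hand data to one another can be sketched on a toy table as follows. This is a simplified illustration under stated assumptions, not the actual omop_es code: the function names (`map_to_omop`, `link_ids`, `project_columns`, `write_output`), the source columns and the concept ids are hypothetical placeholders.

```r
# Toy source extract; column names and values are illustrative only.
source_rows <- data.frame(
  mrn = c("X1", "X2", "X1"),
  test_code = c("HB", "HB", "SOD"),
  result = c(13.2, 11.8, 140),
  stringsAsFactors = FALSE
)

# Mapping: translate local codes to OMOP-style columns.
# The concept ids below are placeholders, not real OMOP concepts.
map_to_omop <- function(src) {
  concept_lookup <- c(HB = 1L, SOD = 2L)
  data.frame(
    person_source_value = src$mrn,
    measurement_concept_id = unname(concept_lookup[src$test_code]),
    value_as_number = src$result,
    stringsAsFactors = FALSE
  )
}

# Linking: replace surrogate identifiers with generated integer person_ids.
link_ids <- function(omop) {
  ids <- unique(omop$person_source_value)
  omop$person_id <- match(omop$person_source_value, ids)
  omop
}

# Projection: keep only the columns a project is allowed to receive.
project_columns <- function(omop, keep) {
  omop[, keep, drop = FALSE]
}

# Output: write the projected table to CSV and return the file path.
write_output <- function(omop, path) {
  utils::write.csv(omop, path, row.names = FALSE)
  path
}

measurement <- project_columns(
  link_ids(map_to_omop(source_rows)),
  keep = c("person_id", "measurement_concept_id", "value_as_number")
)
out_path <- write_output(measurement, tempfile(fileext = ".csv"))
```

Each stage takes and returns a plain data frame, so a stage can be tested on mock data in isolation or replaced without touching the others.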
Notes about design and implementation can be found in the relevant sub-folders.
1.1 RStudio, File, New Project
1.2 From Version Control
1.3 Git
1.4 Copy into first box https://github.com/uclh-criu/omop_es
1.5 Browse to location to save (on DSD A:/Documents/)
1.6 Create Project (may take a while to copy files)
2.1 Click Install if RStudio shows the yellow banner message 'Package required but not installed'
2.2 Click Source
source(here::here("omop_metadata","download_omop_metadata.R"))
From a clean R environment.
source(here::here("source_access/UCLH/mock_database","recreate_mockdb.R"))
From the R console:
shell.exec(here::here("source_access/UCLH","odbc_setup.bat"))
source(here::here("omop_es_pipeline.R"))
# om <- omop_es_pipeline("[project_name]")
# Replace [project_name] with the name of your project, e.g.
om <- omop_es_pipeline("hic_myeloma_external")
Install and configure pre-commit to have your
work automatically checked for lint errors and consistent code style.