SAFEHR-data/omop_es

OMOP Extraction System

Warning

This repository is a one-off artifact for public inspection only. All electronic health record intellectual property has been removed to allow for public release. It is expected that the tests and code here will not function as-is.
For access to the unredacted full pipeline, please contact [email protected].
We plan to release a scalable, open-source framework in the future.

Introduction and Goals

omop_es has been developed to provide OMOP extracts of NHS data to researchers. Initially focused on UCLH, it is growing to support multiple trusts.

omop_es aims to:

  • be configurable to a variety of OMOP projects
  • be reproducible and automated
  • automate the end-to-end transformation of data to OMOP
  • integrate and link across data sources
  • automate filtering of sensitive data via configuration, with safe defaults and the ability for IG to inspect the configuration.
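As an illustration of configuration with safe defaults, the following minimal sketch merges per-project overrides onto a restrictive default. The setting names (`include_free_text`, `include_exact_dates`, `tables`) and the `make_settings()` helper are assumptions made for this example; they are not taken from the omop_es code.

```r
# Hypothetical defaults: sensitive content is excluded unless a project opts in.
default_settings <- list(
  include_free_text   = FALSE,
  include_exact_dates = FALSE,
  tables              = c("person", "visit_occurrence")
)

# Merge project overrides onto the defaults. Unknown keys are rejected so that
# a typo cannot silently leave a default in place.
make_settings <- function(overrides = list()) {
  unknown <- setdiff(names(overrides), names(default_settings))
  if (length(unknown) > 0) {
    stop("Unknown setting(s): ", paste(unknown, collapse = ", "))
  }
  utils::modifyList(default_settings, overrides)
}

# A project that explicitly opts in to exact dates; everything else stays default.
project_settings <- make_settings(list(include_exact_dates = TRUE))
str(project_settings)
```

Keeping the overrides in one small list per project is one way to make the configuration easy for IG to inspect: only the deviations from the defaults need reviewing.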

In addition, the following non-functional requirements are significant:

  • The system should allow testing and reporting of mapping quality
  • The OMOP code should not contain core logic or mappings relevant to other use-cases. Such logic belongs in the source systems (the data warehouse).
  • The system should be capable of being extended both by the 'core' team and by consumers in an inner source fashion.

High Level Design

[Figure: high-level design diagram]

The system is composed of the following modules:

  • Settings: Defines per-project configuration and project object for each site
  • Mapping: Maps underlying data source(s) to the latest OMOP at full fidelity
  • Linking: Links surrogate IDs to OMOP IDs generically, with the ability to customise
  • Projection: Applies rules to filter the combined data for sensitivity. Also downgrades OMOP versions as needed
  • Output: Writes the projected output to either CSV, Parquet or SQLite formats for direct use / import in a TRE
  • Metadata:
  • Tests: Tests end-to-end pipeline plus sub-modules on mock data
  • Checks: Checks the resulting OMOP. Note that this should be portable to other ETL implementations (not currently a priority).
  • Units: A (loosely coupled) service to convert local units to UCUM and OMOP (UCUM). Drug quantity conversion is also implemented here
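The flow through the Mapping, Linking, Projection and Output modules can be sketched on toy data as below. This is an illustration only: every function name, column name and setting here is hypothetical, and the real modules are considerably more involved.

```r
# Hypothetical per-project settings (names are assumptions for this sketch).
settings <- list(
  project      = "demo_project",
  drop_columns = c("person_source_value")  # columns treated as sensitive
)

# Mapping: turn source rows into OMOP-shaped rows keyed by a surrogate id.
map_person <- function(source) {
  data.frame(
    surrogate_id        = source$mrn,
    year_of_birth       = as.integer(format(as.Date(source$dob), "%Y")),
    person_source_value = source$mrn,
    stringsAsFactors    = FALSE
  )
}

# Linking: replace surrogate ids with sequential OMOP person_id values.
link_person <- function(mapped) {
  mapped$person_id <- match(mapped$surrogate_id, unique(mapped$surrogate_id))
  mapped$surrogate_id <- NULL
  mapped
}

# Projection: drop the columns the settings mark as sensitive.
project_person <- function(linked, settings) {
  linked[, setdiff(names(linked), settings$drop_columns), drop = FALSE]
}

# Output: write the projected table as CSV and return the file path.
write_person <- function(projected, dir) {
  path <- file.path(dir, "person.csv")
  utils::write.csv(projected, path, row.names = FALSE)
  path
}

source_rows <- data.frame(
  mrn = c("A1", "B2"),
  dob = c("1980-05-01", "1992-11-30"),
  stringsAsFactors = FALSE
)

person   <- project_person(link_person(map_person(source_rows)), settings)
out_path <- write_person(person, tempdir())
print(person)
```

Mapping at full fidelity first and filtering only at the projection step means the same mapped data can serve several projects with different sensitivity rules.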

Notes about design and implementation can be found in relevant sub-folders
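To make the role of the Units module more concrete, here is a minimal sketch of a lookup from local unit strings to UCUM codes. The local spellings in the table and the `to_ucum()` helper are invented for illustration; the actual service and its mapping tables are not shown in this repository, and drug quantity conversion is not covered here.

```r
# Hypothetical lookup from (lower-cased) local unit strings to UCUM codes.
unit_map <- data.frame(
  local_unit = c("mcg", "mmol/l", "10^9/l", "iu/l"),
  ucum_code  = c("ug",  "mmol/L", "10*9/L", "[IU]/L"),
  stringsAsFactors = FALSE
)

# Normalise the input (trim whitespace, lower-case) and look it up.
# Units with no mapping come back as NA so they can be reported, not guessed.
to_ucum <- function(local_units) {
  key <- tolower(trimws(local_units))
  unit_map$ucum_code[match(key, unit_map$local_unit)]
}

to_ucum(c("MCG", " mmol/l ", "furlong"))
```

Returning `NA` for unmapped units keeps gaps visible, which fits the stated requirement that the system should allow testing and reporting of mapping quality.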

Instructions to get started (with UCLH mock data)

1. Clone the private repository from GitHub into an RStudio project

1.1 RStudio, File, New Project
1.2 From Version Control
1.3 Git
1.4 Copy into first box https://github.com/uclh-criu/omop_es
1.5 Browse to location to save (on DSD A:/Documents/)
1.6 Create Project

(Copying the files may take a while.)

2. Open libraries.R in RStudio

2.1 Click Install if RStudio shows the yellow banner message 'Package required but not installed'
2.2 Click Source

3. Download the required OMOP metadata

source(here::here("omop_metadata","download_omop_metadata.R"))

4. Create mock database

Run this from a clean R environment:

source(here::here("source_access/UCLH/mock_database","recreate_mockdb.R"))

5. Run main/example.R

6. Set UCLH database access

From the R console (Windows only, since shell.exec is not available on other platforms):

shell.exec(here::here("source_access/UCLH","odbc_setup.bat"))

7. Run an extraction for a project

source(here::here("omop_es_pipeline.R"))
# General form: om <- omop_es_pipeline("[project_name]")
# Replace [project_name] with the name of your project, e.g.
om <- omop_es_pipeline("hic_myeloma_external")
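When several projects need extracting in one session, a small wrapper that isolates failures can be convenient. The sketch below is a suggestion, not part of omop_es: `run_projects()` is a hypothetical helper, "bad_project" is an invented name, and `fake_pipeline()` stands in for `omop_es_pipeline()` so the example runs on its own. The return value of the real pipeline is treated as opaque.

```r
# Run a pipeline function for each project, capturing errors per project
# so that one failing extraction does not stop the rest.
run_projects <- function(projects, pipeline) {
  results <- lapply(projects, function(p) {
    tryCatch(
      list(project = p, ok = TRUE, value = pipeline(p)),
      error = function(e) list(project = p, ok = FALSE, value = conditionMessage(e))
    )
  })
  names(results) <- projects
  results
}

# Stand-in for omop_es_pipeline(), used only to make this sketch runnable.
fake_pipeline <- function(project_name) {
  if (project_name == "bad_project") stop("no settings found")
  paste("extracted", project_name)
}

res <- run_projects(c("hic_myeloma_external", "bad_project"), fake_pipeline)

# Report which projects failed and why.
failed <- Filter(function(r) !r$ok, res)
for (r in failed) message(r$project, ": ", r$value)
```

In real use, after sourcing omop_es_pipeline.R, the second argument would be `omop_es_pipeline` itself.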

8. Install pre-commit

Install and configure pre-commit to have your work automatically checked for lint errors and consistent code style.
