Skip to content

Integration of Department Metadata into EMAP for Efficient Location Filtering #113

@zmek

Description

@zmek

Integration of Department Metadata into EMAP for Efficient Inpatient Location Filtering

Background

We need department level data in EMAP. This is necessary so that we can efficiently identify patients who are occupying inpatient beds at a point in time, or patients in the ED at a point in time.

Useful metadata includes:

  • which site a department is on (Grafton Way, Queen's Square etc)
  • whether a department is a ward or flow area (ie relevant to those managing patient flow)
  • whether a department that is a ward or flow area is classified as an inpatient overnight location (from RPT_GRP_NINE field in Clarity's CLARITY_DEP table)

Current State

  • EMAP stores only location strings without department metadata
  • Current location filtering uses hardcoded list of ~500 location strings converted to location IDs
  • Performance bottleneck: large array filtering in SQL queries
  • Missing ~3,500 additional inpatient ward locations identified in Clarity analysis

Problem Statement

Need efficient method to filter EMAP data by ward/flow area classification for:

  1. Large-scale model training data extraction
  2. Real-time inpatient prediction applications

Current query pattern:

-- Slow: filtering by large array of location IDs
SELECT * FROM [Join of hospital_visit and location_visit] 
WHERE location_id = ANY(ARRAY[123, 456, 789, ...]) -- 500+ IDs

Desired query pattern:

-- Fast: filtering by metadata flags
SELECT * FROM emap_table e
WHERE e.is_ward_or_flow_area = true
  AND e.site_location_id IN ('10201', '10218')
  AND e.classification in ('2', '20') -- Inpatient - Overnight Ward or ICU location

Proposed Solutions

Option 1: Extend EMAP Hoover

Description: Modify the EMAP data ingestion pipeline to populate department metadata from Clarity during hoover process.

Implementation:

  • Add columns to EMAP location/department tables: rpt_grp_nine, is_ward_or_flow_area, site_location_id, etc.
  • Modify hoover to lookup metadata from Clarity during ingestion
  • Create indexes on metadata columns

Pros:

  • Best query performance (single table, no joins)
  • Self-contained EMAP database
  • Scales efficiently with location growth
  • Optimal for real-time applications

Cons:

  • Requires hoover development resources
  • Testing overhead for data pipeline changes
  • One-time implementation effort

Option 2: Cross-Database Views/Federated Queries

Description: Create views in EMAP that join to Clarity metadata tables using foreign data wrappers.

Implementation:

CREATE FOREIGN TABLE clarity_dept_metadata (...) 
SERVER clarity_server OPTIONS (...);

CREATE VIEW emap_with_metadata AS
SELECT e.*, cmd.is_ward_or_flow_area, cmd.site_location_id
FROM emap_locations e
JOIN clarity_dept_metadata cmd ON e.location_string = cmd.location_string;

Pros:

  • No changes to core EMAP pipeline
  • Always current metadata
  • Quick to implement

Cons:

  • Cross-database query performance issues
  • Network latency impact
  • System dependencies
  • Poor for real-time applications

Option 3: Periodic Metadata Sync Table

Description: Create lookup table in EMAP refreshed periodically from Clarity.

Implementation:

CREATE TABLE location_metadata (
    location_string VARCHAR(255) PRIMARY KEY,
    department_id VARCHAR(50),
    department_name VARCHAR(255),
    rpt_grp_nine VARCHAR(10),
    is_ward_or_flow_area BOOLEAN,
    site_location_id VARCHAR(20),
    last_updated TIMESTAMP
);

Pros:

  • Good query performance (local joins)
  • No hoover changes required
  • Quick implementation
  • Immediate performance gains

Cons:

  • Metadata staleness between syncs
  • Additional sync process maintenance
  • Scheduling/monitoring overhead

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Status

In Review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions