Proposal 5: Extend dataset type classes #42

@rralifia

Issue: Gap 5 – Dataset type classes

New terms / entities

  • demographic data set (class)
  • social determinants of health data set (class)
  • lifestyle data set (class)
  • medical and family history data set (class)
  • imaging data set (class)

Type

  • Classes

Rationale

ICO imports OBI’s data set and related classes, but for consent-aware discovery we often need to know which kinds of data a study plans to retain or share (e.g., demographic, SDOH, lifestyle, medical/family history, imaging).

Consent forms commonly specify these in free text, for example “MRI images will be kept for future research” or “We will collect demographic and medical history information”. Having a small set of dataset-type classes allows us to:

  • annotate consent clauses that describe types of data to be collected, retained, or shared;
  • support filters such as “studies with imaging data” or “studies with SDOH and lifestyle data”.

All proposed classes are intended as subclasses of OBI’s data set.
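
As a rough illustration of the two uses listed above, the following minimal Python sketch models the proposed classes as labels with `data set` as their parent, attaches them to studies, and filters studies by the dataset types they carry. The names `Study`, `annotate`, `filter_studies`, and the study identifiers are hypothetical and are not part of ICO or OBI; a real implementation would work against the ontology itself rather than plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Parent class label, as stated in the proposal.
PARENT = "data set"

# Proposed dataset-type classes, each mapped to its intended parent.
DATASET_TYPES = {
    "demographic data set": PARENT,
    "social determinants of health data set": PARENT,
    "lifestyle data set": PARENT,
    "medical and family history data set": PARENT,
    "imaging data set": PARENT,
}


@dataclass
class Study:
    """Hypothetical record for a study and the dataset types found in its consent form."""

    study_id: str
    dataset_types: set[str] = field(default_factory=set)

    def annotate(self, dataset_type: str) -> None:
        # Reject labels that are not among the proposed classes.
        if dataset_type not in DATASET_TYPES:
            raise ValueError(f"unknown dataset type: {dataset_type}")
        self.dataset_types.add(dataset_type)


def filter_studies(studies: list[Study], required: set[str]) -> list[Study]:
    """Return the studies annotated with every dataset type in `required`."""
    return [s for s in studies if required <= s.dataset_types]


# Hypothetical studies used only for illustration.
study_a = Study("STUDY-A")
study_a.annotate("imaging data set")

study_b = Study("STUDY-B")
study_b.annotate("social determinants of health data set")
study_b.annotate("lifestyle data set")

studies = [study_a, study_b]
imaging_studies = filter_studies(studies, {"imaging data set"})
sdoh_lifestyle_studies = filter_studies(
    studies,
    {"social determinants of health data set", "lifestyle data set"},
)
```

Here `imaging_studies` corresponds to the filter “studies with imaging data” and `sdoh_lifestyle_studies` to “studies with SDOH and lifestyle data”.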


Details

1. demographic data set

  • Label: demographic data set
  • Definition:
    A data set that primarily contains demographic information about research participants, such as age, sex, race, ethnicity, education, and basic socioeconomic variables.
  • Subclass of:
    data set
  • Example of usage:
    • Clause: “We will collect demographic information (age, sex, race, education).”
    • The corresponding dataset is a demographic data set.

2. social determinants of health data set

  • Label: social determinants of health data set
  • Definition:
    A data set that primarily contains social determinants of health (SDOH) information about participants, such as income, employment, housing stability, food security, transportation access, and social support.
  • Subclass of:
    data set
  • Example of usage:
    • Clause: “We will ask questions about your housing, employment, and financial situation.”
    • The corresponding dataset is a social determinants of health data set.

3. lifestyle data set

  • Label: lifestyle data set
  • Definition:
    A data set that primarily contains clinically relevant lifestyle information about participants, such as smoking, alcohol or substance use, diet, and physical activity, consistent with ICO’s clinically relevant lifestyle inclusion criterion pattern.
  • Subclass of:
    data set
  • Example of usage:
    • Clause: “We will collect information about your smoking, alcohol use, and exercise habits.”
    • The corresponding dataset is a lifestyle data set.

4. medical and family history data set

  • Label: medical and family history data set
  • Definition:
    A data set that primarily contains participants’ medical history (e.g., diagnoses, procedures, medications, allergies) and family history of disease.
  • Subclass of:
    data set
  • Example of usage:
    • Clause: “We will collect your medical history and family history of heart disease.”
    • The corresponding dataset is a medical and family history data set.

5. imaging data set

  • Label: imaging data set
  • Definition:
    A data set that primarily contains imaging data acquired from participants (e.g., MRI, CT, X-ray, ultrasound) or features derived from those images.
  • Subclass of:
    data set
  • Example of usage:
    • Clause (Study NCT06703541): “MRI images will be kept for future research.”
    • The retained images are annotated as an imaging data set.
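
To make the five declarations above concrete, here is a minimal sketch that emits them as OWL classes in Turtle syntax using only the Python standard library. The `ex:` namespace, the local names, and the placeholder parent `ex:data_set` are assumptions made for illustration; the actual IRIs, including the IRI of the imported `data set` class, would have to come from ICO and its imports. Definitions are omitted for brevity.

```python
# Placeholder prefixes; the ex: namespace is hypothetical.
PREFIXES = (
    "@prefix ex: <http://example.org/ico-proposal#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
)

# Labels taken from the proposal.
LABELS = [
    "demographic data set",
    "social determinants of health data set",
    "lifestyle data set",
    "medical and family history data set",
    "imaging data set",
]


def local_name(label: str) -> str:
    """Turn a label into a simple local name (illustrative convention only)."""
    return label.replace(" ", "_")


def class_to_turtle(label: str, parent: str = "data_set") -> str:
    """Render one class declaration with its label and parent class."""
    return (
        f"ex:{local_name(label)} a owl:Class ;\n"
        f'    rdfs:label "{label}"@en ;\n'
        f"    rdfs:subClassOf ex:{parent} .\n"
    )


def build_turtle(labels: list[str]) -> str:
    """Combine the prefixes and all class declarations into one document."""
    return PREFIXES + "\n" + "\n".join(class_to_turtle(label) for label in labels)


turtle_doc = build_turtle(LABELS)
print(turtle_doc)
```

The output is only a draft for discussion; it would still need real identifiers and definition annotations before being added to the ontology.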
