data releases
use case
As an engineer, I need to know the sources, provenance and locations of all data in a predictable manner. I need to store all of the above in a cold storage archive. The archive should be discoverable: I should be able to identify all related files and then know how to parse and load them into an active database.
MUSTS
- all data stored in ndjson with a homogeneous record type per file
- all files are named with a pattern *.OBJECT_LABEL.ndjson.gz
- there will be a manifest file in the same directory manifest.yaml
  - File listing including:
    - MD5
  - File listing including:
    - stored in file, web directory or s3
SHOULDS
* File listing including:
  * provenance metadata, see https://github.com/DLR-SC/gitlab2prov
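The first two MUSTS could be sketched roughly as follows. This is only an illustration, not an agreed implementation: the helper names `write_ndjson_gz` and `read_ndjson_gz`, the `prefix` argument, and the assumption that each record carries its type in a `resourceType` field are all hypothetical.

```python
import gzip
import json
from pathlib import Path


def write_ndjson_gz(records, directory, prefix, object_label, type_key="resourceType"):
    """Write records of one type to <prefix>.<object_label>.ndjson.gz.

    Assumption: every record is a dict whose `type_key` names its record type.
    """
    records = list(records)
    # Enforce "homogeneous record type per file" before anything is written.
    for i, record in enumerate(records):
        if record.get(type_key) != object_label:
            raise ValueError(
                f"record {i} has {type_key}={record.get(type_key)!r}, "
                f"expected {object_label!r}"
            )
    path = Path(directory) / f"{prefix}.{object_label}.ndjson.gz"
    path.parent.mkdir(parents=True, exist_ok=True)
    # One JSON object per line, gzip-compressed, UTF-8 encoded.
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, separators=(",", ":")) + "\n")
    return path


def read_ndjson_gz(path):
    """Yield one parsed record per non-empty line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Rejecting mixed record types at write time keeps each file loadable into a single table or collection without inspecting every line first.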
EXAMPLE
├── README.md
├── file-Patient.ndjson.gz
├── file-Specimen.ndjson.gz
├── file-Task.ndjson.gz
└── sub-dir
    ├── file-DocumentReference.ndjson.gz
    ├── file-Observation.ndjson.gz
    └── file-Compound.ndjson.gz
"An iceberg's calf"
Would have a manifest.yaml:
id: unique
name:
author: email
version: semantic
related-to:
tags: []
schema:
  - url: http://some-publically-readable-url
    # embedded copy
    data: {}
source:
  # all files extracted from this source
  - url:
  # with this provenance
  - provenance: {}
code:
  # all files created with this provenance
  - provenance: {}
files:
  - name: file-Patient.ndjson.gz
    md5: XXXX
    # except this one
    code_provenance: {}
    source_provenance: {}
  - name: file-Specimen.ndjson.gz
    md5: XXXX
    tags: []
  - name: file-Task.ndjson.gz
    md5: XXXX
    tags: []
  - name: sub-dir/file-DocumentReference.ndjson.gz
    md5: XXXX
    tags: []
  - name: sub-dir/file-Observation.ndjson.gz
    md5: XXXX
    tags: []
  - name: sub-dir/file-Compound.ndjson.gz
    md5: XXXX
    tags: []
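One possible way to produce and check the `files` listing is sketched below, using only the Python standard library. The function names `md5_of`, `build_file_listing` and `verify_file_listing` are hypothetical, and the sketch assumes the release lives on a local file system; a web directory or s3 location would need its own listing and download step. Serializing the result into manifest.yaml would need a YAML library, which is left out here.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_file_listing(root):
    """List every *.ndjson.gz under root as manifest-style `files` entries."""
    root = Path(root)
    return [
        # Names are relative to the manifest directory, e.g. sub-dir/file-X.ndjson.gz
        {"name": p.relative_to(root).as_posix(), "md5": md5_of(p), "tags": []}
        for p in sorted(root.rglob("*.ndjson.gz"))
    ]


def verify_file_listing(root, files):
    """Return the names of listed files that are missing or whose MD5 differs."""
    root = Path(root)
    problems = []
    for entry in files:
        path = root / entry["name"]
        if not path.is_file() or md5_of(path) != entry["md5"]:
            problems.append(entry["name"])
    return problems
```

Running `verify_file_listing` after restoring from cold storage gives a quick integrity check before any file is parsed and loaded.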