
Support for genotype data in VCF format #45

@nevrome


The VCF file format appears to be a popular, powerful and (comparatively) well-specified format for genotype data. Poseidon could (one day!) support it in the same way it supports Packed PLINK and EIGENSTRAT data. Some observations:

  • VCF files seem to be very flexible and capable of storing a lot more information than PLINK or EIGENSTRAT files. That makes them harder to parse and render. Most importantly, there is no lossless conversion between the formats, given VCF's greater flexibility.
  • The VCF specification seems to be revised relatively frequently: v4.3 is published and v4.4 is on the way. For Poseidon we would have to decide which version we support and keep track of changes to the format.
  • For poseidon-hs: sequence-formats already supports it (at least partially?). If functionality is missing there, this script or this package may also serve as inspiration.
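
To make the first two observations concrete, here is a minimal Python sketch (not poseidon-hs or sequence-formats code). It reads the version from a `##fileformat` header line and collapses per-sample `GT` entries into a single-digit genotype code. The function names, the `SUPPORTED_VERSIONS` set, and the coding convention (count of REF alleles, 9 for anything that cannot be represented) are assumptions made for illustration only.

```python
# Hypothetical set of versions a reader might accept; purely illustrative.
SUPPORTED_VERSIONS = {"4.2", "4.3"}


def vcf_version(header_line):
    """Extract the version string from a '##fileformat=VCFvX.Y' header line."""
    prefix = "##fileformat=VCFv"
    if not header_line.startswith(prefix):
        raise ValueError("not a VCF fileformat line")
    return header_line[len(prefix):].strip()


def parse_vcf_line(line):
    """Split one tab-separated VCF data line into fixed fields and GT strings."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, snp_id, ref, alt = fields[:5]
    # Column 9 (index 8) names the per-sample subfields, e.g. 'GT:DP'.
    gt_index = fields[8].split(":").index("GT")
    genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
    return chrom, int(pos), snp_id, ref, alt.split(","), genotypes


def gt_to_ref_count(gt):
    """Collapse a diploid GT string to one digit.

    Assumed convention: number of REF alleles (0, 1 or 2), and 9 for
    missing, non-diploid or multi-allelic calls. Phasing ('|' vs '/')
    and every other per-sample field are discarded, which is where the
    conversion becomes lossy.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return 9
    return alleles.count("0")


version = vcf_version("##fileformat=VCFv4.3")
line = "1\t752566\trs3094315\tG\tA\t.\tPASS\t.\tGT:DP\t0|1:12\t1/1:7\t./.:0\t0/0:30"
chrom, pos, snp_id, ref, alts, gts = parse_vcf_line(line)
codes = [gt_to_ref_count(gt) for gt in gts]
print(version in SUPPORTED_VERSIONS, snp_id, codes)
```

In this sketch the phased call `0|1` and an unphased `0/1` would map to the same code, and the `DP` values are dropped entirely, so the original VCF line cannot be reconstructed from the output.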
