Firstly, thank you for your work building zarr! I plan to use zarr arrays as a cache for calculating polygenic scores. Over time, I'd append elements to the zarr array and update my attributes. I'd like to store a bunch of metadata about genetic variants in zarr attributes, alongside my genotype arrays. The metadata I'm storing is basically a dataframe. I'm a bit concerned that the JSON attribute files might blow up in size over time once they contain tens of millions of variants (corresponding to rows in the genotype array). Is this a common problem in the community? I was thinking of zipping the local directory store once my cache is ready for calculation. Or are there other approaches people adopt?
Why not store your dataframe as a bunch of 1D arrays (one per column)? If the columns all have the same data type, it can even be a single array. That way the metadata lives in chunked, compressed arrays rather than in a single JSON attributes file.