Firstly, thank you for your work building zarr! I plan to use zarr arrays as a cache for calculating polygenic scores. Over time, I'd append elements to the zarr array and update my attributes. I'd like to store a bunch of metadata about genetic variants in zarr attributes, alongside my genotype arrays. The metadata I'm storing is basically a dataframe. I'm a bit concerned that the JSON attribute files might blow up in size over time once they contain tens of millions of variants (corresponding to rows in the genotype array). Is this a common problem in the community? I was thinking of zipping the local directory store once my cache is ready for calculation. Or are there other approaches people adopt?
Why not store your dataframe as a bunch of 1D arrays (one per column)? If the columns all have the same data type, it can even be a single array. That way the metadata lives in chunked, compressed arrays rather than in a single JSON attributes file.