Scalable Storage of Tensor Data for Scientific Computing
Ryan Williams (Mount Sinai School of Medicine)
To establish Zarr as a foundation for scientific data storage, with clear data format and protocol specifications, implementations in multiple programming languages, and a community process for evolving to support new scientific applications.
Zarr is a specification for storing chunked, compressed, N-dimensional arrays, together with implementations of that specification in several languages and an ecosystem of tools and integrations built on them. It is broadly used in biomedicine (malaria genomics, scRNAseq, spatial transcriptomics, neuroscience, etc.) and beyond. It fills a need for simple, scalable N-dimensional array storage in the cloud era, a need that scientists in many disciplines have historically met with HDF5. It has a vibrant open ecosystem and a distributed, grassroots developer and user base, and it supports the NetCDF data model, allowing drop-in integration with a wide variety of systems and use cases.