
Scalable Storage of Tensor Data for Scientific Computing


Project Zarr
Lead

Ryan Williams (Mount Sinai School of Medicine)

Funding Cycle 1

Proposal Summary

To establish Zarr as a foundation for scientific data storage, with clear data format and protocol specifications, implementations in multiple programming languages, and a community process for evolving Zarr to support new scientific applications.


Project

Zarr

Zarr is a specification for storing chunked, compressed, N-dimensional arrays, together with implementations of that specification in several languages and an associated ecosystem of tools and integrations that use them. It is broadly used in biomedicine (malaria genomics, scRNA-seq, spatial transcriptomics, neuroscience, etc.) and beyond. It fills a need for simple, scalable N-dimensional array storage in the cloud era, and scientists from many disciplines use it where they would historically have used HDF5. It has a vibrant open ecosystem and a distributed, grassroots developer and user base, and it supports the NetCDF data model, allowing drop-in integration with a wide variety of systems and use cases.
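As a rough illustration of what "chunked, compressed" storage means, the sketch below splits a small 2-D array into fixed-size chunks, compresses each chunk independently, and stores it under a key derived from its position in the chunk grid. It uses only the Python standard library and does not use the Zarr library or reproduce the Zarr format. The function names `write_chunks` and `read_chunk`, the dictionary standing in for a key-value store, the `row.col` key scheme, and the choice of zlib are all assumptions made for this example, and the sketch assumes the array shape divides evenly by the chunk shape.

```python
import struct
import zlib


def write_chunks(array, chunk_shape):
    """Split a 2-D list of floats into compressed chunks keyed by grid index.

    Hypothetical helper for illustration; assumes the array shape is an
    exact multiple of chunk_shape.
    """
    n_rows, n_cols = len(array), len(array[0])
    c_rows, c_cols = chunk_shape
    store = {}  # stands in for a directory or cloud object store
    for i in range(0, n_rows, c_rows):
        for j in range(0, n_cols, c_cols):
            block = [row[j:j + c_cols] for row in array[i:i + c_rows]]
            flat = [value for row in block for value in row]
            # Pack the chunk as little-endian float64 values, then compress.
            raw = struct.pack(f"<{len(flat)}d", *flat)
            store[f"{i // c_rows}.{j // c_cols}"] = zlib.compress(raw)
    return store


def read_chunk(store, key, chunk_shape):
    """Decompress a single chunk without touching any other chunk."""
    raw = zlib.decompress(store[key])
    flat = struct.unpack(f"<{len(raw) // 8}d", raw)
    c_cols = chunk_shape[1]
    return [list(flat[k:k + c_cols]) for k in range(0, len(flat), c_cols)]


# A 4x4 array stored as four 2x2 chunks.
array = [[float(4 * r + c) for c in range(4)] for r in range(4)]
store = write_chunks(array, (2, 2))
chunk = read_chunk(store, "1.0", (2, 2))  # rows 2-3, columns 0-1
```

Because each chunk is an independent compressed object, a reader can fetch only the chunks that overlap the region it needs, which is the property that makes this layout a good fit for object stores and parallel access.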


Key Personnel

Ryan Williams
Alistair Miles
John Kirkham
Josh Moore