Xarray: Multidimensional Labeled Arrays and Datasets in Python
Joseph Hamman (NumFOCUS)
To grow the use of Xarray in the biosciences as a foundational data model and computational toolkit for multidimensional labeled arrays.
Xarray is an open source Python project that makes working with labeled multidimensional arrays elegant, intuitive, and efficient. Real-world datasets are more than raw numbers: they have labels that describe how array values map to locations along dimensions such as space and time, they carry metadata describing how the data were collected and processed, and they often consist of multiple fields on a common grid. Xarray embraces this complexity and provides tools for users to easily analyze, manipulate, and visualize data using labels while preserving important metadata attributes. Xarray combines an expressive API inspired by pandas with Unidata’s Common Data Model for self-described scientific data. Originally developed for analyzing multidimensional climate and weather data, Xarray now reaches across a broad swathe of scientific domains. It is particularly well suited to analyzing the large and heterogeneous biomedical datasets produced by a wide range of experimental research tools. Existing applications of Xarray in the biomedical sciences range from bioimaging to single-cell genomics.
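As a rough illustration of the data model described above, the following minimal sketch builds a small Dataset with two fields on a shared grid, selects data by label, and keeps a metadata attribute through a reduction. The variable names (intensity, background), the channel labels, and the instrument attribute are hypothetical and chosen only for illustration; the values are random.

```python
import numpy as np
import xarray as xr

# Hypothetical data: two fields measured on a shared (time, channel) grid.
rng = np.random.default_rng(0)
times = np.array(["2024-01-01", "2024-01-02", "2024-01-03"], dtype="datetime64[ns]")
channels = ["dapi", "gfp"]  # assumed channel labels, for illustration only

ds = xr.Dataset(
    data_vars={
        "intensity": (("time", "channel"), rng.random((3, 2))),
        "background": (("time", "channel"), rng.random((3, 2))),
    },
    coords={"time": times, "channel": channels},
    attrs={"instrument": "hypothetical microscope"},  # dataset-level metadata
)

# Select by label rather than by integer position.
gfp = ds["intensity"].sel(channel="gfp")

# Reduce over a named dimension; keep_attrs=True carries the metadata along.
mean_over_time = ds.mean(dim="time", keep_attrs=True)

# Arithmetic between fields is aligned on the shared coordinates.
corrected = ds["intensity"] - ds["background"]

print(gfp.dims)
print(mean_over_time.attrs)
```

Because selection and reduction refer to dimension names and coordinate labels, the same code keeps working if the axis order of the underlying arrays changes.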