Scalable Interactive Analysis of Single-Cell Data with Apache Spark
This project aims to develop a computational backend that enables interactive exploratory analysis of very large single-cell datasets.
Results & Resources
- Zappy, an API exposing a NumPy-style interface whose operations can be pushed down to multiple execution engines, and which can read and write Zarr data.
- ndarray.scala, a Scala implementation of an “ndarray” that can read and write Zarr data.
- scsearch, an experimental implementation for indexing single-cell data with Elasticsearch.
- Instructions, demos, and Jupyter notebooks for running selected Scanpy operations on distributed computing engines for scalable single-cell analytics.
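
To illustrate what "pushing down" a NumPy-style interface to an execution engine can mean, here is a minimal sketch in plain NumPy. It is not Zappy's actual API: the `ChunkedArray` class, its methods, and the `map_fn` parameter are hypothetical names made up for this example. The array is split into row chunks, each operation is applied per chunk, and the function that maps work over chunks is pluggable, so the same calls can run serially or on a thread pool (standing in here for a distributed engine such as Spark).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class ChunkedArray:
    """Hypothetical row-chunked 2-D array (illustration only, not Zappy's API)."""

    def __init__(self, chunks, map_fn=map):
        # Materialize the chunks so lazy map results are evaluated right away.
        self.chunks = list(chunks)
        # map_fn plays the role of the execution engine: builtin map runs
        # serially, while an executor's map runs chunks concurrently.
        self.map_fn = map_fn

    @classmethod
    def from_ndarray(cls, arr, rows_per_chunk, map_fn=map):
        chunks = [arr[i:i + rows_per_chunk]
                  for i in range(0, arr.shape[0], rows_per_chunk)]
        return cls(chunks, map_fn)

    def map_chunks(self, func):
        # Element-wise operations need no communication between chunks.
        return ChunkedArray(self.map_fn(func, self.chunks), self.map_fn)

    def log1p(self):
        return self.map_chunks(np.log1p)

    def sum(self, axis=None):
        if axis == 1:
            # Per-row sums stay within a chunk; concatenate the pieces.
            return np.concatenate(
                list(self.map_fn(lambda c: c.sum(axis=1), self.chunks)))
        # Per-column sums: reduce each chunk, then combine the partial sums.
        partial = list(self.map_fn(lambda c: c.sum(axis=0), self.chunks))
        total = np.sum(partial, axis=0)
        return total if axis == 0 else total.sum()

    def to_ndarray(self):
        return np.concatenate(self.chunks, axis=0)


# Toy cells-by-genes matrix: 6 cells, 2 genes.
counts = np.arange(12, dtype=np.float64).reshape(6, 2)

# Serial execution.
local = ChunkedArray.from_ndarray(counts, rows_per_chunk=4)
local_gene_totals = local.log1p().sum(axis=0)

# Same calls, different engine: chunks are processed by a thread pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = ChunkedArray.from_ndarray(counts, rows_per_chunk=4,
                                         map_fn=pool.map)
    threaded_gene_totals = threaded.log1p().sum(axis=0)
```

The point of the design is that analysis code is written once against the array interface, and only the choice of `map_fn` changes when moving between engines.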