Scalable Interactive Analysis of Single-Cell Data with Apache Spark
Focus
Scale
Project Goal
To develop backend computational infrastructure that enables interactive exploratory analysis of very large single-cell datasets.
Results & Resources
The Laserson group contributed to existing open-source projects, such as Zarr, Scanpy and PyNNDescent. They also developed a number of new projects:
- Zappy, an API exposing a NumPy interface that can be pushed down into multiple execution engines and that can read and write Zarr data.
- ndarray.scala, a Scala implementation of the “ndarray” that can read and write Zarr data.
- scsearch, an experimental implementation for indexing single-cell data with Elasticsearch.
- Instructions, demos and Jupyter notebooks for running select Scanpy operations on distributed computing engines for scalable single-cell analytics.
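The idea of an array interface that is "pushed down" into an execution engine can be illustrated with a small sketch. This is not Zappy's API: the `ChunkedMatrix` class, its methods, and the `mapper` parameter are all hypothetical names chosen for illustration, and the example uses only the Python standard library. It assumes a cells-by-genes matrix split into row chunks, with each operation applied chunk by chunk through whatever map-like callable is supplied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Row = List[float]
Chunk = List[Row]


class ChunkedMatrix:
    """Hypothetical row-chunked matrix (cells x genes); not Zappy's actual API."""

    def __init__(self, chunks: Sequence[Chunk], mapper: Callable = map):
        self.chunks = list(chunks)
        # The "execution engine" is any map-like callable: the builtin map,
        # a thread pool's map, or (in principle) a distributed engine's map.
        self.mapper = mapper

    @classmethod
    def from_rows(cls, rows: Sequence[Row], chunk_size: int, mapper: Callable = map):
        # Split the rows into consecutive chunks of at most chunk_size rows.
        chunks = [list(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]
        return cls(chunks, mapper)

    def map_chunks(self, fn: Callable[[Chunk], Chunk]) -> "ChunkedMatrix":
        # Apply fn to every chunk independently through the chosen engine.
        return ChunkedMatrix(list(self.mapper(fn, self.chunks)), self.mapper)

    def scale(self, factor: float) -> "ChunkedMatrix":
        # Element-wise multiplication, computed one chunk at a time.
        return self.map_chunks(lambda chunk: [[v * factor for v in row] for row in chunk])

    def column_sums(self) -> List[float]:
        # Per-chunk partial sums are computed by the engine, then combined locally.
        def partial(chunk: Chunk) -> List[float]:
            return [sum(col) for col in zip(*chunk)]

        partials = list(self.mapper(partial, self.chunks))
        return [sum(col) for col in zip(*partials)]


# Toy count matrix: 5 cells x 3 genes (made-up values for illustration).
counts = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [4.0, 1.0, 0.0],
    [2.0, 2.0, 2.0],
    [0.0, 0.0, 5.0],
]

# Same calling code, two different engines.
local = ChunkedMatrix.from_rows(counts, chunk_size=2)
local_sums = local.scale(2.0).column_sums()

with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = ChunkedMatrix.from_rows(counts, chunk_size=2, mapper=pool.map)
    threaded_sums = threaded.scale(2.0).column_sums()

print(local_sums)
print(threaded_sums)
```

The calling code is identical for both engines; only the `mapper` changes. A real system would also need to handle chunked storage such as Zarr, lazy evaluation and data movement, none of which this sketch attempts.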
Investigators
Lead Investigator
Uri Laserson