Scalable Interactive Analysis of Single-Cell Data with Apache Spark

Focus Scale

Project Goal

To develop a backend computational infrastructure that enables interactive exploratory analysis of very large single-cell datasets.

Results & Resources

The Laserson group contributed to existing open-source projects, including Zarr, Scanpy and PyNNDescent. They also developed several new projects:

  • Zappy, an API exposing a NumPy-style interface whose operations can be pushed down into multiple execution engines and which can also read and write Zarr data.
  • ndarray.scala, a Scala implementation of the “ndarray” that can read and write Zarr data.
  • scsearch, an experimental implementation for indexing single-cell data with Elasticsearch.
  • Instructions, demos and Jupyter notebooks for running select Scanpy operations on distributed computing engines for scalable single-cell analytics.
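
The idea of pushing an array operation down into interchangeable execution engines can be illustrated with a minimal Python sketch. It is not Zappy's actual API: the `ChunkedMatrix` class, the `local_engine` and `threaded_engine` functions, and the `scale` helper are all hypothetical names, and a thread pool stands in for a distributed engine such as Spark.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Row = List[float]
Chunk = List[Row]                      # a block of consecutive rows
Engine = Callable[[Callable[[Chunk], list], Sequence[Chunk]], list]


class ChunkedMatrix:
    """Hypothetical row-chunked matrix (illustration only, not Zappy's API)."""

    def __init__(self, chunks: Sequence[Chunk], run: Engine):
        self.chunks = list(chunks)
        self.run = run                 # engine that applies a function to every chunk

    @classmethod
    def from_rows(cls, rows: List[Row], chunk_size: int, run: Engine) -> "ChunkedMatrix":
        # Split the rows into fixed-size blocks, similar in spirit to Zarr chunks.
        chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
        return cls(chunks, run)

    def map_chunks(self, fn: Callable[[Chunk], Chunk]) -> "ChunkedMatrix":
        # The operation is handed to the engine, which decides where it runs.
        return ChunkedMatrix(self.run(fn, self.chunks), self.run)

    def row_sums(self) -> List[float]:
        per_chunk = self.run(lambda c: [sum(r) for r in c], self.chunks)
        return [s for chunk in per_chunk for s in chunk]


def local_engine(fn, chunks):
    # Runs every chunk sequentially in the current process.
    return [fn(c) for c in chunks]


def threaded_engine(fn, chunks):
    # Runs chunks on a thread pool; pool.map preserves chunk order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fn, chunks))


def scale(chunk: Chunk, factor: float = 2.0) -> Chunk:
    # Example per-chunk operation: multiply every value by a constant.
    return [[x * factor for x in row] for row in chunk]


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
local = ChunkedMatrix.from_rows(rows, 2, local_engine).map_chunks(scale).row_sums()
threaded = ChunkedMatrix.from_rows(rows, 2, threaded_engine).map_chunks(scale).row_sums()
```

Because the calling code is identical for both engines, switching the backend changes only where the per-chunk work runs, not how the analysis is written.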


Lead Investigator

Uri Laserson