Accelerating Cross-Sample Analysis of Single-Cell Genomic Data with ADAM and Apache Spark


Focus Scale

Project Goal

To build computational tools that let researchers harness distributed computing for machine learning and interactive data exploration across raw single-cell data.


Results & Resources

The Joseph lab’s primary goal was to support the Apache Spark ecosystem in order to extend its work on hyper-scalable workflows and visualization. The lab pursued several projects:

  • ADAM, a library and command-line tool that parallelizes genomic data analysis across cluster and cloud computing environments.
  • Mango, a distributed tool for visualizing and manipulating large genomic sequencing datasets in a Jupyter notebook.
  • Modin, a drop-in replacement for pandas that allows users to interpret large datasets in table format with high throughput and low latency.


Investigators

Lead Investigator

Anthony Joseph