Accelerating Cross-Sample Analysis of Single-Cell Genomic Data with ADAM and Apache Spark
To build computational tools that let researchers harness distributed computing for machine learning and interactive data exploration across raw single-cell data.
Results & Resources
The Joseph lab’s primary goal was to support the Apache Spark ecosystem and to extend their work on hyper-scalable workflows and visualization. They pursued a number of projects:
- ADAM, a library and command-line tool that parallelizes genomic data analysis across cluster and cloud computing environments.
- Mango, a distributed tool for visualizing and manipulating large genomic sequencing datasets in a Jupyter notebook.
- Modin, a drop-in replacement for pandas that allows users to interpret large datasets in table format with high throughput and low latency.