Accelerating Cross-Sample Analysis of Single-Cell Genomic Data with ADAM and Apache Spark
Focus
Scale
Project Goal
To build computational tools that let researchers harness distributed computing for machine learning and interactive data exploration across raw single-cell data.
Results & Resources
The Joseph lab’s primary goal was to support the Apache Spark ecosystem in order to extend its work on hyperscalable workflows and visualization. The lab pursued a number of projects:
- ADAM, a library and command-line tool that parallelizes genomic data analysis across cluster and cloud computing environments.
- Mango, a distributed tool for visualizing and manipulating large genomic sequencing datasets in a Jupyter notebook.
- Modin, a drop-in replacement for pandas that lets users work with large tabular datasets at high throughput and low latency.
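
The data-parallel pattern behind cross-sample analysis can be illustrated with a toy sketch. It does not use ADAM, Mango, Modin, or Spark APIs: the sample data, gene names, and function names below are hypothetical, and a local thread pool stands in for cluster workers. Each sample is summarized independently (the "map" step), and the partial results are then merged (the "reduce" step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical toy data: each sample holds (cell barcode, gene) observations.
SAMPLES = {
    "sample_a": [("cell1", "GAPDH"), ("cell1", "ACTB"), ("cell2", "GAPDH")],
    "sample_b": [("cell3", "GAPDH"), ("cell4", "CD3E")],
    "sample_c": [("cell5", "ACTB"), ("cell5", "CD3E"), ("cell6", "CD3E")],
}


def count_genes(observations):
    """Map step: count gene observations within a single sample."""
    return Counter(gene for _cell, gene in observations)


def merge_counts(left, right):
    """Reduce step: combine two partial count tables into one."""
    return left + right


def cross_sample_gene_counts(samples, max_workers=3):
    """Summarize every sample in parallel, then merge the partial counts."""
    # Threads are an assumption for illustration; a distributed engine
    # would schedule the same per-sample work across machines.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_genes, samples.values()))
    return reduce(merge_counts, partials, Counter())


totals = cross_sample_gene_counts(SAMPLES)
for gene, count in sorted(totals.items()):
    print(gene, count)
```

Because each sample is processed without reference to the others, adding more samples only adds more independent tasks, which is what makes this kind of workload a natural fit for cluster and cloud environments.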
Investigators
Lead Investigator
Anthony Joseph