Beyond Data Repositories: Decentralized Analysis of the Human Cell Atlas
The proliferation of single-cell omics data has spurred the development of curated reference atlases that can facilitate deeper analyses of newly generated (query) samples in both homeostatic and perturbed conditions. Ongoing efforts aim to collate reference datasets into repositories; however, these repositories are often limited to simple exploration or cell type annotation of query samples. While recent methods such as scArches offer approaches to query mapping, more advanced analyses are hampered by the lack of standardized access to processed data beyond the expression matrix. A standardized, efficient, and easy-to-use platform is therefore needed to provide access to reference datasets and to transfer the knowledge embedded in them.
To address this need, this project will build a new software package, screfpy, which aggregates tools for reference-driven analyses of query data while relying on a model-based representation of the reference data (e.g., from a variational autoencoder). With this approach, screfpy will make it possible to leverage massive references in novel analyses using only consumer-grade hardware. To accomplish this, screfpy will draw upon the team’s previous work on scvi-tools, AnnData, scArches, sfaira, and scanpy, and will cater to both model generators (atlas builders) and model consumers who query the generated models. The project will develop three sets of deliverables: backend components that provide APIs to standardize the training and sharing of atlas-level reference models; an initial model repository comprising a selection of tissue atlases; and frontend components that run the query data analysis pipeline, offering both programmatic access to the repository and a shareable GUI.
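
The idea of working from a model-based representation rather than the full reference expression matrix can be illustrated with a minimal sketch. This is not the screfpy API, which is yet to be built. It assumes a hypothetical compact reference consisting only of encoder weights and one latent centroid per cell type, and it swaps in a random linear encoder for the trained variational autoencoder. All names (`encoder`, `centroids`, `embed`, `annotate`) and all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical compact reference "model": only encoder weights and
# per-cell-type latent centroids are kept, not the reference cells.
n_genes, n_latent = 50, 5
encoder = rng.normal(size=(n_genes, n_latent))  # stand-in for a trained VAE encoder
cell_types = ["T cell", "B cell", "monocyte"]   # illustrative labels
centroids = rng.normal(size=(len(cell_types), n_latent)) * 5.0


def embed(expression, encoder):
    """Log-transform counts and project them into the latent space."""
    return np.log1p(expression) @ encoder


def annotate(latent, centroids, labels):
    """Assign each cell the label of its nearest centroid (Euclidean)."""
    dists = np.linalg.norm(latent[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return [labels[i] for i in nearest], dists.min(axis=1)


# Simulated query sample: 8 cells with Poisson-distributed counts.
query_counts = rng.poisson(lam=2.0, size=(8, n_genes))
query_latent = embed(query_counts, encoder)
query_labels, query_dist = annotate(query_latent, centroids, cell_types)
```

In this toy setting the query is analyzed with a few small arrays, which is the property that would allow reference-driven analysis on consumer-grade hardware. A real reference model would also need to account for batch effects and uncertainty, which the sketch ignores.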