Virtual Cells
We’re building AI-powered virtual cells to help scientists explore the molecular underpinnings of human health and disease.
We’re leveraging AI to build virtual cells that are capable of predicting the behavior of healthy and diseased cells, which will have broad applications for biomedical research, disease diagnosis and therapeutic development. We believe this effort will deepen our understanding of human biology at a molecular level, bringing scientists closer to curing, preventing or managing all diseases by the end of this century.
To enable this work, we’re building a platform where biologists and machine learning researchers can more easily access curated, large AI models of cell biology and the datasets they were built on. This lowers the barriers for biologists to use AI for specific biological tasks and allows ML/AI researchers to rapidly iterate and improve the quality, utility, and performance of models, which will in turn speed up the process of science.
We’re prototyping in the open and the platform is now available in early access for the scientific community.
Our Approach
Biological Data
Building virtual cells will require vast amounts of diverse and multimodal biological data. CZI supports these datasets by building open source software tools like Chan Zuckerberg CELLxGENE (CZ CELLxGENE) to make machine-learning-ready, high-quality data more accessible for scientific research, funding single-cell data generation, and creating scientific institutes that also generate data to advance cell biology.
Models & Applications
Our science technology team will partner with leading AI experts and academic researchers to build models that will help unlock the mysteries of cells and how cells interact within systems. These models and their outputs will be openly accessible to the scientific community through a centralized platform.
Compute Infrastructure
Training AI on enormous amounts of biological data will require a high-powered computing system. We’re building and funding one of the largest computing systems for nonprofit life science research, which will power the next generation of AI modeling for cell biology.
Collaboration
Enabling AI at scale for research will require close collaboration with the scientific community. With our network of grantees and collaborative research institutes, we have a history of bringing together experts across disciplines to pursue some of the toughest, riskiest scientific challenges that can’t be tackled elsewhere. We’re also committed to making data, models and applications open source for research.
Initial Research Areas
scGenePT
Model Developer: Chan Zuckerberg Initiative
scGenePT is a collection of single-cell models for perturbation prediction. It builds on the scGPT foundation model for scRNA-seq data by injecting language embeddings at the gene level into the model architecture. These language-based gene embeddings are obtained by using LLMs to embed gene-level information from several knowledge sources: NCBI gene descriptions, UniProt protein summaries for protein-coding genes (as inspired by the genePT approach), and GO (Gene Ontology) Gene Molecular Annotations across three axes: molecular function, biological process and cellular component.
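As a rough illustration of what gene-level injection could look like, the sketch below projects a language-derived embedding for each gene into the model dimension and adds it to that gene's learned token embedding. This is a minimal sketch under stated assumptions, not the scGenePT implementation: the function name, the linear projection, the element-wise addition, and all dimensions are hypothetical, and random arrays stand in for real embeddings.

```python
import numpy as np


def inject_language_embeddings(gene_token_emb, language_emb, projection):
    """Combine learned gene token embeddings with language-derived ones.

    Assumption (hypothetical): the language embedding is linearly projected
    to the model dimension and added element-wise to the gene token embedding.
    """
    projected = language_emb @ projection  # shape: (n_genes, d_model)
    return gene_token_emb + projected


# Toy stand-ins for real data; every size here is illustrative only.
rng = np.random.default_rng(0)
n_genes, d_model, d_lang = 5, 8, 16

gene_token_emb = rng.normal(size=(n_genes, d_model))    # learned per-gene embeddings
language_emb = rng.normal(size=(n_genes, d_lang))       # e.g. LLM embedding of a gene description
projection = rng.normal(size=(d_lang, d_model)) * 0.1   # hypothetical projection weights

combined = inject_language_embeddings(gene_token_emb, language_emb, projection)
print(combined.shape)
```

In a trained model the projection weights would be learned alongside the other parameters; here they are random so that the example stays self-contained.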
SubCell
Model Developer: Ankit Gupta, The Lundberg Lab (Stanford University) and Chan Zuckerberg Initiative
The SubCell models are Vision Transformer (ViT) models pretrained on single-cell images from the Human Protein Atlas dataset, which captures protein expression and spatiotemporal distribution for more than 13,000 genes in 37 cell lines. The models generate feature embeddings that encode the protein localization patterns in immunofluorescence images. These embeddings can be used in downstream tasks such as protein localization classification or morphology-based profiling of cells.
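To make the downstream use of such embeddings concrete, the sketch below fits a nearest-centroid classifier on top of precomputed feature vectors. It is a minimal sketch under stated assumptions: the embeddings are synthetic stand-ins rather than SubCell outputs, the two class labels are illustrative, and the function names are hypothetical. A real workflow would first obtain embeddings from a pretrained model and would likely use a stronger classifier.

```python
import numpy as np


def fit_centroids(embeddings, labels):
    """Compute one mean embedding (centroid) per class label."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(embeddings, classes, centroids):
    """Assign each embedding to the class with the nearest centroid."""
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic stand-ins for image embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
dim = 8
cytosol = rng.normal(loc=0.0, scale=0.5, size=(20, dim))
nucleus = rng.normal(loc=5.0, scale=0.5, size=(20, dim))

embeddings = np.vstack([cytosol, nucleus])
labels = np.array(["cytosol"] * 20 + ["nucleus"] * 20)

classes, centroids = fit_centroids(embeddings, labels)
predicted = predict(embeddings, classes, centroids)
accuracy = float((predicted == labels).mean())
print(accuracy)
```

The same pattern applies to morphology-based profiling: once each cell is represented as a vector, standard distance-based or linear methods can be applied without retraining the image model.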
Building Virtual Cells With the Scientific Community
We’re introducing a new AI residency program to bring together AI and machine learning leaders from the academic research community. Working with CZI scientists and engineers, these residents will spearhead projects that lay the foundation for virtual cell models, which will allow researchers to explore the molecular underpinnings of human health and disease.
We’re launching the first phase of the AI residency program with current collaborators and partners. Join our mailing list to stay updated on the latest virtual cell news, collaboration opportunities and more.