Virtual Cells

We’re building AI-powered virtual cells to help scientists explore the molecular underpinnings of human health and disease.

We’re using AI to build virtual cells that can predict the behavior of healthy and diseased cells, with broad applications for biomedical research, disease diagnosis and therapeutic development. We believe this effort will deepen our understanding of human biology at the molecular level and bring scientists closer to curing, preventing or managing all diseases by the end of this century.

To enable this work, we’re building a platform where biologists and machine learning researchers can more easily access curated, large AI models of cell biology and the datasets they were built on. This lowers the barriers for biologists to use AI for specific biological tasks and allows ML/AI researchers to rapidly iterate and improve the quality, utility, and performance of models, which will in turn speed up the process of science.

We’re prototyping in the open, and the platform is now available in early access to the scientific community.


Our Approach

Biological Data

Building virtual cells will require vast amounts of diverse and multimodal biological data. CZI supports these datasets in three ways: building open source software tools such as Chan Zuckerberg CELLxGENE (CZ CELLxGENE) that make high-quality, machine-learning-ready data more accessible for scientific research; funding single-cell data generation; and creating scientific institutes that also generate data to advance cell biology.

Models & Applications

Our science technology team will partner with leading AI experts and academic researchers to build models that will help unlock the mysteries of cells and how cells interact within systems. These models and their outputs will be openly accessible to the scientific community through a centralized platform.

Compute Infrastructure

Training AI on enormous amounts of biological data will require a high-powered computing system. We’re building and funding one of the largest computing systems for nonprofit life science research, which will power the next generation of AI modeling for cell biology.

Collaboration

Enabling AI at scale for research will require close collaboration with the scientific community. With our network of grantees and collaborative research institutes, we have a history of bringing together experts across disciplines to pursue some of the toughest, riskiest scientific challenges that can’t be tackled elsewhere. We’re also committed to making data, models and applications open source for research.

Initial Research Areas

Visual representation of universal cell embeddings. Photograph courtesy of Leskovec Lab

scGenePT

Model Developer: Chan Zuckerberg Initiative

scGenePT is a collection of single-cell models for perturbation prediction. It builds on the scGPT foundation model for scRNAseq data by injecting gene-level language embeddings into the model architecture. These language embeddings are obtained by using LLMs to embed gene-level information from several knowledge sources: NCBI gene descriptions, UniProt protein summaries for protein-coding genes (as inspired by the genePT approach), and GO (Gene Ontology) Gene Molecular Annotations across three axes: molecular function, biological process and cellular component.
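
The idea of injecting gene-level language embeddings can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the scGenePT implementation: the function names, the toy dimensions, the random vectors standing in for real embeddings, and the choice to combine the two representations by projecting the language embedding to the model dimension and adding it element-wise.

```python
import random

def linear_project(vec, weights):
    """Map a vector to a new dimension; weights has one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def inject_language_embeddings(gene_token_emb, language_emb, weights):
    """Add a projected language embedding to each gene's token embedding.

    Hypothetical helper: element-wise addition after a linear projection is an
    assumed way of combining the two representations.
    """
    combined = {}
    for gene, token_vec in gene_token_emb.items():
        projected = linear_project(language_emb[gene], weights)
        combined[gene] = [t + p for t, p in zip(token_vec, projected)]
    return combined

rng = random.Random(0)
MODEL_DIM, LANGUAGE_DIM = 4, 6  # toy sizes; real models use far larger dimensions
genes = ["TP53", "BRCA1", "MYC"]

# Random stand-ins for learned gene token embeddings and LLM-derived embeddings.
gene_token_emb = {g: [rng.gauss(0, 1) for _ in range(MODEL_DIM)] for g in genes}
language_emb = {g: [rng.gauss(0, 1) for _ in range(LANGUAGE_DIM)] for g in genes}

# Projection from the language dimension down to the model dimension.
weights = [[rng.gauss(0, 0.1) for _ in range(LANGUAGE_DIM)] for _ in range(MODEL_DIM)]

combined = inject_language_embeddings(gene_token_emb, language_emb, weights)
```

Each gene keeps a vector of the model's dimension, so the rest of the architecture would not need to change shape; only the content of the per-gene representation is enriched.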

Visual representation of protein expression over pre-existing reference markers | Photograph courtesy of Emma Lundberg, Human Protein Atlas

SubCell

Model Developer: Ankit Gupta, The Lundberg Lab (Stanford University) and Chan Zuckerberg Initiative

The SubCell models are Vision Transformer (ViT) models pretrained on single-cell images from the Human Protein Atlas dataset, which covers protein expression and spatiotemporal distribution for more than 13,000 genes in 37 cell lines. The models generate feature embeddings that encode the protein localization patterns in the immunofluorescence images. These embeddings can be used in downstream tasks such as protein localization classification or morphology-based profiling of cells.
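
As a rough illustration of how feature embeddings feed a downstream task such as localization classification, here is a minimal nearest-centroid classifier. It is a sketch under stated assumptions: the three-dimensional vectors are made-up placeholders rather than SubCell outputs, the labels are illustrative, and the helper names are hypothetical and do not belong to any SubCell API.

```python
import math

def centroid(vectors):
    """Return the element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_centroids(embeddings, labels):
    """Group embeddings by label and compute one centroid per label."""
    grouped = {}
    for vec, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in grouped.items()}

def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Placeholder embeddings standing in for model outputs (assumed values).
train_embeddings = [
    [0.9, 0.1, 0.0],
    [1.0, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 1.0, 0.9],
]
train_labels = ["nucleus", "nucleus", "mitochondria", "mitochondria"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.8, 0.0, 0.1])
```

A nearest-centroid rule is used here only because it is short and transparent; in practice any classifier that accepts fixed-length vectors could be trained on the embeddings.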

AI Residents at CZI | Photograph courtesy of Donghui Li

Building Virtual Cells With the Scientific Community

We’re introducing a new AI residency program to bring together AI and machine-learning leaders from the academic research community. Working in collaboration with CZI scientists and engineers, these residents will lead projects that lay the foundation for virtual cell models, which will allow researchers to explore the molecular underpinnings of human health and disease.

We’re launching the first phase of the AI residency program with current collaborators and partners. Join our mailing list to stay updated on the latest virtual cell news, collaboration opportunities and more.