I’m always energized when I meet with teams across CZI to review our work as an organization. Earlier this summer, we held strategy reviews to reflect on the lessons we’ve learned and to discuss how we can build on our progress. These conversations made it clear that the last several years have prepared us to leverage new, exciting technologies like artificial intelligence (AI) to accelerate progress across our core focus areas.
In the scientific field specifically, researchers are already using AI to turn decades of scientific research into breakthroughs about the biomolecules — like proteins — that keep our bodies running. And with large language models and machine-learning technologies advancing so quickly, AI can help us do so much more.
AI can assist us in analyzing the various types of cells in our body and how they interact — which we believe is key to demystifying disease and helping scientists cure, prevent or manage all diseases by the end of this century.
Before that can happen, researchers need to shift from using AI to create predictive models of molecules to using it to create predictive models of cells. It’s like transitioning from tracking the orbit of a single planet to predicting the interactions of an entire solar system. We’ve helped the science community reach the point where that shift is possible.
We’re building one of the largest computing clusters in the world for nonprofit life science research to create a “virtual cell.” It would give scientists worldwide access to digital models that could predict the behavior of any cell type and how it may respond to different conditions. For example, researchers could predict how an immune cell will react to an infection or what’s happening at the cellular level when a child is born with a rare disease.
Building foundational datasets to train AI models for predicting cell behavior
For the past seven years, across our science and technology teams, our scientific institutes, and our grantees, we’ve worked closely with researchers in the field to build the large, robust and integrated datasets required to train AI models for studying cells. Together, we’ve also created the open-source tools scientists use to draw insights from those datasets.
We’ve been laying the groundwork to leverage large-scale datasets for AI to predict cell types and states from the genome, starting with the Human Cell Atlas. Since 2017, we’ve been supporting researchers as they build this baseline reference map of the more than 37 trillion cells in the human body, including where they’re located and what they’re like when they’re healthy. Now, with recent advances in AI and machine learning, researchers are developing new methods to analyze the data they’ve been aggregating for years, which will help accelerate discoveries about human health and disease.
Even when scientists generate rich data, the datasets need to be annotated, standardized and linked before researchers can accurately draw conclusions about different cells and their unique behavior. This is why CZI developed CZ CELLxGENE, an open-source tool for exploring and annotating single-cell datasets. The platform and its features make single-cell data more accessible for scientists, so they can more quickly surface important information that could lead to discoveries in treating disease.
We also founded the Chan Zuckerberg Biohub Network to give researchers the long-term runway and resources to tackle ambitious scientific challenges. These scientific institutes are generating large amounts of data to feed into new digital cell models. For example, scientists at the Chan Zuckerberg Imaging Institute are partnering with colleagues at the CZ Biohub San Francisco to build on the OpenCell atlas, which is mapping the locations of crucial proteins in our cells and their interactions with one another.
Providing computing power dedicated to life science research
As we saw researchers beginning to make discoveries using the datasets and tools we’ve created, in tandem with recent advances in AI, it became clear that our unique approach as a philanthropy can help accelerate scientific progress. We’re building a powerful computing cluster that will be an important resource for academic teams to develop large-scale AI models for biology.
The computing cluster is powerful enough to train AI models on all the rich data we’ve gathered, existing publicly available datasets, and the new data we and others in the field will generate. In the coming years, we hope to develop new AI models to help scientists predict every cell type, in both healthy and diseased states, and help clinicians and patients understand what we can do to keep people healthy. Mark and I recently shared more about this effort with MIT Technology Review.
We’re incredibly excited about the new possibilities we can unlock when using AI for biology at scale — helping scientists make more discoveries (and faster) toward curing, preventing or managing all diseases.