Mar 16, 2020
New Dataset Makes Coronavirus Research Open and Machine Readable
Partnership Between Government, Academia, Medicine, and Technology Creates Hub for Coronavirus-Related Research Literature
Editor’s Note: This release has been updated as of 11/2/21.
The Chan Zuckerberg Initiative collaborated with leaders in government, academia, medicine, and technology to create a new, open dataset containing research literature related to the coronavirus. The COVID-19 Open Research Dataset (CORD-19), released today, will be available on multiple platforms and will continue to be updated as new research emerges.
“Sharing vital information across scientific and medical communities is key to accelerating our ability to respond to the coronavirus pandemic,” said CZI Head of Science, Cori Bargmann. “The new COVID-19 Open Research Dataset will help researchers worldwide to access important information faster.”
Researchers and leaders from CZI, the Allen Institute for Artificial Intelligence, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, Cold Spring Harbor Laboratory, and the National Library of Medicine of the National Institutes of Health have worked to prepare and distribute a set of research literature about COVID-19, SARS-CoV-2, and the coronavirus group.
The dataset's more than 29,000 articles, over 13,000 of which are full-text, contain a wealth of information about the novel coronavirus and related viruses. The content will continue to be updated as new insights are published in peer-reviewed publications and in archival services, such as the preprint servers bioRxiv (a CZI grantee), medRxiv, and others.
“Preprint services such as bioRxiv and medRxiv are key resources that allow researchers to quickly and openly share their findings globally,” said Sage Bionetworks Chief Commons Officer, John Wilbanks. “The joint efforts of the Chan Zuckerberg Initiative and the Cold Spring Harbor Laboratories to compile these preprints into the COVID-19 Open Research Dataset is critical to ensure the analysis of the literature includes preprints immediately shared by biomedical researchers.”
With these resources machine readable and available for data analysis, the worldwide machine learning community has the opportunity to apply recent advances in natural language processing to answer questions within this content, and to connect insights across it, in support of the ongoing fight against this infectious disease.
The COVID-19 Open Research Dataset (CORD-19) will be linked to the World Health Organization database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph (COVID-19 resource page), Dimensions (COVID-19 resource page), PubMed, and Semantic Scholar services. This research may also be tracked by researchers and the public via the COVID-19 feed on CZI’s free research discovery tool Meta.
For more information about CORD-19, read the White House Office of Science and Technology Policy press release. For more information about the novel coronavirus and COVID-19, please visit https://www.cdc.gov/coronavirus.
For more information about how CZI and our grant partners are responding to COVID-19, visit https://chanzuckerberg.com/covid-19/.
About the Chan Zuckerberg Initiative
Founded by Dr. Priscilla Chan and Mark Zuckerberg in 2015, the Chan Zuckerberg Initiative (CZI) is a new kind of philanthropy that’s leveraging technology to help solve some of the world’s toughest challenges — from eradicating disease, to improving education, to reforming the criminal justice system. Across three core Initiative focus areas of Science, Education, and Justice & Opportunity, we’re pairing engineering with grant-making, impact investing, and policy and advocacy work to help build an inclusive, just and healthy future for everyone. For more information, please visit chanzuckerberg.com.