Shasta: A De Novo Genome Assembler for Long-Read DNA Sequencing Technology

An abstract representation of DNA.

Traditionally, genomics research has relied exclusively on a reference genome drawn from a small group of individuals to represent an entire species. In 2017, researchers at the University of California, Santa Cruz (UCSC) demonstrated that long-read human genome assembly using nanopore technology was possible without a reference genome, but the assembly took hundreds of thousands of compute hours to complete. A year later, the group reached a then-unprecedented milestone: reference-free (de novo) sequencing of 11 human genomes in nine days.

To help advance and scale nanopore sequencing, an extensive team of researchers and developers led by Paolo Carnevali at CZI and Benedict Paten at UCSC built Shasta, an algorithm driven by in-memory computing that can complete a de novo human genome assembly (one built from scratch, without a prior reference genome) in just a few hours.

While the human genome remains Shasta’s primary focus, Shasta is also being used for the de novo sequencing of many other organisms, including terrestrial and marine invertebrates, plants, and microorganisms.

Developed in partnership with researchers and developers from the UC Santa Cruz Genomics Institute, Shasta gives researchers vital insights into the human genome in a fraction of the time and at a fraction of the cost of traditional methods. This paper in Nature Biotechnology details how Shasta not only yields accuracy comparable to or better than that of similar assemblers, but also has the lowest number of misassemblies, leading to critical breakthroughs in genomics research.

Oct 26, 2020