Improving Computational Methods for High-throughput Sequence Data Analysis

Projects minimap2, BWA, hifiasm

Funding Cycle 4

Proposal Summary

To maintain and improve three software projects (minimap2, BWA and hifiasm) and extend them to new architectures and new data types.

Project

minimap2

minimap2 is the dominant sequence aligner for long reads. It was optimized for long reads of 85–95 percent accuracy. Although minimap2 works with the long accurate reads produced today, it does not take full advantage of modern data. This work will improve performance and accuracy for long accurate reads and for long sequence assemblies, in particular around long segmental duplications and in long repetitive regions.

Key Personnel

Heng Li

Affiliation

Dana-Farber Cancer Institute

GitHub Handle

lh3

Haoyu Cheng

Affiliation

Dana-Farber Cancer Institute

GitHub Handle

chhylp123

Project

BWA

Following the release of BWA-MEM2, this work will continue to maintain and improve BWA and add ARM64 support for Apple M1 and recent ARM-based servers. In collaboration with the Intel research lab, the team will explore faster indexing algorithms to replace the current one in BWA-MEM2.

Key Personnel

Heng Li

Affiliation

Dana-Farber Cancer Institute

GitHub Handle

lh3

Project

hifiasm

hifiasm has been rapidly adopted in the community and will likely have a significant impact in the next few years. At present, hifiasm only works with PacBio’s High-Fidelity (HiFi) long reads. Oxford Nanopore’s new chemistry and new base-calling algorithm can bring the average base accuracy to 99 percent, so the team plans to adapt the hifiasm algorithm to Nanopore data. The team also plans to integrate Hi-C sequence data into the assembly process by mapping Hi-C reads to the hifiasm assembly graph and phasing unitigs using Hi-C’s long-range information.
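To illustrate the idea of phasing unitigs with Hi-C's long-range information, the following is a minimal toy sketch in Python. It is not hifiasm's algorithm. It assumes that Hi-C reads have already been mapped to unitigs and summarized as link counts, and that heterozygous regions are given as pairs of alternative unitigs ("bubbles"). The function name `phase_unitigs`, the unitig names and the link counts are all hypothetical, and the greedy strategy is chosen only for brevity.

```python
def phase_unitigs(bubbles, hic_links):
    """Greedily assign the two unitigs of each bubble to haplotype 1 or 2.

    bubbles   -- list of (unitig_a, unitig_b) pairs; each pair holds the two
                 alternative alleles of one heterozygous region (assumed input)
    hic_links -- dict mapping frozenset({u, v}) to the number of Hi-C read
                 pairs with one end on unitig u and the other on unitig v
    """
    def links(u, v):
        return hic_links.get(frozenset((u, v)), 0)

    hap = {}  # unitig name -> haplotype (1 or 2)
    for a, b in bubbles:
        # Positive score: Hi-C links favor (a -> hap 1, b -> hap 2).
        # Negative score: Hi-C links favor the flipped assignment.
        score = 0
        for u, h in hap.items():
            sign = 1 if h == 1 else -1
            score += sign * (links(a, u) - links(b, u))
        if score >= 0:
            hap[a], hap[b] = 1, 2
        else:
            hap[a], hap[b] = 2, 1
    return hap


# Hypothetical example: three bubbles and made-up Hi-C link counts.
bubbles = [("utg1a", "utg1b"), ("utg2a", "utg2b"), ("utg3a", "utg3b")]
hic_links = {
    frozenset(("utg1a", "utg2b")): 9,
    frozenset(("utg1b", "utg2a")): 8,
    frozenset(("utg1a", "utg2a")): 1,
    frozenset(("utg2b", "utg3a")): 7,
    frozenset(("utg2a", "utg3b")): 6,
    frozenset(("utg1a", "utg3a")): 3,
}
phased = phase_unitigs(bubbles, hic_links)
for name in sorted(phased):
    print(name, phased[name])
```

In this toy input, most Hi-C links connect utg1a, utg2b and utg3a, so the sketch places those three unitigs on the same haplotype. A greedy pass like this depends on the order of the bubbles and does not model noise in Hi-C data; a production assembler would need a more robust, global optimization.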

Key Personnel

Heng Li

Affiliation

Dana-Farber Cancer Institute

GitHub Handle

lh3

Haoyu Cheng

Affiliation

Dana-Farber Cancer Institute

GitHub Handle

chhylp123