
Improving Computational Methods for High-throughput Sequence Data Analysis


Projects minimap2, BWA, hifiasm
Lead

Heng Li (Dana-Farber Cancer Institute)

Funding Cycle 4

Proposal Summary

To maintain and improve the three proposed software projects (minimap2, BWA and hifiasm) and to extend them to new architectures and new data types.


Project

minimap2

minimap2 is the dominant sequence aligner for long reads. It was optimized for long reads of 85–95 percent accuracy. Although minimap2 works with the long accurate reads produced today, it does not take full advantage of such modern data. This work will improve the performance and accuracy of minimap2 for long accurate reads and for long sequence assemblies, in particular around long segmental duplications and in long repetitive regions.


Key Personnel

Heng Li
Haoyu Cheng

Project

BWA

With the release of BWA-MEM2, this work will continue to maintain and improve BWA and add ARM64 support for Apple M1 and recent ARM-based servers. In collaboration with Intel's research lab, the team will explore faster indexing algorithms to replace the current one in BWA-MEM2.


Key Personnel

Heng Li

Project

hifiasm

hifiasm has been rapidly adopted by the community and will likely have a significant impact in the next few years. At present, hifiasm only works with PacBio’s High-Fidelity (HiFi) long reads. Oxford Nanopore’s new chemistry and base-calling algorithm can bring the average base accuracy to 99 percent, so the team plans to adapt the hifiasm algorithm to Nanopore data. In addition to supporting Nanopore data, the team plans to integrate Hi-C sequence data into the assembly process. This can be achieved by mapping Hi-C reads to the hifiasm assembly graph and phasing unitigs using the long-range information carried by Hi-C read pairs.
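As a toy illustration of how long-range Hi-C links could phase unitigs, the sketch below assumes that heterozygous unitigs come in two-allele bubbles and that Hi-C read pairs have already been mapped to the graph and counted per unitig pair. Both the input format and the function `phase_bubbles` are hypothetical. A greedy local search flips each bubble's allele-to-haplotype assignment whenever doing so puts more Hi-C links within the same haplotype. This is a simplified sketch under those assumptions, not hifiasm's actual Hi-C phasing algorithm.

```python
from collections import defaultdict


def phase_bubbles(bubbles, contacts, max_rounds=100):
    """Toy greedy Hi-C phasing (hypothetical, not hifiasm's method).

    bubbles:  list of (allele_a, allele_b) unitig names, the two alleles
              of one heterozygous bubble.
    contacts: dict mapping (unitig_u, unitig_v) -> number of Hi-C read
              pairs linking the two unitigs.
    Returns a dict mapping each bubble unitig to haplotype 0 or 1.
    """
    # sign[i] = +1 puts allele_a of bubble i in hap 0; -1 puts it in hap 1.
    sign = [1] * len(bubbles)
    # For each unitig: (bubble index, +1 for allele_a, -1 for allele_b).
    where = {}
    for i, (a, b) in enumerate(bubbles):
        where[a] = (i, 1)
        where[b] = (i, -1)

    # Keep only links between different bubbles; links within one bubble
    # are always trans and cannot be changed by flipping.
    adj = defaultdict(list)
    for (u, v), w in contacts.items():
        if u in where and v in where and where[u][0] != where[v][0]:
            adj[u].append((v, w))
            adj[v].append((u, w))

    def spin(u):
        i, side = where[u]
        return sign[i] * side  # +1 -> hap 0, -1 -> hap 1

    for _ in range(max_rounds):
        changed = False
        for i, (a, b) in enumerate(bubbles):
            # Contribution of bubble i to (cis links - trans links):
            # a cis link adds +w, a trans link adds -w.
            score = 0
            for u in (a, b):
                for v, w in adj[u]:
                    score += w * spin(u) * spin(v)
            # Flipping bubble i negates its contribution, a gain of -2 * score.
            if score < 0:
                sign[i] = -sign[i]
                changed = True
        if not changed:
            break
    return {u: 0 if spin(u) > 0 else 1 for u in where}


if __name__ == "__main__":
    bubbles = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
    contacts = {("a1", "b2"): 10, ("a2", "b1"): 8,
                ("b2", "c2"): 5, ("b1", "c1"): 6, ("a1", "b1"): 1}
    print(phase_bubbles(bubbles, contacts))
```

Because each flip strictly increases the number of cis links minus trans links, the loop terminates, but greedy flipping can stop at a local optimum rather than the best phasing.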


Key Personnel

Heng Li
Haoyu Cheng