
Quantization and Compressive Learning Methods for Omics Data


Focus: Compression

Project Goal

To develop new quantization and lossless compression algorithms for omics data, along with machine learning methods that are robust to quantization errors.
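The following is a generic illustration of quantization and the error it introduces. It is not the lab's method. It is a minimal uniform scalar quantizer in Python. The function names `quantize` and `dequantize`, the step size, and the input values are hypothetical. Rounding each value to the nearest integer index keeps every reconstruction error within half the step size. The integer indices are what a downstream lossless coder would then compress.

```python
def quantize(values, step):
    """Uniform scalar quantizer: map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from quantization indices."""
    return [i * step for i in indices]


# Hypothetical input values (illustrative only, not project data).
values = [0.12, 1.37, 2.49, 5.01, 7.76]
step = 0.5

idx = quantize(values, step)      # small integers, suitable for lossless coding
recon = dequantize(idx, step)     # lossy reconstruction
max_err = max(abs(v - r) for v, r in zip(values, recon))

print(idx)                  # [0, 3, 5, 10, 16]
print(max_err <= step / 2)  # True: error is bounded by half the step size
```

A learning method that is "robust to quantization errors" would need to tolerate perturbations of this bounded size in its inputs.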


Results & Resources

The Hernaez lab, in collaboration with Idoia Ochoa, developed a number of genomic data compressors:

  • SPRING, for FASTQ files, reduces the size of compressed raw genomic data by a factor of three.
  • ALICO, for SAM/BAM files, achieves a 20% improvement over other tools while running at similar speeds.
  • GABAC is a compression module designed to be plugged into existing compressors such as CRAM and SPRING. It compresses the streams that those methods leave uncompressed.
  • GPress, for gene expression files, achieves a 98% reduction in file size while maintaining the ability to retrieve all annotations for a given identifier or a range of coordinates.
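The size reductions quoted above can also be read as compression ratios. This is simple arithmetic on the stated figures. It writes S for file size and assumes each figure is measured against the corresponding uncompressed or baseline size.

```latex
r = \frac{S_{\text{before}}}{S_{\text{after}}}, \qquad
r_{\text{GPress}} = \frac{1}{1 - 0.98} = 50, \qquad
r_{\text{SPRING}} = 3 \;\Rightarrow\; 1 - \tfrac{1}{3} \approx 66.7\% \text{ reduction}
```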

Investigators

Lead Investigator

Mikel Hernaez