Quantization and Compressive Learning Methods for Omics Data
The goal of this project is to develop new quantization and lossless compression algorithms for omics data, together with accompanying machine learning methods that are robust to quantization errors.
Results & Resources
The Hernaez lab, in collaboration with Idoia Ochoa, developed a number of genomic data compressors:
- SPRING, for FASTQ files, reduces the size of compressed raw genomic data by a factor of three.
- ALICO, for SAM/BAM files, achieves a 20% improvement over other tools while maintaining similar speeds.
- GABAC is a compression module designed to be plugged into existing compressors such as CRAM and SPRING, where it compresses the uncompressed streams those methods generate.
- GPress, for gene expression files, achieves a 98% reduction in file size while maintaining the ability to retrieve all annotations for a given identifier or a range of coordinates.
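The idea of compressing separate streams, mentioned for GABAC above, can be illustrated with a minimal sketch. This is not the code of GABAC, SPRING, or any other tool listed here: the function names are hypothetical, Python's general-purpose zlib stands in for a specialized entropy coder, and the input is assumed to be a simple four-line-per-record FASTQ with a bare "+" separator line.

```python
import zlib


def split_fastq_streams(fastq_text):
    """Split FASTQ text into identifier, sequence, and quality streams.

    Assumes four lines per record and a bare "+" separator line.
    """
    lines = fastq_text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    ids, seqs, quals = [], [], []
    for i in range(0, len(lines), 4):
        ids.append(lines[i])
        seqs.append(lines[i + 1])
        quals.append(lines[i + 3])
    return {
        "ids": "\n".join(ids),
        "seqs": "\n".join(seqs),
        "quals": "\n".join(quals),
    }


def compress_streams(streams):
    """Compress each stream independently (zlib is a stand-in coder)."""
    return {
        name: zlib.compress(data.encode("ascii"), 9)
        for name, data in streams.items()
    }


def decompress_streams(compressed):
    """Invert compress_streams."""
    return {
        name: zlib.decompress(blob).decode("ascii")
        for name, blob in compressed.items()
    }


def merge_fastq_streams(streams):
    """Rebuild the FASTQ text from the three streams."""
    ids = streams["ids"].split("\n")
    seqs = streams["seqs"].split("\n")
    quals = streams["quals"].split("\n")
    records = []
    for rid, seq, qual in zip(ids, seqs, quals):
        records.extend([rid, seq, "+", qual])
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    sample = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nFFFF\n"
    packed = compress_streams(split_fastq_streams(sample))
    restored = merge_fastq_streams(decompress_streams(packed))
    print("lossless round trip:", restored == sample)
```

Keeping identifiers, bases, and quality values in separate streams lets each one be modeled on its own statistics, which is why a pluggable stream compressor can be shared across different tools. On a toy input this small, zlib will not shrink the data; the sketch only shows the structure and the lossless round trip.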