Quantization and Compressive Learning Methods for Omics Data
This project aims to develop new quantization and lossless compression algorithms for omics data, together with machine learning methods that are robust to quantization errors.
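As a rough illustration of what a quantization error is, the sketch below applies uniform scalar quantization to a few made-up expression values. The function names, the step size, and the values are assumptions chosen for illustration and are not the project's method. With a step size of `step`, each reconstructed value differs from the original by at most about `step / 2`.

```python
def quantize(values, step):
    """Map each value to the index of its nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from the quantization indices."""
    return [i * step for i in indices]


# Hypothetical expression values; not taken from any real dataset.
expression = [0.12, 3.47, 10.03, 7.81, 0.0]
step = 0.5  # assumed step size; a larger step gives smaller indices but larger error

indices = quantize(expression, step)
restored = dequantize(indices, step)

# Largest absolute difference between original and reconstructed values.
max_error = max(abs(a - b) for a, b in zip(expression, restored))
print(indices, restored, max_error)
```

The integer indices are what a lossless compressor would then store; a learning method that is robust to quantization would need to tolerate errors of this size in its inputs.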
Results & Resources
The Ochoa lab, in collaboration with Mikel Hernaez, developed a number of genomic data compressors:
- SPRING, for FASTQ files, reduces the size of compressed raw genomic data by a factor of three.
- ALICO, for SAM/BAM files, achieves a 20% improvement over other tools while maintaining similar speeds.
- GPress, for gene expression files, achieves a 98% reduction in file size while maintaining the ability to retrieve all annotations for a given identifier or a range of coordinates.
- MassComp is a compressor for mass spectrometry data from proteomics and metabolomics studies.
- DeepZip, a general-purpose compressor for any type of data, is based on neural networks and, when applied to genomic data, achieves performance comparable to that of specialized compressors.
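Retrieving records for a range of coordinates from a compressed file, as described for GPress above, generally requires some form of random access. One common way to get it is to compress records in blocks and keep a small index of where each block starts. The sketch below shows that general idea only; it is not GPress's actual design, and the record layout, function names, and block size are hypothetical.

```python
import bisect
import json
import zlib


def build_blocks(records, block_size):
    """Compress coordinate-sorted records in fixed-size blocks.

    Returns the compressed blocks and an index holding the start
    coordinate of the first record in each block.
    """
    blocks, starts = [], []
    for i in range(0, len(records), block_size):
        chunk = records[i:i + block_size]
        starts.append(chunk[0]["start"])
        blocks.append(zlib.compress(json.dumps(chunk).encode("utf-8")))
    return blocks, starts


def query_range(blocks, starts, lo, hi):
    """Return records with lo <= start <= hi, decompressing only candidate blocks."""
    # First block that could contain `lo` (one block early, to be safe with ties).
    first = max(bisect.bisect_left(starts, lo) - 1, 0)
    # Blocks whose first coordinate exceeds `hi` cannot contain a match.
    last = bisect.bisect_right(starts, hi)
    hits = []
    for blob in blocks[first:last]:
        for rec in json.loads(zlib.decompress(blob).decode("utf-8")):
            if lo <= rec["start"] <= hi:
                hits.append(rec)
    return hits


# Hypothetical annotation records, sorted by start coordinate.
records = [{"id": f"gene{i}", "start": i * 100} for i in range(20)]
blocks, starts = build_blocks(records, block_size=5)
found = query_range(blocks, starts, 450, 720)
print([rec["id"] for rec in found])
```

Smaller blocks make each query cheaper but usually compress less well, so the block size is a trade-off between file size and retrieval speed.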