Enabling Differential Analyses of Genomic Data with limma, edgeR and Glimma
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research)
The project aims to improve the ease of use and interoperability of these packages, respond methodologically to new data challenges, refresh the packages' documentation and structure, and prepare training materials.
limma is an R package for analyzing gene expression data from modern genomic technologies such as microarrays or RNA sequencing. Its central aim is to detect genes that have changed expression levels between experimental conditions or cell types. It can also test for changes in the structure of each gene in the form of differential exon splicing. limma can analyze arbitrarily complex experiments with multiple experimental factors. A unique feature of limma is its ability to borrow information between genes, increasing the power and robustness with which conclusions can be reached for each individual gene. This feature allows it to conduct robust statistical analyses even when the number of biological samples is very small. All limma functions support the concept of precision weights. limma can identify and downweight lower-quality samples and can estimate correlations between closely related samples from the same biological source. limma includes methods for interpreting differential expression results in terms of higher-order molecular pathways or biological processes. limma provides a range of plotting functions for exploring data quality and for presenting and interpreting results. At a lower level, limma includes rich features for reading, normalizing and pre-processing genomic data from a variety of technologies.
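The idea of borrowing information between genes can be illustrated with a minimal sketch. It assumes the commonly described empirical Bayes form, in which each gene's sample variance is shrunk toward a shared prior variance, weighted by degrees of freedom. The sketch is in Python rather than R, and the function names and numbers are hypothetical. limma estimates the prior variance and prior degrees of freedom from the full set of genes, whereas here they are simply supplied by hand.

```python
def moderated_variance(s2, df, prior_s2, prior_df):
    """Shrink one gene's sample variance toward a shared prior variance.

    The result is a weighted average of the prior variance and the
    gene's own variance, with degrees of freedom as the weights.
    """
    return (prior_df * prior_s2 + df * s2) / (prior_df + df)


def moderated_variances(sample_vars, df, prior_s2, prior_df):
    """Apply the same shrinkage to every gene in a list of variances."""
    return [moderated_variance(s2, df, prior_s2, prior_df) for s2 in sample_vars]


# Hypothetical per-gene variances from a small experiment with only
# 2 residual degrees of freedom per gene.
gene_vars = [0.01, 0.25, 4.0]

# Assumed prior values; in a real analysis these would be estimated
# from all genes, not chosen by hand.
shrunk = moderated_variances(gene_vars, df=2, prior_s2=0.25, prior_df=4)
```

With these assumed inputs, the very small variance (0.01) is pulled up to about 0.17 and the very large one (4.0) is pulled down to 1.5, while the gene whose variance already equals the prior is unchanged. Stabilizing the extreme variance estimates in this way is what makes per-gene tests usable when there are only a handful of samples.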
edgeR is an R package for analyzing sequence read count data from genomic sequencing technologies such as RNA-seq, ChIP-seq and ATAC-seq. Like limma, edgeR is designed in particular to detect genes or features that have changed abundance levels between experimental conditions or cell types. Whereas limma is designed to analyze continuous expression values, edgeR is designed to analyze abundance measured in terms of sequence read counts, which enables edgeR to be the underlying analysis engine for a wide range of sequencing technologies. Whereas limma uses linear models and assumes normality, edgeR uses negative binomial generalized linear models. edgeR pioneered the use of the negative binomial distribution to model read counts in genomic research. The aim of edgeR is to provide the same range of functionality for counts that limma does for normally distributed data, but the modeling of counts is mathematically much more difficult. edgeR implements a range of novel statistical methods developed by the authors, including a method to borrow information between genes, a strategy that is essential for genomic experiments with small sample sizes. Whereas limma is mostly written in R, edgeR is substantially written in C++ to increase speed and conserve memory.
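To make the contrast with a normal model concrete, the following sketch shows the mean-variance relationship of the negative binomial distribution in its usual quadratic parameterization, where the variance equals the mean plus the dispersion times the squared mean. It is written in Python for illustration only; the function names are hypothetical and are not part of the edgeR API.

```python
import math


def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean.

    With dispersion 0 this reduces to the Poisson variance (equal to
    the mean); a positive dispersion adds a term quadratic in the mean.
    """
    return mean + dispersion * mean ** 2


def coefficient_of_variation(mean, dispersion):
    """Standard deviation of the count divided by its mean."""
    return math.sqrt(nb_variance(mean, dispersion)) / mean


# Hypothetical gene with an expected count of 100 reads.
poisson_var = nb_variance(100, 0.0)         # no extra variability
overdispersed_var = nb_variance(100, 0.04)  # assumed dispersion of 0.04

# For large counts the coefficient of variation approaches the square
# root of the dispersion, which is 0.2 for a dispersion of 0.04.
cv_high_count = coefficient_of_variation(10000, 0.04)
```

Under these assumed values the overdispersed variance is five times the Poisson variance at the same mean. The extra term does not shrink relative to the mean as sequencing depth grows, which is one reason count data call for a dedicated model and not a simple transformation to normality.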