
Enabling Differential Analyses of Genomic Data with limma, edgeR and Glimma


Projects: limma, edgeR, Glimma
Funding Cycle: 1

Proposal Summary

The project aims to improve the ease of use and interoperability of these packages, develop methodological responses to new data challenges, refresh the documentation and structure of these packages, and prepare training materials.


Project

limma

limma is an R package for analyzing gene expression data from modern genomic technologies such as microarrays or RNA sequencing. Its central aim is to detect genes that have changed expression levels between experimental conditions or cell types. It can also test for changes in the structure of each gene in the form of differential exon splicing. limma can analyze arbitrarily complex experiments with multiple experimental factors. A unique feature of limma is its ability to borrow information between genes, increasing the power and robustness with which conclusions can be reached for each individual gene. This feature allows it to conduct robust statistical analyses even when the number of biological samples is very small. All limma functions support the concept of precision weights. limma can identify and down-weight lower-quality samples and can estimate correlations between closely related samples from the same biological source. limma includes methods for interpreting differential expression results in terms of higher-order molecular pathways or biological processes. limma provides a range of plotting functions for exploring data quality or for presenting and interpreting results. At a lower level, limma includes rich features for reading, normalizing and pre-processing genomic data from a variety of technologies.
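One well-known form of limma's information borrowing is the empirical Bayes moderated t-statistic: each gene's residual variance is shrunk toward a common prior variance before the t-statistic is formed, and the prior contributes extra degrees of freedom. The Python sketch below illustrates only that arithmetic for a single gene, under stated assumptions. It is not limma's code (limma is an R package), the function names are hypothetical, and the prior variance `s0_sq` and prior degrees of freedom `d0` are supplied by hand, whereas limma estimates them from all genes.

```python
import math


def posterior_variance(s2: float, d: float, s0_sq: float, d0: float) -> float:
    """Shrink one gene's residual variance s2 (on d degrees of freedom)
    toward a common prior variance s0_sq (worth d0 degrees of freedom)."""
    return (d0 * s0_sq + d * s2) / (d0 + d)


def moderated_t(logfc: float, s2: float, d: float, unscaled_var: float,
                s0_sq: float, d0: float) -> tuple[float, float]:
    """Moderated t-statistic for one gene's log-fold-change.

    unscaled_var is the variance factor of the coefficient under the design,
    e.g. 1/n_a + 1/n_b for a two-group comparison.
    Returns (t, total degrees of freedom)."""
    s2_post = posterior_variance(s2, d, s0_sq, d0)
    t = logfc / math.sqrt(s2_post * unscaled_var)
    return t, d + d0


# Two groups of two samples: residual df d = 4 - 2 = 2, unscaled_var = 1/2 + 1/2.
# With d0 = 0 there is no shrinkage, so this is the ordinary t-statistic.
ordinary_t, _ = moderated_t(1.0, 0.01, 2, 1.0, s0_sq=0.25, d0=0)
# Hypothetical prior: s0_sq = 0.25 on d0 = 4 df (limma would estimate these).
mod_t, df = moderated_t(1.0, 0.01, 2, 1.0, s0_sq=0.25, d0=4)
print(f"ordinary t = {ordinary_t:.2f}, moderated t = {mod_t:.2f} on {df} df")
# prints: ordinary t = 10.00, moderated t = 2.43 on 6 df
```

A gene whose small-sample variance happens to be tiny gets an inflated ordinary t-statistic; shrinkage pulls its variance toward the prior, while the added prior degrees of freedom partly restore power.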


Key Personnel

Gordon Smyth
Charity Law
Yunshun Chen

Project

edgeR

edgeR is an R package for analyzing sequence read count data from genomic sequencing technologies such as RNA-seq, ChIP-seq and ATAC-seq. Like limma, edgeR is particularly designed to detect genes or features that have changed abundance levels between experimental conditions or cell types. Whereas limma is designed to analyze continuous expression values, edgeR is designed to analyze abundance measured in terms of sequence read counts, which enables edgeR to be the underlying analysis engine for a wide range of sequencing technologies. Whereas limma uses linear models and assumes normality, edgeR uses negative binomial generalized linear models. edgeR pioneered the use of the negative binomial distribution to model read counts in genomic research. The aim of edgeR is to provide the same range of functionality for counts that limma does for normally distributed data, but the modeling of counts is mathematically much more difficult. edgeR implements a range of novel statistical methods developed by the authors, including a method to borrow information between genes, a strategy that is essential for genomic experiments with small sample sizes. Whereas limma is mostly written in R, edgeR is substantially written in C++ to increase speed and conserve memory.
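In the negative binomial parameterization used by edgeR, a read count with mean μ and dispersion φ has variance μ + φμ², and √φ is the biological coefficient of variation (BCV). The Python sketch below illustrates that mean-variance relationship and the corresponding log-probability under stated assumptions: it is not edgeR's code, the function names are hypothetical, and it does not estimate dispersions or fit GLMs, which is where edgeR's information borrowing between genes takes place.

```python
import math


def nb_variance(mu: float, phi: float) -> float:
    """Variance of a negative binomial count with mean mu and dispersion phi."""
    return mu + phi * mu ** 2


def nb_logpmf(y: int, mu: float, phi: float) -> float:
    """Log-probability of count y under NB(mean=mu, dispersion=phi).

    Requires mu > 0 and phi > 0; the shape ("size") parameter is r = 1/phi."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


mu, phi = 100.0, 0.1
print("Poisson variance:", mu)                 # Poisson: variance equals the mean
print("NB variance:", nb_variance(mu, phi))     # adds extra-Poisson (biological) spread
print("BCV:", round(math.sqrt(phi), 3))         # biological coefficient of variation

# Sanity check: probabilities over a wide range of counts sum to about 1.
probs = [math.exp(nb_logpmf(y, 5.0, 0.2)) for y in range(500)]
total = sum(probs)
print("total probability:", total)
```

The quadratic term φμ² is why highly expressed genes show far more spread between biological replicates than a Poisson model allows.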


Key Personnel

Gordon Smyth
Yunshun Chen
Charity Law
Göknur Giner

Project

Glimma

The Glimma package creates interactive plots of microarray and RNA-sequencing gene expression data, allowing biologists to conveniently explore results from differential expression analyses. Interactivity allows per-gene information to be displayed alongside experiment-wide summary displays. The interactive graphics are created with simple R commands, supporting analyses and object classes from limma and edgeR. The package uses D3/JavaScript to produce HTML pages.


Key Personnel

Gordon Smyth
Charity Law