Enhancing the Performance, Documentation, and Data Ecosystem for bedtools
Aaron Quinlan (University of Utah)
This project aims to enhance bedtools’ functionality, documentation, and access to data, which will empower and expand the user community.
Collectively, the bedtools utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
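To make the idea of genome arithmetic concrete, the following is a minimal Python sketch of two of the operations named above, merge and intersect, chained together. It is an illustration only and not how bedtools is implemented: the function names, the tuple representation of an interval, and the example coordinates are all assumptions made for this sketch, and the quadratic-time intersect is chosen for readability rather than speed.

```python
from typing import List, Tuple

# Hypothetical representation for this sketch: (chromosome, start, end),
# using 0-based, half-open coordinates as in the BED format.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping or abutting intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b."""
    overlaps: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # half-open intervals overlap only if non-empty
                overlaps.append((chrom_a, start, end))
    return overlaps


# Made-up example data: two interval sets on two chromosomes.
genes = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
peaks = [("chr1", 180, 250), ("chr2", 10, 60)]

# Chaining simple operations, in the spirit of piping tools together.
merged_genes = merge(genes)
shared = intersect(merged_genes, peaks)
print(merged_genes)
print(shared)
```

Each function does one simple job, and the more interesting result (peak regions that fall within merged gene regions) comes from composing them, which mirrors the way individual bedtools operations are combined on the command line.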
Go Get Data (GGD) is a scientific data management system that enables quick and reproducible access to data “recipes.” GGD data recipes contain information on how to extract and process scientific data, providing access to curated scientific datasets. This eliminates the common frustration of finding, downloading, and standardizing heterogeneous datasets. GGD leverages the conda package management system and the infrastructure of Bioconda to provide a fast and easy way to retrieve processed annotations and datasets, so that any user can quickly find and install a desired dataset.