Future: Simple, Scalable Parallelization in R for the Biomedical Community
Henrik Bengtsson (University of California, San Francisco)
The aims of this project are to sustain maintenance, improve community support, enhance usability and robustness, and add improvements to the future framework.
Parallelization is used in the life sciences to speed up and scale computational and memory-intensive tasks. As data sizes grow rapidly, so does the demand for scalable parallel-processing solutions. This growth is seen in areas such as genomics and bioinformatics, where data volumes and the number of samples or patients studied get larger by the day. Historically, parallelization in R has been “scattered”: developers of scientific pipelines have had to target specific technologies and operating systems, and supporting additional ones or scaling up a pipeline required major efforts. The future framework, released in 2015, targets this problem: pipelines using it scale out automatically to an existing, or yet-to-be-developed, parallel backend of the end-user’s choice while guaranteeing correctness and 100 percent reproducibility. The future ecosystem is well maintained (roughly 30 releases per year). However, since its release, uptake of future has grown rapidly, and with it the support and maintenance burden.
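To illustrate the backend-agnostic design described above, here is a minimal sketch using the future package's public API (`plan()`, `future()`, `value()`); the worker count and the toy computation are illustrative choices, not taken from the proposal:

```r
library(future)

# Choose a parallel backend once, up front; the rest of the
# pipeline stays unchanged regardless of this choice.
plan(multisession, workers = 2)

# Create a future: the expression may be evaluated in parallel
# on whatever backend the end-user selected.
f <- future({
  sum(1:10)
})

# Block until the result is available and collect it.
value(f)

# Switching backends (e.g. to sequential processing, or to an
# HPC scheduler via a package such as future.batchtools) is a
# one-line change; the pipeline code itself is untouched.
plan(sequential)
```

The same `future()`/`value()` calls run unchanged whether the backend is a local multisession cluster, sequential evaluation, or a to-be-developed backend, which is the portability property the paragraph above describes.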