Maintenance & Extension of scikit-learn: Machine Learning in Python
Thomas Fan (Quansight)
To further the sustainability and usability of scikit-learn by reducing the maintenance backlog and extending its machine learning models and pipelines to support more complex datasets.
The open source machine learning project scikit-learn has become a foundation for applied machine learning and data science in academic and industrial research. As scientists move toward more data-driven research, data analysis and machine learning have become fundamental building blocks for many research disciplines. scikit-learn provides a consistent interface that abstracts away the details of each algorithm, enabling users to focus on their particular problem. This proposal seeks to fund the project’s maintenance and to extend its machine learning models and pipelines to support more complex datasets. scikit-learn has accumulated a backlog of over 1,600 issues and 750 pull requests, which includes essential bug reports, bug fixes, performance regression reports, and feature contributions. The volume of issues and pull requests reflects the community’s size and interest in contributing to the project. The goal of this work is to reduce the backlog by 40 percent.