Scalable Visual Data Analytics with Orange Data Mining Toolbox
Proposal Summary
To refactor the Orange Data Mining toolbox to use the latest Python libraries for parallel, server-based data analysis, allowing it to scale to large biomedical datasets.
Project
Workflow-building tools like the Orange Data Mining toolbox democratize data science by exposing an intuitive interface while hiding the complex underlying mechanics. Orange owes its success to the Python ecosystem, particularly to NumPy: the NumPy array is the backbone of the Orange Table, the data structure used by Orange components in the graphical user interface. This proposal aims to refactor Orange’s ecosystem around Dask, a Python library for parallel and distributed data analytics. The refactored Orange will retain its simplicity and intuitive user interface while revamping the data infrastructure under the hood, with the goal of democratizing big data analytics.
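To make the NumPy-to-Dask idea concrete, here is a minimal sketch of the same column-mean computation done first on a plain NumPy array and then on a chunked Dask array. It is not Orange code and does not use the Orange Table: the array sizes, chunk shape, and variable names are assumptions chosen for illustration, and only standard `numpy` and `dask.array` calls are used.

```python
import numpy as np
import dask.array as da

# Hypothetical data: 10,000 rows by 20 feature columns (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))

# Plain NumPy: the whole array sits in memory and is reduced at once.
numpy_means = X.mean(axis=0)

# Dask: wrap the same data as a chunked array of 1,000-row blocks.
# Each block is itself a NumPy array that can be processed in parallel.
dX = da.from_array(X, chunks=(1_000, 20))

# This only builds a task graph; nothing is computed yet.
lazy_means = dX.mean(axis=0)

# compute() runs the graph and returns an ordinary NumPy array.
dask_means = lazy_means.compute()
```

The Dask array mirrors much of the NumPy interface, which is what could make a swap of the underlying data structure feasible without changing how components are used; for data that does not fit in memory, the chunks would be loaded from disk or a server rather than from an in-memory array as in this sketch.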