Scalable Visual Data Analytics with Orange Data Mining Toolbox
Blaž Zupan (University of Ljubljana)
To refactor the Orange Data Mining toolbox to use the latest Python libraries for parallel, server-based data analysis, allowing it to scale to large biomedical datasets.
Workflow-building tools like the Orange Data Mining toolbox democratize data science by exposing an intuitive interface while hiding the complex underlying mechanics. Orange owes its success to the Python ecosystem, particularly to NumPy, whose array is the backbone of the Orange Table, the data structure used by Orange components in the graphical user interface. This proposal aims to refactor Orange’s ecosystem with Dask, a Python library for parallel and distributed computing. The refactored Orange will retain its simplicity and intuitive user interface while revamping the data infrastructure under the hood to democratize big data analytics.