Scaling OpenRefine
Proposal Summary
To attract new contributors by improving OpenRefine's documentation, and to implement a new data model that improves the scalability, transparency, and reproducibility of OpenRefine workflows.
Project
OpenRefine is a power tool for cleaning up messy data. It requires no knowledge of a programming or query language: users can find and fix inconsistencies interactively, match their data to external databases, pull additional data from those databases, and perform many other useful operations. The resulting workflows can be extracted and applied to other projects, making them reusable and reproducible. OpenRefine was originally designed as an Extract-Transform-Load tool to populate Freebase, under the name “Freebase Gridworks.” It was then briefly a Google product, which became an open source project when Freebase was discontinued. A 2018 grant from the Google News Initiative funded the development of an integration with Wikidata, making OpenRefine a tool of choice for importing data into Freebase’s successor.