Scaling OpenRefine

Project: OpenRefine
Funding Cycle: 1

Proposal Summary

To attract new contributors by improving OpenRefine's documentation, and to implement a new data model that improves the scalability, transparency, and reproducibility of OpenRefine workflows.



OpenRefine is a power tool for cleaning up messy data. Requiring no knowledge of a programming or query language, it lets users find and fix inconsistencies interactively, match their data to external databases, pull additional data from those databases, and perform many other useful operations. The resulting workflows can be extracted and applied to other projects, making them reusable and reproducible. OpenRefine was originally designed as an Extract-Transform-Load tool to populate Freebase, under the name “Freebase Gridworks.” It was then briefly a Google product, which became an open source project when Freebase was discontinued. Thanks to a grant from the Google News Initiative in 2018, integration with Wikidata was developed, making OpenRefine a tool of choice for importing data into Freebase’s successor.

Key Personnel

Antonin Delpeuch
Owen Stephens