5 Reasons to switch to OpenRefine (and never look back)
I know this article is about OpenRefine (previously Google Refine), but in order to get there, I need to start with Google Sheets.
For years I’ve used Google Sheets to work with my data, mostly because I thought that, besides Microsoft Excel, it was my only option. Most of the time I built spreadsheets using Google Sheets, so it only made sense for me to use the same programme to edit, filter and compare my data.
That was until I learnt about OpenRefine. The OpenRefine website describes the application as “a powerful tool for working with messy data: cleaning it; transforming it from one format into another, and extending it with web services and external data.” And that is exactly what it does, leaping past some of the basic limitations of Excel and Google Sheets.
So, here are my top five reasons to move to OpenRefine, and never look back:
- Consolidating different spreadsheets. Your data isn’t always in the same spreadsheet. I was working on a project about grade 12 school results and had split the data into different spreadsheets, according to the different provinces the schools were in. I thought that if I needed to combine separate spreadsheets into one consolidated dataset I would probably have to do that in Google Sheets using the tedious copy and paste method which leaves room for plenty of errors. My dataset was over 50,000 rows long and I knew that might cause problems in Google Sheets. I was pleasantly surprised when I realised almost everything could be done in OpenRefine itself, as long as the column headings were identical. I downloaded the different datasets in the same format and saved them all into one zipped folder. Then I created a new project by importing the zipped folder into OpenRefine, where I found all my consolidated data in one place. I’m not sure if OpenRefine sets a limit on the number of spreadsheets you can add into a zipped folder, but I have zipped together and uploaded as many as nine spreadsheets at once without any issues.
- A data clean-up master. OpenRefine’s “cluster and edit” tool is one of my favourites! Click the arrow next to your heading of choice, select “Edit cells” from the drop-down menu and then select “Cluster and edit”. This tool groups certain rows that likely have the same name, but have minor differences in spelling, spacing or capitalisation. The application gives you a suggestion of what the spelling should look like. Then you are able to choose between the different options by clicking on the one you prefer, or you can change one cluster’s spelling completely by typing in your own option.


- Your mistakes aren’t set in stone. OpenRefine keeps track of every move you make, much like the Google Sheets “version history” tool. But, when you’re working with a big dataset in Google Sheets and realise you need to restore an earlier version of your document, it can take several minutes to load, and even then your system might still crash completely. In OpenRefine, each edit is tracked in the left-hand panel under the “Undo / Redo” section and you can easily switch to an older version of your work without the Google Sheets lag.

- Filtering is a breeze. Unlike in Excel and Google Sheets, you do not have to enable the filtering feature when using OpenRefine. The filtering option is automatically there when you create your project. OpenRefine also offers different methods of filtering your data. You can use the text facet option, which automatically groups all the similar values under a specific column and gives you the number of entries under each value, or the text filter, which allows you to filter your data for specific words, numbers or phrases that you can type in yourself. All of the filtering occurs comfortably on the left-hand side of the screen, where you can easily select and deselect values. This also makes it easy to filter for multiple columns or variables at once.


- OpenRefine works offline. If you usually use Google Sheets, then you’re familiar with the frustration that comes with a weak internet connection. If you are having internet problems, that means you’re having Google Sheets problems because the quality of your connection directly affects your ability to work. This is another reason why OpenRefine is a great option. When you download OpenRefine it works like a desktop application; although it opens up into a web browser, it allows you to work even when your internet connection is offline.
Excel and Google Sheets are among the most popular options for working with data, but there is no need to feel chained to these programmes when better options exist.
Download OpenRefine here.
If you are looking for a good breakdown of the basics, check out this tutorial from Open Data Literacy or watch OpenRefine’s own introductory video series.
Media Hack Collective is launching in-person training in 2022. Watch out for more on data journalism training, tips and stories by subscribing to our Outlier newsletter.