That’s because data is all too often provided or published in those nefarious document formats, even as reporters are getting more and more comfortable working with data themselves.
My latest and favorite tool is Tabula, an open-source app made by and for journalists.
While there are a ton of tools out there for getting data out of PDFs (and I’ve compiled a long list here), most of them simply convert an entire PDF into an Excel sheet. Tabula lets you select data tables like you’re taking a screenshot of them, then – click! – you export the data in a variety of formats you can just pop into Excel.
I highlighted Tabula during my talk at Mozfest in London earlier this month, and explained how to use it in this tutorial, so swing over there for instructions. (Hint: there are like three steps.)