NYU — Spring 2014 / Week 3

Finding Hidden Data, thinking like a robot.

For many data journalism projects, the Census, BLS, state governments and academics provide plenty of interesting data. But they aren't the only places to find it, and sometimes it comes in a form you might not be used to. Today, we'll go over some non-Excel data sources and practice pulling data out of them.

Housekeeping

We should probably talk about your projects.

Discussion:

We talked a little bit last class about how data journalism, besides being interesting and fun, is also more creative than you might think. Here are some more examples.

What’s the mindset behind these? How can you apply that thinking to your own work? Think about the thought process behind these – what do they have in common?

Or, another way, try to re-create the thinking behind how one of these gets made.

Get to know your Web inspector

The inspector in Chrome is an extremely useful tool for all sorts of tasks; today we'll mess around with it on some popular websites.

We'll test for JavaScript libraries like D3 and jQuery and do some basic looking around in the Elements, Network and Console panels. We'll play with CSS and write some JavaScript; mostly, though, we'll use the inspector to find out where some data lives. It does much more than let you experiment with CSS: it shows you every asset a page loads, even the ones you can't see.

Having a hunch the data has an external source

Most of the time, if you get the impression a human didn't create the page you're reading from scratch (like this one), there's a good chance it was generated in a structured way. If there's one lesson from today, it's that, almost without exception, if data was generated in a structured way, you can get it back out in a structured way.

It can be in many different kinds of structure – plain old text, JSON, XML, HTML tables, even PDFs. Find and download the data behind these.
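JSON is the format you'll run into most often when a page loads its data behind the scenes. As a sketch of why that's good news, here's how little Python it takes to turn a JSON response into usable rows. The payload, field names, and values below are all made up for illustration; in practice you'd spot the real URL in the inspector's Network panel and fetch it yourself.

```python
import json

# A made-up payload, like one you might spot in the Network panel
# when a page loads its data via XHR. (Fields are invented here.)
payload = '''
{"plays": [
    {"team": "NYG", "down": 4, "togo": 2, "went_for_it": false},
    {"team": "DET", "down": 4, "togo": 1, "went_for_it": true}
]}
'''

# Parse the JSON text into ordinary Python dicts and lists.
data = json.loads(payload)

# Now it's just structured data: loop, filter, count, export to CSV, etc.
for play in data["plays"]:
    print(play["team"], play["down"], play["togo"], play["went_for_it"])
```

Compare that to trying to pull the same numbers out of the rendered HTML: once you've found the structured source, the hard part is already done.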

Now install this Chrome extension and check out the 4th down data again.

Helpful tools.

A very basic scraper

There are tons of resources out there to automate data collection; this is just one (very cheap) example. Most of them require a little more programming than this one, but the effort is almost always worth it. Here's a handy tipsheet from Scott Klein and Michelle Minkoff.
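To make "a very basic scraper" concrete, here's a minimal sketch in Python using only the standard library's html.parser. The page below is a toy stand-in (the table and numbers are invented); a real scraper would fetch live HTML with something like urllib.request.urlopen first, then feed it to the same parser.

```python
from html.parser import HTMLParser

# A toy page standing in for one you'd actually fetch from the web.
# The states and populations here are just placeholder values.
PAGE = """
<table>
  <tr><td>Alabama</td><td>4,903,185</td></tr>
  <tr><td>Alaska</td><td>731,545</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of every <td>, grouped into one row per <tr>."""
    def __init__(self):
        super().__init__()
        self.rows = []        # list of rows, each a list of cell strings
        self.in_cell = False  # are we currently inside a <td>?

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])   # start a new row
        elif tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1].append(data.strip())

scraper = TableScraper()
scraper.feed(PAGE)
print(scraper.rows)  # [['Alabama', '4,903,185'], ['Alaska', '731,545']]
```

The same pattern scales up: point it at every page in a list of URLs, write each row out to a CSV, and you have a dataset nobody published as one.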

Homework

Give these a read for next time.