NYU — Fall 2014 / Week 3

Finding hidden data, thinking like a robot.

For many data journalism projects, the Census, BLS, state governments, and academics provide plenty of interesting data. But they aren’t the only places to find it, and sometimes it’s in a form you might not be used to. Today we’ll go over some non-Excel data sources and practice getting data out of them.

Housekeeping

• Both your teachers are now present.

• We discuss your homework.

• We should probably talk about your projects.

Discussion

Interesting and fun “data” journalism (also called journalism) is more creative than you might think. Here are some more examples.

What’s the mindset behind these, and how can you apply that thinking to your own work? What do they have in common?

Put another way: try to re-create the thinking behind how one of these gets made.

Get to know your Web inspector

The inspector in Chrome is an extremely useful tool for all sorts of tasks; today we’ll mess around with it on some popular web sites.

We’ll test for JavaScript libraries like D3 and jQuery and do some basic looking around in the Elements, Network, and Console panels. We’ll play with CSS and write some JavaScript, but mostly we’ll use the inspector to find out where some data lives. In short, the inspector does much more than let you experiment with CSS: it lets you see every asset your web page is loading, even the ones you can’t see on the page.
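You can approximate that library check outside the browser, too. Here’s a minimal Python sketch (assuming the requests and BeautifulSoup libraries; the URL is just an example) that fetches a page and lists every script it loads:

    import requests
    from bs4 import BeautifulSoup

    # Any page you're curious about; this URL is just an example.
    url = "https://www.nytimes.com"
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Every <script src="..."> tag is an asset the page pulls in.
    for script in soup.find_all("script", src=True):
        src = script["src"]
        if "d3" in src.lower() or "jquery" in src.lower():
            print("library hint:", src)
        else:
            print("asset:", src)

The Network panel shows you the same thing interactively, plus the requests that scripts fire after the page loads, which is often where the data hides.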

Let’s go to some web sites using Google Chrome and look around with the inspector.

Having a hunch data has an external source

Most of the time, if you get the impression a human didn’t create the web page you’re reading from scratch (like this one), it was generated in a structured way. And if there’s one lesson from today, it’s this: almost without exception, if data was generated in a structured way, you can get it back out in a structured way.

That structure can take many forms: plain old text, JSON, XML, HTML tables, even PDFs. Find and download the data behind these.
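Once the Network panel (or a hunch) points you at a JSON endpoint, pulling it down takes only a few lines. A minimal Python sketch, with a hypothetical endpoint URL standing in for whatever you find:

    import json
    import requests

    # Hypothetical endpoint: substitute the URL you actually found.
    url = "https://example.com/api/data.json"
    data = requests.get(url, timeout=10).json()

    # Pretty-print the first chunk to see how the data is structured
    # before deciding how to flatten it into rows.
    print(json.dumps(data, indent=2)[:500])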

Now install this Chrome extension and check out the 4th down data again.

A very basic scraper

We’ll scrape like adults eventually, but despite what Amanda thinks, this exercise is a good introduction.

There are tons of resources out there for automating data collection; this is just one (very cheap) example. Most of them require a little more programming than this one, but the effort is almost always worth it. Here’s a handy tipsheet from Scott Klein and Michelle Minkoff.
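For reference, here’s roughly what the very cheap version looks like in Python: fetch a page, find the first HTML table, and write its rows to a CSV. This is a sketch under assumptions, not the class exercise itself; the URL is a placeholder, and it assumes the target is a static page with a plain <table>.

    import csv
    import requests
    from bs4 import BeautifulSoup

    # Placeholder target: swap in a real page that has an HTML table.
    url = "https://example.com/some-table-page"
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    # Walk the first table row by row and dump each cell's text to CSV.
    table = soup.find("table")
    with open("scraped.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for row in table.find_all("tr"):
            cells = [c.get_text(strip=True) for c in row.find_all(["th", "td"])]
            writer.writerow(cells)

If the table is built by JavaScript after the page loads, this won’t see it; that’s exactly when the inspector’s Network panel earns its keep.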

Homework

Give these a read for next time.