Aggregating
In our quest to master “sort, filter, aggregate, merge,” we forgot aggregate.
Housekeeping
Six of you have not signed up for a date on the critique wiki.
Should we talk about enough HTML to make your work look less terrible?
The NICAR conference is going on this week. Some materials. Some ideas. The mailing list
Solving your problems + aggregate warm-ups
Let’s start with a smaller dataset about the anatomical focus of yoga poses, as listed on yogajournal.com.
- What area of the body has the most poses listed?
- What pose shows up the most often?
- What are all the poses that focus on both the brain and the calves?
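If you want to check your spreadsheet answers in code, here is a rough sketch of the three warm-ups in plain Python. It assumes the data is one row per pose and body area; the rows below are made up for illustration and are not the real yogajournal.com listings.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (pose, body_area). Invented for illustration only.
rows = [
    ("Downward-Facing Dog", "calves"),
    ("Downward-Facing Dog", "brain"),
    ("Downward-Facing Dog", "arms"),
    ("Standing Forward Bend", "calves"),
    ("Standing Forward Bend", "brain"),
    ("Tree Pose", "thighs"),
    ("Tree Pose", "calves"),
]

# Aggregate: count how many rows fall under each body area and each pose.
area_counts = Counter(area for _, area in rows)
pose_counts = Counter(pose for pose, _ in rows)

# Group the body areas by pose so we can ask for poses that hit several areas.
areas_by_pose = defaultdict(set)
for pose, area in rows:
    areas_by_pose[pose].add(area)

# Keep only the poses whose set of areas includes both brain and calves.
wanted = {"brain", "calves"}
both = sorted(pose for pose, areas in areas_by_pose.items() if wanted <= areas)

print(area_counts.most_common(1))  # body area with the most poses
print(pose_counts.most_common(1))  # pose listed most often
print(both)                        # poses that focus on brain AND calves
```

The counting is the "aggregate" step: many rows collapse into one number per group. The last question is a filter applied after grouping, which is why it needs the per-pose sets.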
We also have a larger dataset about donations to a new political party in India. Feel the power course.
- The biggest donations?
- Amount received from people in each country?
- The typical donation size?
- Average donation size by country?
- Biggest days?
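The same questions, sketched in plain Python. The column names (`date`, `country`, `amount`) and every value below are assumptions made up for this example; the real file may be laid out differently. "Typical" is read here as the median, which is one reasonable choice when a few huge donations would drag the average around.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical donation rows; the field names and values are invented.
donations = [
    {"date": "2014-01-02", "country": "India", "amount": 500},
    {"date": "2014-01-02", "country": "India", "amount": 1000},
    {"date": "2014-01-02", "country": "USA", "amount": 5000},
    {"date": "2014-01-03", "country": "India", "amount": 100},
    {"date": "2014-01-03", "country": "UK", "amount": 2000},
]


def group_amounts(rows, key):
    """Collect the donation amounts into lists, one list per value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["amount"])
    return groups


# Sort: the biggest donations come first.
biggest = sorted(donations, key=lambda r: r["amount"], reverse=True)[:3]

# Aggregate by country: total received and average donation size.
by_country = group_amounts(donations, "country")
total_by_country = {c: sum(amounts) for c, amounts in by_country.items()}
mean_by_country = {c: mean(amounts) for c, amounts in by_country.items()}

# The "typical" donation, taken here as the median of all amounts.
typical = median(r["amount"] for r in donations)

# Aggregate by day, then pick the day with the largest total.
total_by_day = {d: sum(amounts) for d, amounts in group_amounts(donations, "date").items()}
biggest_day = max(total_by_day, key=total_by_day.get)
```

Every answer follows the same pattern: put the rows into groups, then reduce each group to a single number (a sum, a mean, a count).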
Where we left off.
- Don’t try to be too complicated at the beginning. Start with very easy questions - think averages, top 10 lists, outliers. After you have those answers, then you can add fanciness.
- We are looking for things that we don’t expect. Washington Monthly’s rankings are based on the same idea.
- Find two schools with similar characteristics, but very different graduation rates.
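One way to hunt for those pairs, sketched in plain Python. Everything here is an assumption: the field names (`pct_pell`, `median_sat`, `grad_rate`), the tolerances that define "similar," and the four made-up schools. Pick whatever characteristics you think should predict graduation rates.

```python
from itertools import combinations

# Hypothetical schools; names, fields, and numbers are invented.
schools = [
    {"name": "School A", "pct_pell": 30, "median_sat": 1100, "grad_rate": 75},
    {"name": "School B", "pct_pell": 32, "median_sat": 1090, "grad_rate": 48},
    {"name": "School C", "pct_pell": 60, "median_sat": 950, "grad_rate": 40},
    {"name": "School D", "pct_pell": 58, "median_sat": 960, "grad_rate": 44},
]


def similar(a, b, pell_tol=5, sat_tol=50):
    """Treat two schools as similar if both measures fall within a tolerance."""
    return (
        abs(a["pct_pell"] - b["pct_pell"]) <= pell_tol
        and abs(a["median_sat"] - b["median_sat"]) <= sat_tol
    )


# Compare every pair, keep the similar ones, and record the graduation gap.
pairs = [
    (abs(a["grad_rate"] - b["grad_rate"]), a["name"], b["name"])
    for a, b in combinations(schools, 2)
    if similar(a, b)
]

# Largest gap first: these are the pairs worth a phone call.
pairs.sort(reverse=True)
print(pairs)
```

Comparing every pair is fine for a few hundred schools; for the full national list you would want to narrow the field first.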
Expanding our college graduation sample. If we were to try, how far would we have to go?
Here’s a list of schools for that. How many schools would we have to contact if we wanted to cover, say, half of the nation’s undergraduates? 75%? How else could we restrict our sample?
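The coverage question is a running total: sort schools from largest to smallest, add up enrollment as you go, and stop when you pass the target share. A minimal sketch in plain Python, using made-up enrollment figures and a hypothetical helper named `schools_needed`:

```python
def schools_needed(enrollments, share):
    """Return how many of the largest schools it takes to cover `share`
    (a fraction between 0 and 1) of total undergraduate enrollment."""
    total = sum(enrollments)
    running = 0
    # Walk from the biggest school down, keeping a cumulative sum.
    for count, n in enumerate(sorted(enrollments, reverse=True), start=1):
        running += n
        if running >= share * total:
            return count
    return len(enrollments)


# Hypothetical undergraduate enrollments, one number per school.
enrollments = [5000, 40000, 10000, 30000, 15000]

print(schools_needed(enrollments, 0.5))
print(schools_needed(enrollments, 0.75))
```

Starting with the largest schools gives the smallest possible contact list for a given share, but it also tilts the sample toward big institutions, which is worth keeping in mind when we think about other ways to restrict it.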
Write our letter.
Let’s do it together here.
Homework
You have two choices. You may fill out 20 rows with contact information in this document. Or you may embark on your own data collection project of a similar scale. If you choose the latter, please push it to your GitHub page.