23 March 2011

Having just watched "The Social Network", I stumbled on a post on Twitter pointing to graph.facebook.com, the free API you can use to scrape the shit out of FB (well, almost).

Turns out the API works with numeric user IDs, and the lowest IDs belong to the earliest accounts. Since FB started as a Harvard-only site, the first few hundred users were all Harvard alumni, obviously. So I started thinking about simple experiments like finding the most popular surnames, certain of having my class-based prejudices reinforced by loads of Winklevoss-style "aristonames". Turns out the most common names are actually Asian -- the elites of tomorrow, of course.
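For the curious, the surname experiment can be sketched in a few lines of Python. This is a rough reconstruction, not the exact script: it assumes graph.facebook.com/<id> answers an unauthenticated request with JSON containing a `last_name` field, and the function names (`fetch_profile`, `collect_profiles`, `count_surnames`) are made up for illustration.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Endpoint named in the post; the per-ID path and reply shape are assumptions.
GRAPH_URL = "https://graph.facebook.com/{user_id}"


def fetch_profile(user_id):
    """Fetch one public profile as a dict (assumes a plain JSON reply)."""
    with urlopen(GRAPH_URL.format(user_id=user_id)) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_profiles(user_ids, fetch=fetch_profile):
    """Fetch each ID, skipping IDs that fail (HTTP errors, unparseable replies)."""
    profiles = []
    for user_id in user_ids:
        try:
            profiles.append(fetch(user_id))
        except (OSError, ValueError):
            # urllib errors derive from OSError, JSON errors from ValueError.
            continue
    return profiles


def count_surnames(profiles):
    """Tally the last_name field, ignoring profiles that lack one."""
    counts = Counter()
    for profile in profiles:
        last_name = (profile.get("last_name") or "").strip()
        if last_name:
            counts[last_name] += 1
    return counts
```

Something like `count_surnames(collect_profiles(range(1, 1001))).most_common(10)` would then list the ten most frequent surnames, assuming IDs 1 to 1000 are the ones you care about. The `fetch` parameter is there so you can try the counting on canned data without hitting the network.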

That's the issue, isn't it? Harvard is (supposedly) a top institution, churning out the "elites of tomorrow"; they won't all become Mark Zuckerberg, but they probably won't be homeless either.

So, as a joke, I wrote a script looking for Wikipedia pages dedicated to the first 1000 users of Facebook. Turns out there are a lot of very common names, which obviously result in false positives; unfortunately Wikipedia doesn't give you easily-parsed metadata (here's a new project for Jimbo Wales and friends), so I couldn't do things like discarding everyone born before 1970. With a bit of patience, I narrowed the number down to roughly 6% of those users. Some of them are (or were) Facebook employees, of course, but there are also young poets, writers and comedians.
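A minimal sketch of the Wikipedia lookup, again not the original script: it asks the MediaWiki API on en.wikipedia.org whether a page with exactly that title exists. The helper names (`query_url`, `has_page`, `lookup`) and the User-Agent string are invented here. It only checks that a title exists, so a common name will still match the wrong person, which is exactly the false-positive problem above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def query_url(name):
    """Build a MediaWiki 'query' URL asking about a single page title."""
    params = {"action": "query", "titles": name, "redirects": 1, "format": "json"}
    return API_URL + "?" + urlencode(params)


def has_page(api_reply):
    """Return True if the reply lists at least one existing page.

    In the default JSON format, a title with no page comes back as an
    entry carrying a "missing" key (or "invalid" for a malformed title).
    """
    pages = api_reply.get("query", {}).get("pages", {})
    return any(
        "missing" not in page and "invalid" not in page
        for page in pages.values()
    )


def lookup(name):
    """Ask Wikipedia whether a page titled `name` exists (network call)."""
    request = Request(query_url(name), headers={"User-Agent": "first-users-sketch/0.1"})
    with urlopen(request) as response:
        return has_page(json.loads(response.read().decode("utf-8")))
```

Feeding every name from the Facebook side through `lookup` gives you the raw hit list; the weeding out of namesakes still has to be done by hand.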

You would probably get better results by replacing Wikipedia with LinkedIn, which would include more successful businesspeople and professionals -- Harvard's bread and butter. Obviously you could also start digging across the entire FB userbase, beyond the first lucky Harvardites.

These web APIs are a great tool for smart researchers: with a little bit of programming glue and very little time, you can now correlate a lot of data. The results might not be scientifically exact, but they could still unearth surprising insights.