Thursday, July 12, 2012

I've commented before (see here for an example) about how important classification is to data analysis: you have to put data into categories before you can count them. Defining the categories, and deciding which category is the best fit for ambiguous data, is something you'll need to do (one researcher I know called this 'digging around in the data' her favorite part of the work).
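To make the "categorize, then count" idea concrete, here is a minimal sketch in Python. The survey answers, the category names, and the `classify` helper are all made up for illustration; real data would need rules of its own, and the catch-all "other" bucket is exactly where the digging around happens.

```python
from collections import Counter

# Hypothetical free-text answers to "How do you get to work?"
responses = ["Bus", "bicycle", "car ", "Bike", "walk", "train", "scooter"]

# Illustrative category rules (an assumption, not a standard scheme).
CATEGORIES = {
    "bus": "public transit",
    "train": "public transit",
    "car": "car",
    "bike": "cycling",
    "bicycle": "cycling",
    "walk": "walking",
}

def classify(answer):
    """Map a raw answer to a category; unrecognized answers go to 'other'."""
    return CATEGORIES.get(answer.strip().lower(), "other")

# Only after every answer has a category can the answers be counted.
counts = Counter(classify(r) for r in responses)

for category, n in counts.most_common():
    print(category, n)
```

Changing the rules changes the counts: move "scooter" into "cycling" and the totals shift, which is why the definitions deserve as much attention as the arithmetic.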

Classification of information is also important on the web, though of course on a much larger scale. You can read, here, a very interesting article by David Auerbach called "The Stupidity of Computers" (it's from the current issue of the magazine n+1). When the task is searching the web, what has to be classified is all of human knowledge, and that is hard to do. But there are some shortcuts, as Google has demonstrated.

Auerbach argues that two of the best shortcuts are those used by Amazon and Facebook. Amazon reaches shoppers by using categories that they already know: books, jewelry, housewares, and so on.

[Amazon] didn’t have to explain their categories to people or to computers, because both sides already agreed what the categories were. . . . They could tell customers which were the bestselling toasters, which toasters had which features, and which microwaves were bought by people who had bought your toaster.

We don't complain about Amazon and privacy; we are willing to give up information because of the great convenience of Internet shopping. Facebook, on the other hand, goes much further. It asks for information and then categorizes it:

As it grew, Facebook continued to impose structure on information, but the kind of information it cared about changed. It cared less about where you went to school and a lot more about your tastes and interests—i.e., what you might be willing to buy. This culminated in a 2010 redesign in which Facebook hyperlinked all their users’ interests, so that each interest now led to a central page for that artist, writer, singer, or topic, ready to be colonized by that artist’s management, publisher, or label. “The Beatles,” “Beatles,” and “Abbey Road” all connected to the same fan page administered by EMI. Updates about new releases and tours could be pushed down to fans’ news feeds.

And, Auerbach says, there's more: Facebook wants to amass information about what its users do on other sites. Every time you log in somewhere using your Facebook ID, you are contributing data for analysis. It's something we can expect to see more of in the future, and with every login it's worth thinking about what use some corporation is making of this data.