This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize

By Jordan Ellenberg
02.25.08

Many of the contestants begin, as Cinematch does, with something called the k-nearest-neighbor algorithm — or, as the pros call it, kNN. This is what Amazon.com uses to tell you that "customers who purchased Y also purchased Z." Suppose Netflix wants to know what you'll think of Not Another Teen Movie. It compiles a list of movies that are "neighbors" — films that received a high score from users who also liked Not Another Teen Movie and films that received a low score from people who didn't care for that Jaime Pressly yuk-fest. It then predicts your rating based on how you've rated those neighbors. The approach has the advantage of being quite intuitive: If you gave Scream five stars, you'll probably enjoy Not Another Teen Movie.
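The idea can be sketched in a few lines of Python. The ratings and movie lineup below are invented for illustration, and the similarity measure is plain cosine similarity over co-raters — real recommenders like Cinematch mean-center ratings and handle sparsity far more carefully.

```python
# Toy item-based kNN rating prediction. Data and similarity measure are
# illustrative stand-ins, not Netflix's actual algorithm.
import math

# ratings[user][movie] = stars (1-5); a missing key means unrated.
ratings = {
    "ann":  {"Scream": 5, "Not Another Teen Movie": 5, "Children of Men": 2},
    "bob":  {"Scream": 4, "Not Another Teen Movie": 4, "Children of Men": 1},
    "cara": {"Scream": 1, "Not Another Teen Movie": 2, "Children of Men": 5},
    "dan":  {"Scream": 2, "Children of Men": 4},
}

def similarity(m1, m2):
    """Cosine similarity between two movies, over users who rated both."""
    common = [u for u in ratings if m1 in ratings[u] and m2 in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][m1] * ratings[u][m2] for u in common)
    n1 = math.sqrt(sum(ratings[u][m1] ** 2 for u in common))
    n2 = math.sqrt(sum(ratings[u][m2] ** 2 for u in common))
    return dot / (n1 * n2)

def predict(user, movie, k=2):
    """Similarity-weighted average of the user's ratings on the k movies
    most similar to `movie` — the kNN prediction."""
    rated = [m for m in ratings[user] if m != movie]
    neighbors = sorted(rated, key=lambda m: similarity(movie, m),
                       reverse=True)[:k]
    num = sum(similarity(movie, m) * ratings[user][m] for m in neighbors)
    den = sum(similarity(movie, m) for m in neighbors)
    return num / den if den else 0.0

print(predict("dan", "Not Another Teen Movie"))
```

In this toy data, Not Another Teen Movie is a closer neighbor of Scream than of Children of Men, so Dan's lukewarm Scream rating dominates the prediction.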

BellKor uses kNN, but it also employs more abstruse algorithms that identify dimensions along which movies, and movie watchers, vary. One such scale would be "highbrow" to "lowbrow"; you can rank movies this way, and users too, distinguishing between those who reach for Children of Men and those who prefer Children of the Corn.

Of course, this system breaks down when applied to people who like both of those movies. You can address this problem by adding more dimensions — rating movies on a "chick flick" to "jock movie" scale or a "horror" to "romantic comedy" scale. You might imagine that if you kept track of enough of these coordinates, you could use them to profile users' likes and dislikes pretty well. The problem is, how do you know the attributes you've selected are the right ones? Maybe you're analyzing a lot of data that's not really helping you make good predictions, and maybe there are variables that do drive people's ratings that you've completely missed.

BellKor (along with lots of other teams) deals with this problem by means of a tool called singular value decomposition, or SVD, that determines the best dimensions along which to rate movies. These dimensions aren't human-generated scales like "highbrow" versus "lowbrow"; typically they're baroque mathematical combinations of many ratings that can't be described in words, only in pages-long lists of numbers. At the end, SVD often finds relationships between movies that no film critic could ever have thought of but that do help predict future ratings.
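The mechanics can be shown on a tiny, fully observed ratings matrix. The numbers below are invented; on the real Netflix data, where most entries are missing, Prize teams fit the user and movie factors by iterative methods rather than calling a library SVD directly.

```python
# Minimal sketch of SVD as dimension reduction on a toy ratings matrix.
import numpy as np

# Rows are users, columns are movies, values are star ratings.
# Two invented "taste clusters" are baked in for illustration.
R = np.array([
    [5, 4, 1, 1],
    [4, 5, 2, 1],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the k strongest dimensions: each user and each movie is now
# described by k numbers, and their dot product approximates the rating.
k = 2
user_factors = U[:, :k] * s[:k]   # one k-vector per user
movie_factors = Vt[:k, :]         # one k-vector per movie
approx = user_factors @ movie_factors

print(np.round(approx, 1))
```

The point is that the two retained dimensions were chosen by the math, not by a person: nothing in the code says "highbrow" or "lowbrow," yet the rank-2 reconstruction recovers the ratings closely.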

Singular value decomposition is one example of a family of techniques in data mining known as "dimension reduction." A classic example of dimension reduction is the work of Frederick Mosteller and David Wallace on the Federalist Papers. They showed that frequencies of certain words distinguished those papers written by James Madison from those by Alexander Hamilton. Hamilton used "upon" and "while" much more frequently than Madison; for "although" and "whilst" the situation was reversed. So for each paper of disputed authorship, one can write down four numbers, corresponding to the frequencies of "upon," "while," "although," and "whilst." If the former two numbers are small and the latter two are large, you can confidently ascribe the paper to Madison. In this way, Mosteller and Wallace settled an argument that historians had been feuding over since the 19th century with no firm conclusion in sight.
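The scheme reduces a whole paper to four numbers, which makes it easy to sketch. The author profiles below are invented placeholders, not the rates Mosteller and Wallace actually measured, and their real method was a Bayesian analysis of many marker words, not this crude nearest-profile comparison.

```python
# Sketch of the Mosteller-Wallace idea: collapse a text to four
# marker-word frequencies, then attribute it to the author whose known
# profile those four numbers sit closest to.  PROFILES is hypothetical.
import re

MARKERS = ("upon", "while", "although", "whilst")

# Hypothetical marker rates per 1,000 words for each candidate author.
PROFILES = {
    "Hamilton": {"upon": 3.0, "while": 0.3, "although": 0.0, "whilst": 0.0},
    "Madison":  {"upon": 0.2, "while": 0.0, "although": 0.2, "whilst": 0.5},
}

def marker_rates(text):
    """Occurrences of each marker word per 1,000 words of text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w: 1000 * words.count(w) / len(words) for w in MARKERS}

def guess_author(text):
    """Pick the author whose profile is closest in L1 distance."""
    rates = marker_rates(text)
    return min(PROFILES, key=lambda a: sum(abs(rates[w] - PROFILES[a][w])
                                           for w in MARKERS))
```

Four numbers per paper is the whole representation — the same kind of drastic compression, from thousands of words down to a handful of coordinates, that SVD performs on movie ratings.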

The danger is that it's all too easy to find apparent patterns in what's really random noise. If you use these mathematical hallucinations to predict ratings, you fail. Avoiding that disaster — called overfitting — is a bit of an art, and being very good at it is what separates masters like BellKor from the rest of the field.

In other words: The computer scientists and statisticians at the top of the leaderboard have developed elaborate and carefully tuned algorithms for representing movie watchers by lists of numbers, from which their tastes in movies can be estimated by a formula. Which is fine, in Gavin Potter's view — except people aren't lists of numbers and don't watch movies as if they were.

Potter likes to use what psychologists know about human behavior. "The fact that these ratings were made by humans seems to me to be an important piece of information that should be and needs to be used," he says. Potter has great respect for the technical prowess of BellKor — he is, after all, still behind the team in the rankings — but he thinks the computer science community studying this problem suffers from a bad case of groupthink. He refers to the psychological model underlying their mathematical approach as "crude." His tone suggests that if I weren't taping, he might use a stronger word.

It's easy to say you should take human factors into account — but how, exactly? How can you use psychology to study people about whom you know nothing except what movies they like?