Saturday, July 04, 2009

I love IMDB. For one, it might be one of the very few non-personalized recommendation systems that actually work. Scores on IMDB correlate very highly with likelihood of me enjoying a movie, especially if I apply a correction for genres that are overrated (like very old and very long movies) or underrated (like zombie flicks). In any case IMDB is vastly more accurate than professional critics' reviews.

The most drastic example of IMDB and pro critics disagreeing was Transformers: Revenge of the Fallen, which I absolutely loved! It got an okish 6.5 score from IMDB - which is accurate enough: it's a really nice action-packed movie vaguely following Transformers canon, with as much concern for the plot as a typical action movie (that is, not terribly much), and perhaps a far higher aircraft-carrier-to-robot ratio than you'd expect from the title.

Anyway, back to my point. IMDB not only provides great recommendation service, they also make a lot of the underlying data available in convenient formats. Yes, they charge huge money for some of the data, but even the free portion is very useful.

Some voting patterns are very interesting. Here are two examples. First, one of the "so bad it's good" movies - Troll 2. Normal movies have a sort-of Gaussian distribution of votes. OK, not really Gaussian, but with a fairly definite single peak. Not so with the "so bad it's good" movies - these tend to have mostly 1s and 10s, with perhaps a few 2s and 3s, but amusingly very rarely many 9s and 8s - it's either very low, or a 10!

The second interesting bit is a demographically polarizing movie like Twilight. People of different ages and genders tend to like pretty much the same movies.

Top 100 for men, and Top 100 for women contain virtually the same movies, just slightly rearranged - neither stereotypically feminine chick flicks nor stereotypically masculine action flicks get into top lists, and movies rarely have very different male and female scores, or much different scores by age group. But exceptions do happen, here's one:

Under-18 men scored it almost 2 points lower than under-18 women! I have a perfectly good explanation for the Twilight ratings effect - normally movies are watched by people who like the genre, so only men who like chick flicks and only women who like action movies watch them. So there will be very little discernible gender-genre bias. But with Twilight, millions of girlfriends worldwide must have forced their boyfriends to watch Twilight against their will, to which the aforementioned boyfriends reacted in a passive-aggressive way by downvoting Twilight on IMDB. I don't have any way of testing this theory, but I watched Twilight of my own free will, and liked it a lot.

Anyway, back to "so bad it's good" movies. I really like the genre, but it's difficult to tell the "so bad it's good" movies from straightforwardly bad movies. So, like the good hacker I am, I decided to grab IMDB's database and find such movies. Here's the list, the criteria being: at least 10% of votes are 1s, at least 10% of votes are 10s, ordered by number of votes.
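For the curious, the filter described above can be sketched in a few lines of Python. This is only an illustration, not the script actually used for the list: it assumes the vote data has already been parsed into a dictionary mapping each title to ten vote counts (index 0 for votes of 1, index 9 for votes of 10). The function name `controversial_movies`, the `threshold` parameter, and the sample titles and numbers are all made up.

```python
from typing import Dict, List, Tuple


def controversial_movies(
    votes: Dict[str, List[int]], threshold: float = 0.10
) -> List[Tuple[str, int]]:
    """Return (title, total_votes) for movies with many 1s and many 10s.

    `votes` maps a title to ten counts: counts[0] is the number of
    1-star votes, counts[9] the number of 10-star votes (assumed layout).
    """
    result = []
    for title, counts in votes.items():
        total = sum(counts)
        if total == 0:
            continue  # no votes, nothing to measure
        share_of_ones = counts[0] / total
        share_of_tens = counts[9] / total
        if share_of_ones >= threshold and share_of_tens >= threshold:
            result.append((title, total))
    # Most-voted movies first, as in the list's ordering.
    result.sort(key=lambda pair: pair[1], reverse=True)
    return result


# Made-up sample data, purely for illustration.
sample = {
    "Polarizing Movie": [300, 20, 10, 10, 10, 10, 20, 40, 80, 500],
    "Ordinary Movie": [10, 10, 20, 40, 100, 300, 400, 200, 60, 20],
    "Small Cult Movie": [40, 5, 5, 0, 0, 0, 0, 5, 5, 40],
}

for title, total in controversial_movies(sample):
    print(title, total)
```

With the sample data, only the two movies with heavy tails at both ends pass the filter; the single-peaked "Ordinary Movie" is dropped because well under 10% of its votes are 1s.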

The most controversial movie in every way is Jonas Brothers: The 3D Concert Experience, with 35.3% 10s, 61.5% 1s, and only 3.2% everything else. It also has a huge 5.0 to 1.3 under-18 gender gap, but as you can see, even most teenage females aren't huge fans of it. It's also the worst-scored movie on the list, with a score of just 1.3, which suggests IMDB's filters treat most of the 10s as attempted ballot stuffing and throw them away.

The highest rated controversial movie is Le salaire de la peur, with an IMDB score of 8.3 and somehow still more 1s (10.4%) than Transformers 2.

The highest rated controversial movies that seem popular (by number of votes they received) are Fahrenheit 9/11 and Twilight, both of which I liked, and not because of the "so bad it's good" effect, and The Blair Witch Project, which I hated with a passion for being so unbelievably boring.

The lowest scored movie from the list that I watched was Troll 2, and I liked it because of the "so bad it's good" effect.

8 comments:

Have you tried Criticker? It totally works for me. It took some time (like two hundred movies rated) to calibrate to my taste, but now it's rarely off by more than 5 percentage points when calculating PSIs (probable score indicators). Oh, and it's got IMDb ratings imported there as a virtual user, so you might find out just how much you really agree with IMDb.

Actually it analyses your scores and spans deciles along your scale, thereby assigning them to 10 tiers. That way it works a charm however you rate your movies, e.g. you can use /10, /100 or whatever scale, you can consistently overrate movies, etc. Takes some getting used to, but makes sense if you think about it.

Of course it uses the normalized scale when calculating likeness &c. (converting back when presenting you with a PSI). Wouldn't make much sense otherwise, given peculiarities of people's rating habits, now would it?
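To make the decile idea concrete, here is a rough Python sketch of how a score could be mapped to one of 10 tiers based only on where it falls among a user's own ratings. This is a guess at the general technique, not Criticker's actual algorithm; the function name `tier_of` and the sample ratings are hypothetical.

```python
from bisect import bisect_left
from typing import List


def tier_of(score: float, all_scores: List[float]) -> int:
    """Map `score` to a tier from 1 to 10 by its rank among `all_scores`.

    The tier depends only on what fraction of the user's ratings are
    strictly lower, so the rating scale itself does not matter.
    """
    if not all_scores:
        raise ValueError("need at least one rating to build tiers")
    ranked = sorted(all_scores)
    below = bisect_left(ranked, score)  # count of ratings strictly lower
    return min(10, below * 10 // len(ranked) + 1)


# Two hypothetical users: a harsh one on a /10 scale and a generous
# one on a /100 scale.
harsh = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
generous = [60, 65, 70, 75, 80, 85, 90, 95, 98, 100]

print(tier_of(5, harsh))       # the harsh user's top score
print(tier_of(100, generous))  # the generous user's top score
print(tier_of(60, generous))   # the generous user's lowest score
```

The point of the sketch is that a 5 from a harsh rater and a 100 from a generous one both land in a high tier, while a 60 from the generous rater lands in the bottom tier, which is why comparing tiers makes more sense than comparing raw scores.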


Creative Commons

Unless otherwise expressly stated, all original material of whatever nature created by Tomasz Węgrzanowski and included in this blog, is licensed under a Creative Commons License. It is also licensed under GFDL (for Wikipedia compatibility).