Who's Behind IMDb's Ratings?

IMDb (Internet Movie Database) is one of the most frequented movie rating sites, and for many people its ratings dictate whether a movie is even worth watching. The rating system isn't based on hand-selected critics; it revolves around the site's users, and anyone can sign up to vote. These anonymous users hold great power over which movies are deemed must-watch and which are written off as not worth the bother. So the question is, WHO are the individuals behind these ratings? With this question in mind, I decided to scrape IMDb's demographic rating breakdown to look a little more closely at who's behind the scenes. The answer is below.

For more information on what I used for scraping and analysis, please visit my GitHub Repo.

Data Scraping

My scraping tool of choice for this project was Scrapy. Below are the criteria I used to populate the list of movies I worked with, as well as a sample of the Rating By Demographic breakdown that can be found with each movie.

Analysis

The first (and most obvious) thing I looked into was the breakdown of votes by gender. It is very clear that the majority of IMDb's users are male, which was an extremely surprising discovery for me.
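A minimal sketch of how a split like this can be computed, assuming each scraped movie record carries separate vote counts for male and female users. The field names `male_votes` and `female_votes` and the numbers are placeholders for illustration, not the actual schema or real IMDb figures.

```python
def gender_vote_share(movies):
    """Return each gender's share of all gendered votes across the movies."""
    male = sum(m["male_votes"] for m in movies)
    female = sum(m["female_votes"] for m in movies)
    total = male + female
    if total == 0:
        raise ValueError("no gendered votes to summarize")
    return {"male": male / total, "female": female / total}


# Placeholder records for illustration only (not real IMDb figures).
sample = [
    {"title": "Movie A", "male_votes": 800, "female_votes": 200},
    {"title": "Movie B", "male_votes": 700, "female_votes": 300},
]
shares = gender_vote_share(sample)
print(f"male: {shares['male']:.0%}, female: {shares['female']:.0%}")
```

Summing the raw counts before dividing means movies with more votes weigh more heavily, which is what a site-wide share should reflect.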

Next, I decided to look into how each demographic votes. I was able to break gender down even further by including the user's age range. As you can see, the younger the user, the more likely they were to give a movie a higher rating. The users most likely to give a lower rating are men aged 45 and over. Please keep in mind that there were far fewer ratings from users under 18, which could be a major factor in why their average rating is higher. With more votes, the mean would likely sink a little.
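One way to make the sample-size caveat visible is to report each group's vote count next to its mean. The sketch below computes a vote-weighted mean rating per gender and age range, assuming each row holds one movie's average rating and vote count for one group. The row layout and the values are hypothetical, not the scraped data.

```python
from collections import defaultdict


def weighted_mean_by_group(rows):
    """Vote-weighted mean rating per (gender, age_range) group.

    Each row is (gender, age_range, mean_rating, vote_count) for one movie.
    Returns {(gender, age_range): (weighted_mean, total_votes)}.
    """
    rating_sum = defaultdict(float)
    votes = defaultdict(int)
    for gender, age_range, mean_rating, count in rows:
        key = (gender, age_range)
        rating_sum[key] += mean_rating * count
        votes[key] += count
    # Skip groups with no votes to avoid dividing by zero.
    return {k: (rating_sum[k] / votes[k], votes[k]) for k in votes if votes[k] > 0}


# Placeholder rows for illustration only.
rows = [
    ("male", "45+", 6.0, 300),
    ("male", "45+", 7.0, 100),
    ("female", "<18", 8.0, 10),
]
summary = weighted_mean_by_group(rows)
for (gender, age_range), (mean, total) in sorted(summary.items()):
    print(f"{gender:6} {age_range:4} mean={mean:.2f} votes={total}")
```

Keeping the total next to the mean makes it easy to spot groups, such as under-18 users here, whose average rests on very few votes.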

To further demonstrate how ratings differ by user gender and age, below is a heatmap showing how each demographic group votes by genre.
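A heatmap like this is drawn from a genre-by-demographic grid of mean ratings. The sketch below builds such a grid in plain Python, assuming the data has already been flattened into (genre, group, rating) records. The record layout, group labels, and values are hypothetical, and the resulting grid would still need to be handed to a plotting library to render.

```python
def pivot_ratings(records):
    """Pivot (genre, group, rating) records into a genre-by-group grid of mean ratings.

    Returns (genres, groups, grid) where grid[i][j] is the mean rating for
    genres[i] and groups[j], or None when that combination has no data.
    """
    sums, counts = {}, {}
    for genre, group, rating in records:
        key = (genre, group)
        sums[key] = sums.get(key, 0.0) + rating
        counts[key] = counts.get(key, 0) + 1
    genres = sorted({genre for genre, _ in sums})
    groups = sorted({group for _, group in sums})
    grid = [
        [sums[(g, grp)] / counts[(g, grp)] if (g, grp) in sums else None for grp in groups]
        for g in genres
    ]
    return genres, groups, grid


# Placeholder records for illustration only.
records = [
    ("Action", "male 18-29", 7.0),
    ("Action", "male 18-29", 8.0),
    ("Action", "female 18-29", 6.5),
    ("Drama", "female 18-29", 8.0),
]
genres, groups, grid = pivot_ratings(records)
for genre, row in zip(genres, grid):
    print(genre, row)
```

Empty cells are left as `None` rather than zero so that a missing genre and group combination is not mistaken for a very low rating.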

I also looked into how users have been voting by year. As we can see below, the divide between male and female users has not decreased even in recent years. We can also see that the number of ratings has diminished in the past few years. This may be because it takes a few years for audiences to catch up on the movies released in a given year, or perhaps because IMDb eliminated bots that may have been responsible for inflating movie ratings.

Lastly, below is a graph comparing how ratings differ depending on the lead actor. The actors shown were randomly selected.

As we can see, on average, female users tend to rate higher than their male counterparts. However, certain lead actors (and movies) are exceptions.

With the information that I gathered (much of which I didn't post here), I would like to look further into the actual reviews left by each user and use NLP to analyze the difference. That's just one path. Another one that interests me is analyzing what movies perform best based on lead actor, genre, and director. It would be extremely valuable to predict a movie's potential ROI based on what I was able to scrape.

Conclusion

IMDb is clearly dominated by male users, and it would be ideal if the site could somehow attract more female users to make its ratings less male-biased. All in all, I greatly enjoyed working on this project as I'm a huge movie buff, and this was a nice peek inside one of the review sites I frequent most often. Thanks for reading!

About Author

Stephen Shafer

BS in Accounting with a concentration in Management Information Systems (MIS) at Binghamton University. Previous FinTech sales experience has allowed me to more clearly understand where true value lies in data, and how it can be directly translated...