Tag Archives: predicting academy awards

Beauty is in the eye of the beholder. But what determines beauty is behind the eye of the beholder.

We know that if two people watch the same debate, they nearly always think the candidate who is closest to their opinion won. That’s why debates seldom move minds. The same is surely true for movies. How often does a scene or character in a movie resonate with something that’s going on in your life? You’ve probably had that happen and when it does, it makes the movie more memorable and impactful. Given a roughly similar level of artistry and accomplishment, the movie that Hollywood insiders will likely prefer is the one that feels closest to their heart. But how to measure that? Our goal was to build a method for understanding a community’s culture by understanding the whole of what they read and then to develop methods for matching specific pieces of content to culture. Our hypothesis is that, other things being roughly equal, the movie that is closest to the Hollywood worldview will win.

Who is this “we” Kemosabe?

I lead the digital analytics practice at Ernst & Young (EY). As part of that, I also lead a “Counseling Family” of west coast analysts (I’m based in SF). A counseling family isn’t so much a corporate reporting structure as a community of interest and support group. We try to do fun stuff on the side, support each other, and build careers. Since my CF is all analysts (but not all digital), our idea of fun isn’t limited to surfing, skiing and room escapes (though it does include those). We try to sprinkle in some fun analytics projects we can do on the side – things that give us a chance to pursue special interests, work together, and have some deeply geeky fun. So when 538 – a site we all love – announced their Academy Awards prediction challenge, I signed us up. We have a much larger team and a much larger family than you’ll see here, but not everybody always has time for the fun stuff. Clients come first. For all of us, this is a side project we squeezed in between the billable hours. So special thanks to all the members of the team who contributed (mightily) to this effort. This is far more their effort than mine and I’ve tried to call out the team members who worked on each step of the analysis. And what did I contribute? Well, you know that FedEx commercial where the senior guy kind of adds the arm chop? Seriously, the broad analytics approach was mine but all the real work came from the teams I’ve named below.

Why we think this is interesting

It’s unlikely that matching content to community culture will out-perform other prediction methods that are focused on things like earlier voting results. However, such methods are of interest only with respect to the problem of predicting this specific award. And how much do we really care about that? I’ll just say this, if you’re betting on the Oscars, “culture matching” probably isn’t the best bet to punch your winning ticket.

Our goal was to develop an approach that might be interesting and applicable to a broad range of problems and that would require interesting analytic methods (R idea of fun). Wouldn’t it be nice to be able to map a TV drama to an audience’s culture? To understand which social media content would most appeal to a targeted community? To know which arguments will play best in Iowa vs. New Hampshire? These, and hundreds of other applications, involve matching content to a community culture. So let’s dispel with the myth that this is just about predicting the Oscars. There are many, many problems where having a “culture matching score of content to community” might significantly improve analytic models and the Oscars is just one (interesting) case of that broad problem set.

Methodology – High Level

To make our culture matching method work, we needed three basic components: a way to describe the Hollywood worldview and capture whatever zeitgeist was current, a way to describe the key themes in a movie, and a way to match and score the two sets of themes. Here’s how we went about developing these three components.

Within this broad method, we tried several different sub-approaches and several different technology solutions. Below is a more detailed break-out of each step.

Steps 1 & 2: Identify a Hollywood Corpus and Extract

One of the challenges in predicting Academy Awards is uncertainty around the exact community of voters. And, of course, even if you know the community you don’t necessarily know what (or if) they read. We looked at a number of different potential sources in developing a Hollywood corpus. We considered industry-specific sources like Variety and American Cinematographer, general-purpose sources like the LA or NY Times, and broader sources like Vanity Fair and the Atlantic Monthly. With more time, we might have been able to find ways to analytically identify which corpus or combination was most reflective. For this exercise, however, we simply pulled each data source, categorized the articles, and reviewed them. The review included a study of word/phrase frequency counts and analysts reading the source material. We eliminated the industry-specific sources because the text wasn’t thematically interesting enough. Though filled with Hollywood-specific materials, most of that material was technical in nature (jobs, films in process, etc.) and too thin to establish broader cultural themes. The LA Times proved more accessible for large amounts of content than the NY Times and gave us a more focused geography. Vanity Fair turned out to be our favorite corpus. It blended lots of opinion and culture with a healthy serving of Hollywood-specific content. For our analysis, we ended up using selected VF and LA Times categories with Vanity Fair dominating. For both these sources, we extracted 12 months of articles using a standard listening tool, filtered them by category, removed duplicates, and then loaded them into our analysis tools.
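For readers who want to picture the filter-and-dedupe step, here is a minimal sketch in Python. It is not what our listening tool actually does. The record fields (`source`, `category`, `body`), the category names and the `filter_and_dedupe` helper are all made up for illustration, and "duplicate" here just means the same text after lowercasing and collapsing whitespace.

```python
import hashlib
import re

def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())

def filter_and_dedupe(articles, keep_categories):
    """Keep articles in the chosen categories and drop repeated bodies."""
    seen = set()
    kept = []
    for article in articles:
        if article["category"] not in keep_categories:
            continue  # e.g. job listings, production notes
        digest = hashlib.sha1(normalize(article["body"]).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same text already loaded
        seen.add(digest)
        kept.append(article)
    return kept

# Toy records, invented for this example.
articles = [
    {"source": "VF", "category": "culture", "body": "A long essay on fame."},
    {"source": "VF", "category": "culture", "body": "A long  essay on FAME. "},
    {"source": "LAT", "category": "classifieds", "body": "Jobs board."},
    {"source": "LAT", "category": "entertainment", "body": "Awards season preview."},
]
corpus = filter_and_dedupe(articles, {"culture", "entertainment"})
```

Hashing the normalized body keeps the check cheap even over a year of articles. A real pipeline would probably want fuzzier matching for syndicated copies that differ by a byline or a headline.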

Data Extraction Team: Jesse Gross, Abhay Khera

Steps 3 & 4: Identify a Movie Corpus and Extract

Our initial thought was that we could use movie reviews to create a corpus specific to each movie. A good movie review will not only capture topic themes, but is likely to capture more abstract themes and also to tie those to broader cultural issues (like race, fear, or wealth inequality). We expected to be able to use sites like IMDb, Metacritic or Rotten Tomatoes to quickly identify and pull reviews. We were right – and wrong – about this. Movie reviews did turn out to be a really rich, highly-focused source of language about each movie. And the sites above gave us a great list of movie reviews to pull from. But we couldn’t pull full-text reviews from the APIs on those sites. Instead, we pulled the URLs of the reviews from those sites, filtered them for English-language only, and then wrote a Java program using Boilerpipe’s text extraction library to actually extract the review from its original site. Boilerpipe did a really nice job extracting core document text and with our script and the URLs, we were able to quickly pull a large library of movie reviews for each nominated movie. This turned out to be more work than we expected but we ended up pretty satisfied with our Movie corpus.

At this point, we had two alternative approaches to matching the “Movie” corpus to the “Hollywood” corpus. The first method was to use IBM’s SPSS Text Analytics to extract and match themes. The second approach was to use a machine-learning tool to auto-match the two corpora.

Text & Linguistic Analysis Method

Step 5: Extracting Top Themes from each Movie

We started with a set of about 150 movie reviews per movie (all Best Picture nominees and those featuring a Best Actor or Actress nominee), and used R and SPSS to analyze which word themes frequently occurred in that set. For example, some of 45 Years’ themes included “marriage”, “secrets”, “aging”, “jealousy”. We gathered about 20 themes for each movie and each actor. Second, we used SPSS to count how often these themes occurred in our 2015 Hollywood corpus. The total number of occurrences gave us an initial score for each movie or actor. Next, we adjusted the initial score by examining context. We looked at a theme’s context in movie reviews. For example, in 45 Years, the husband receives a letter with important news. Therefore, a letter, in this context, is a personal communication sent from one person to another. In our Hollywood corpus, there were frequent occurrences of “letters to the editor”. That’s clearly a textual distortion, not a cultural theme. We tried to make sure that thematic concepts were truly matches. When we judged the match to be spurious, we adjusted the score by removing the match.
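To make the scoring idea concrete, here is a rough Python sketch of "count the themes, then throw out the spurious matches". Our actual work was done in SPSS with analysts judging context by hand, so treat this as an illustration only. The `score_movie` function, the toy corpus text and the hard-coded list of spurious phrases are all assumptions made up for this example.

```python
import re
from collections import Counter

def score_movie(themes, corpus_text, spurious=None):
    """Sum theme occurrences in the corpus, ignoring known spurious phrases.

    `spurious` maps a theme to phrases whose matches should not count,
    e.g. {"letter": ["letters to the editor"]}.
    """
    spurious = spurious or {}
    text = corpus_text.lower()
    counts = Counter()
    for theme in themes:
        cleaned = text
        # Blank out phrases that contain the word but not the concept.
        for phrase in spurious.get(theme, []):
            cleaned = cleaned.replace(phrase.lower(), " ")
        # Whole-word match, allowing a simple plural "s".
        pattern = r"\b" + re.escape(theme.lower()) + r"s?\b"
        counts[theme] = len(re.findall(pattern, cleaned))
    return sum(counts.values()), counts

# Toy corpus, invented for this example.
hollywood = (
    "Letters to the editor: readers respond. A marriage built on secrets. "
    "He kept the letter from his wife for years. Another marriage ends."
)
themes = ["marriage", "secrets", "letter"]

raw_score, _ = score_movie(themes, hollywood)
score, counts = score_movie(themes, hollywood, {"letter": ["letters to the editor"]})
```

On this toy text the raw score is 5 and the adjusted score is 4, because the "letters to the editor" hit no longer counts toward "letter". In practice the hard part is deciding which contexts are spurious, and that took human judgment.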

We did try some alternative approaches. For example, we also asked ourselves whether the process worked in reverse. If we took key themes from the Hollywood corpus and then matched them to each movie, would we get similar results? If you think about it, you’ll see that this is a rather different question. There’s no guarantee that the top overall themes in Hollywood will match the top themes from ANY of our movies – so it’s possible that the answer to which movies match Hollywood themes isn’t the same as the answer to which movie themes resonated most strongly in Hollywood. Our lead analyst on the SPSS text analytics, Brian Kaemingk, described these questions this way:

In the end, the models for each question produced quite similar results but there were a couple of movies (e.g. Bridge of Spies) that moved position significantly between the two methods. We decided that Question #1 worked better for our analysis, since the theme identification in the Movie corpus was richer and more specific than the theme identification in the Hollywood corpus. We think those more specific themes are probably better in terms of capturing real aspects of the Hollywood worldview and creating that feeling of resonance we’re hoping to capture.

We also used this method to make our predictions around best actor and actress. Instead of using the whole review corpus, however, we first extracted concept maps around the character/actor. For Matt Damon in The Martian, that looked something like this:

We then matched these Concept Maps back to the Hollywood corpus. In our first try, we simply matched to the entire Hollywood corpus. However, we decided this confused concepts since optimism about the weather isn’t quite the same as being an optimistic person. So we decided to extract just people-themed concepts from the Hollywood corpus and then match those. The idea is that, just as we are matching the movie to broader cultural themes, we matched the character to the way Hollywood talks and reads about real people. Does Hollywood resonate to optimistic, imaginative scientists?

Well, at least Matt Damon’s handsome…

On the technical side of things, we used R to pre-process data and count theme frequency. R also helped to remove stop and non-thematic words and apply document stemming to make sure that themes were counted correctly. Stemming significantly boosts the accuracy of matching and theme consolidation. Most of our work, however, was done using IBM SPSS. We used SPSS to score themes and examine context using co-occurrence, semantic network, concept root derivation, concept inclusion, and text link analysis NLP techniques.
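Here is a small sketch of why stemming matters for theme counting. We did this step in R, so the Python below is only an illustration. The `crude_stem` function is a deliberately naive suffix stripper standing in for a real stemmer (such as Porter), and the stop-word list is a tiny made-up sample.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "with", "by"}

def crude_stem(word):
    """Very rough stand-in for a real stemmer: strip one common suffix,
    then a trailing 'e', so 'marriage' and 'marriages' share a stem."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

def stem_counts(text):
    """Tokenize, drop stop words, stem, and count what is left."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)

counts = stem_counts("The secrets of a long marriage. Marriages keep a secret.")
marriage_hits = counts[crude_stem("marriage")]
secret_hits = counts[crude_stem("secrets")]
```

Without stemming, "marriage" and "marriages" would be counted as two separate themes with one hit each. With it, both land on the same stem and the theme gets credit for two. A naive stripper like this one makes plenty of mistakes on real text, which is why a proper stemming library is the better choice.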

We are experimenting with different methods of using our machine learning tools. But our first attempt is very much a brute force method. We loaded the Movie and the Hollywood corpus into a workset. We then created training categories for each movie and trained the tool using the movie reviews for that film. After the training, we simply let the tool categorize every article in the Hollywood corpus and counted which movie it was categorized as most resembling. The category in which the most Hollywood posts were sorted was the winner.
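The brute force idea is easy to sketch, even though our machine learning tool is a black box and we can't show what it does internally. The Python below swaps in a tiny hand-rolled Naive Bayes classifier (with add-one smoothing and equal priors) purely as a stand-in. The movie names, review snippets and articles are toy data invented for the example, and `train`, `classify` and `rank_movies` are hypothetical helper names.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(reviews_by_movie):
    """Build per-movie word counts from that movie's reviews."""
    word_counts = {movie: Counter() for movie in reviews_by_movie}
    vocab = set()
    for movie, reviews in reviews_by_movie.items():
        for review in reviews:
            tokens = tokenize(review)
            word_counts[movie].update(tokens)
            vocab.update(tokens)
    return word_counts, vocab

def classify(article, word_counts, vocab):
    """Return the movie whose review language best explains the article.
    Words never seen in any review are ignored; ties go to the first movie."""
    best_movie, best_score = None, -math.inf
    for movie, counts in word_counts.items():
        total = sum(counts.values())
        score = 0.0
        for token in tokenize(article):
            if token in vocab:
                # Add-one smoothing so unseen words don't zero out a movie.
                score += math.log((counts[token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_movie, best_score = movie, score
    return best_movie

def rank_movies(reviews_by_movie, hollywood_articles):
    """Categorize every Hollywood article and count the wins per movie."""
    word_counts, vocab = train(reviews_by_movie)
    tally = Counter(classify(a, word_counts, vocab) for a in hollywood_articles)
    return tally.most_common()

# Toy data, invented for this example.
reviews = {
    "Movie A": ["a stranded astronaut uses science to survive",
                "science and optimism on a distant planet"],
    "Movie B": ["a long marriage shaken by old secrets",
                "jealousy and secrets test a marriage"],
}
articles = [
    "Why science funding matters for the next planet mission",
    "The secrets behind a celebrity marriage",
    "A marriage of convenience and its secrets",
]
ranking = rank_movies(reviews, articles)
```

On this toy data, two of the three articles land in the "Movie B" bucket, so it ranks first. Note that every article is forced into some category, even one that resembles none of the movies.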

This approach is asking a lot of the machine learning tool, but it was simple and potentially interesting. The hard part was trying to figure out if the resulting categorization made sense! That’s often the difficulty when working with a Black Box tool. Even if you believe the results, it can be hard to make skeptics into converts with black-box systems. It was particularly challenging in this case because we weren’t at all confident that this brute force method would produce good results AND we really had no outside view of a plausible rank ordering of movies. Even if the assignment of posts to movies was completely random, it would be hard to tell if it was wrong.

Isn’t it awful when you get all the way through the hour of something like Dancing with the Stars and then the actual selection is carried over into the next episode? Totally sucks!

Unfortunately, it’s a 538 challenge and we owe them first shot at the actual prediction. I’ll push it as soon as we post there. The good news? You can see it there Tuesday and I’ll even update this post to include the prediction.

UPDATE

We’ve released the predictions. Here’s the initial rank ordering of Best Picture nominees by match to Hollywood themes:
