What YouTube told us about the popularity of 2017’s Best Picture Nominees : An interview with Polygraph

On Sunday night, millions of Americans will tune into the 89th Academy Awards to celebrate the most critically acclaimed films of the year.

Over the last few weeks, we’ve been curious about whether YouTube data could tell us something about the differences in how Americans watched this year’s Best Picture nominees. Were there “hotspots” or certain parts of the country where La La Land was more popular? Were Americans in the Midwest more interested in Hacksaw Ridge than Americans in the South?

A heat map for each Oscar nominee’s popularity, based on YouTube trailer views. See more at: googletrends.github.io/google_oscars

We spoke with the team at Polygraph to learn more about their process in building this visualization and some of the data and design challenges they tackled along the way.

“It was easy to get lost in all the possibilities.”

On the data

What would you recommend to others working with YouTube data? Were there any unique challenges you faced with visualizing YouTube data?

The data gathering, in this case, was much simpler than many other projects, since we had access to all the granularity and detail we could ask for.

The downside of this was that it was easy to get lost in all the possibilities. Looking at past cartographic projects and assessing the level of detail they used helped a good deal, giving us a springboard for exploration and letting us narrow down the number of worthwhile directions.

What was something that surprised you as you first started visualizing the YouTube data?

Not seeing results was a bit of a surprise!

The time variable made the difference between seeing interesting results and a painfully bland dataset, so it took several attempts to figure out the appropriate window of time for each movie’s analysis. This was particularly important because we needed a standardized number-crunching approach for all the nominees — one that would let us compare films with hugely different national popularity, released at different points in the year.
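One way to picture that standardization — a minimal sketch, not Polygraph’s actual pipeline (the function name, inputs, and baseline choice are all assumptions) — is to take each film’s trailer views within a fixed window after release, then express each state’s share of views relative to its share of the national baseline:

```javascript
// Illustrative popularity index (assumed approach, not the article's
// actual method). Views are already restricted to each film's window,
// so films released at different times become comparable.
function popularityIndex(viewsByState, population) {
  // viewsByState: { stateId: trailer views within the film's window }
  // population:   { stateId: population }, used as the baseline share
  const totalViews = Object.values(viewsByState).reduce((a, b) => a + b, 0);
  const totalPop = Object.values(population).reduce((a, b) => a + b, 0);
  const index = {};
  for (const state of Object.keys(viewsByState)) {
    const viewShare = viewsByState[state] / totalViews;
    const popShare = population[state] / totalPop;
    // 1.0 = the state watched exactly its "fair share";
    // >1 over-indexes, <1 under-indexes.
    index[state] = viewShare / popShare;
  }
  return index;
}
```

Because the index is a ratio of shares rather than raw counts, a blockbuster and a small release can sit on the same scale.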

Did the data surface any patterns or hypotheses that you’d encourage other journalists or storytellers to further explore?

As the NY Times showed with their TV maps, and as the Oscars project reaffirmed, regional tastes in culture exist, even if they’re initially difficult to spot. These are particularly interesting when it comes to the urban/rural split, but they also manifest in unexpected ways (e.g., Fences being disproportionately popular in Kansas) — we’d love to see these disparities explored further.

“We really wanted readers to pick up on the hotspots that had a particular connection to the film.”

On the design

What was your design process in building the visualization?

Experimenting with varying levels of geographic smoothing. Upper: Views for Arrival’s trailer, with less smoothing, rendered in QGIS. Lower: Views for different nominees, with a higher degree of smoothing, rendered in D3.

After we processed the data, the first step was to do some preliminary mapping to get a sense of what we were working with. We decided to implement a regional smoothing algorithm to make the geographic trends more pronounced. We then brought the data into the browser, and started playing with color palettes, interaction, etc.
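The article doesn’t specify which smoothing algorithm was used, but a simple neighbor-averaging pass conveys the idea (a sketch under that assumption — region names and structure are illustrative): each region’s value is pulled toward the average of itself and its adjacent regions, damping isolated outliers so broad geographic trends stand out.

```javascript
// Sketch of regional smoothing via iterative neighbor averaging
// (an assumed, generic approach — not necessarily Polygraph's).
function smooth(values, neighbors, iterations = 1) {
  // values:    { regionId: number }
  // neighbors: { regionId: [adjacent regionIds] }
  let current = { ...values };
  for (let i = 0; i < iterations; i++) {
    const next = {};
    for (const id of Object.keys(current)) {
      const ring = neighbors[id] || [];
      // Average the region with its neighbors.
      const sum = ring.reduce((acc, n) => acc + current[n], current[id]);
      next[id] = sum / (ring.length + 1);
    }
    current = next;
  }
  return current;
}
```

Running more iterations smooths more aggressively — the trade-off the team explored in the QGIS-vs-D3 comparison above.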

Seeing all the maps together allowed us to settle on the narrative of profiling the Best Picture nominees, with additional research and annotations to explain some of the geographic trends. This guided the design hierarchy and layout of the story. Once we had the basic structure, it was just a lot of small design/code/feedback loops until we landed on something we were happy with.

What was a rule that you consciously followed in the design? A rule that you consciously broke?

Performance and a mobile-friendly experience were our guiding design principles. Although we toyed with making the maps interactive and zoomable, we determined that revealing the national trends was more compelling, and we designed the experience around that instead. We generated the maps using JavaScript, but then ended up baking them out to static image files. This way we didn’t need to load tons of data or do all the rendering client-side.

A rule we consciously broke was with our color scale. Technically speaking, we should have used a proper diverging color scheme, with equal parts below and above “normal.” We instead decided to bucket all of the under-indexing values into a single bin, and use a few breaks for the over-indexing values. This allowed the map to portray what we wanted — where the movie was most popular — without distracting with the other data points.
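That asymmetric bucketing can be sketched as a simple threshold function (the break points and hex colors here are placeholders, not the article’s actual palette): everything at or below “normal” collapses into one neutral bin, while over-indexing values get several breaks.

```javascript
// Asymmetric color scale sketch (assumed breaks and colors).
// An index of 1.0 means "normal" national-share viewing.
function colorFor(index) {
  if (index <= 1.0) return "#eeeeee"; // all under-indexing: one neutral bin
  if (index <= 1.5) return "#fdbb84"; // mildly over-indexing
  if (index <= 2.0) return "#e34a33"; // strongly over-indexing
  return "#b30000";                   // hottest hotspots
}
```

A strict diverging scheme would spend half its color range on the under-indexing side; collapsing it keeps the reader’s eye on where a film was most popular.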

What’s one thing you intentionally wanted a reader to come away with? How did you design the visualization to enhance that?

We really wanted readers to pick up on the hotspots that had a particular connection to the film. We decided to experiment here with “connected annotations.” Instead of a traditional annotated map with an arrow, we decided to visually connect the prose and the map. We relied on a two-way hover event that triggered a visual change in both the specific section of prose and the hotspot on the map. This way we could direct the reader’s attention to the prose when looking at the map, and to the map when reading the prose.
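The linking logic behind those two-way hovers can be sketched DOM-free (names and structure here are assumptions, not Polygraph’s code): prose spans and map hotspots share an annotation id, and hovering either one highlights both. In the page, `enter`/`leave` would be wired to mouseenter/mouseleave handlers that toggle a CSS class.

```javascript
// Sketch of two-way "connected annotation" linking (illustrative).
function makeLinker() {
  const highlighted = new Set();
  return {
    // Hovering either the prose span or the hotspot with this id
    // highlights both counterparts.
    enter(id) {
      highlighted.add(`prose-${id}`);
      highlighted.add(`hotspot-${id}`);
    },
    leave(id) {
      highlighted.delete(`prose-${id}`);
      highlighted.delete(`hotspot-${id}`);
    },
    isHighlighted(elementId) {
      return highlighted.has(elementId);
    },
  };
}
```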

A big thanks to the folks at Polygraph as we continue our series of visual experiments, alongside Alberto Cairo as consultant art director. Our last project was a look at language through Google Trends. Keep an eye out for our next project!