I’ll admit I spend a considerable amount of time on YouTube. I consume mountains of videos, ranging from eurobeat mixes to dash cam montages to TensorFlow tutorials to political video essays. However, some of my favorite content is the clips posted by late night television shows. Colbert, Meyers, Noah, and Oliver always put a funny spin on modern politics and present some compelling interviews.

While watching these hosts, I started to notice O’Brien’s videos appearing in my recommended videos. At first, I dismissed this as normal, considering how many late night shows I was watching. However, I began to notice a difference in the way O’Brien (or most likely his team) titles the interview videos. When the guest is a man, the titles seem normal, usually something to do with a book they’re releasing or a movie they’re in. But when the guest is a woman, the titles are noticeably more sexual and provocative.

Instead of just ignoring it, I thought I would quantify my observations with some visualizations.

Next, I use the pafy library to extract the metadata from the playlist of all O’Brien’s videos. I believe the pafy library is essentially a wrapper for the youtube-dl command line tool. The library’s usage doesn’t quite match the documentation but it’s good enough for what we’re doing.

Unfortunately, O’Brien doesn’t have a playlist of all the interviews so we’ll have to figure out which videos are interviews using the metadata.

Next, to determine whether a video is an interview, we will use a list of names collected from the census. The process will of course not be entirely accurate because of anomalies in the titles. For example, not all interviews start with the name of the guest, and “The” is technically a name.

Take note that in the original Stack Overflow post, one of the files is in table format and not CSV. I was having trouble importing that with pandas, so I simply used Google Sheets to convert it to CSV.

In addition to using the name check, we will also ignore videos with a forward slash “/” in the title. Those videos are almost always band performances.
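Here is a minimal sketch of how the two rules could fit together. The tiny name set and the example titles are made up for illustration (the real list comes from the census files), and `looks_like_interview` is a hypothetical helper name, not necessarily the code used for the results in this post.

```python
# Hypothetical sample; the real list of census first names is much larger.
# "the" is included on purpose to show the false positive mentioned above.
KNOWN_NAMES = {"amy", "bill", "the"}


def looks_like_interview(title, known_names=KNOWN_NAMES):
    """Guess whether a video title belongs to a guest interview."""
    # Titles containing "/" are almost always band performances.
    if "/" in title:
        return False
    words = title.split()
    if not words:
        return False
    # Treat the video as an interview when its first word is a known name.
    first_word = words[0].strip(".,:;!?'\"").lower()
    return first_word in known_names


# Made-up titles, used only to exercise the filter.
titles = [
    "Amy Example On Her New Movie",
    "Some Band / Some Song",
    "Scraps: A Made-Up Sketch",
]
interviews = [t for t in titles if looks_like_interview(t)]
print(interviews)
```

Checking only the first word keeps the filter simple, at the cost of missing interviews whose titles do not start with the guest’s name.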

To determine the gender of the name, we can use the gender guesser library. In our case, androgynous names will be ignored.
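A rough sketch of that step follows. To keep it runnable without extra packages, the guesser is passed in as a function and `demo_guesser` is a made-up stand-in. Both helper names are hypothetical, and dropping everything except a confident “male” or “female” label is my assumption about how to handle the uncertain cases.

```python
def guest_gender(title, guess_gender):
    """Return "female", "male", or None for the first name in a title.

    guess_gender is any callable that maps a first name to a label.
    """
    words = title.split()
    if not words:
        return None
    label = guess_gender(words[0])
    # Assumption: keep only confident labels. Androgynous, unknown, and
    # "mostly" labels are all ignored.
    return label if label in ("female", "male") else None


def demo_guesser(name):
    """Tiny made-up lookup so the sketch runs on its own."""
    lookup = {"Amy": "female", "Bill": "male", "Alex": "andy"}
    return lookup.get(name, "unknown")


# Made-up titles, used only to exercise the function.
sample_titles = [
    "Amy Example On Her New Movie",
    "Bill Example Wrote A Book",
    "Alex Example Is Here",
]
labels = [guest_gender(t, demo_guesser) for t in sample_titles]
print(labels)
```

With the gender-guesser package installed, `gender_guesser.detector.Detector().get_gender` should be usable in place of `demo_guesser`. As far as I know, it returns labels such as `male`, `female`, `mostly_male`, `mostly_female`, `andy`, and `unknown`.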

Finally, we can remove common phrases from the titles like “ - CONAN on TBS” which is at the end of every one of his titles.
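The cleanup can be as simple as a string replacement. This is a small sketch: `clean_title` is a hypothetical helper and the example title is made up. Only the “ - CONAN on TBS” suffix comes from the actual titles.

```python
# The suffix appended to every title on the channel.
COMMON_PHRASES = (" - CONAN on TBS",)


def clean_title(title, phrases=COMMON_PHRASES):
    """Remove boilerplate phrases so they do not dominate the wordcloud."""
    cleaned = title
    for phrase in phrases:
        cleaned = cleaned.replace(phrase, "")
    return cleaned.strip()


# Made-up title, used only to exercise the function.
print(clean_title("Amy Example On Her New Movie - CONAN on TBS"))
```

Any other phrase that shows up in most titles can be added to the tuple in the same way.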

Conclusion

I think the wordclouds speak for themselves. While “Sex” and “Sexy” are among the top words for women, they barely come up for men. Perhaps the biggest irony is that “Man” is a top word for women. One of the more off-putting insights is that “Naked”, “Boobs”, and “Butt” are all common enough to appear in the wordcloud for women, but I cannot find a single body part word in the wordcloud for men.

This idea started as just a hunch, but after quantifying it, it’s evident that O’Brien’s team purposely sexualizes the video titles for interviews with women guests. Although this is not a new practice on YouTube, a mainstream late night host’s channel should not be intentionally sexualizing its content that features women guests.

Furthermore, this is not a common practice. Here are wordclouds I generated using the same method for Colbert and Fallon.