I just saw this NSFW (due to language) data visualization and analysis by Reuben Fischer-Baum of when singers of the United States’ national anthem first flub the lyrics, based on 26 YouTube videos. I slightly altered the graphic to make it safe for work, but I think you will get the idea.

Apparently, the lyrics that posed the greatest challenge for the singers in the videos are “were so gallantly streaming”. This is a fun idea for an analysis, and I enjoyed reading about it, but I think it could easily be made even more interesting and informative.

Here are my suggestions:

It is not totally clear how Fischer-Baum collected his sample. I would suggest taking some kind of random sample of the YouTube videos. In addition to making the results more generalizable, taking a random sample would allow us to answer other questions. For example, what is the probability of getting through the anthem unscathed given that you got past the “were so gallantly streaming” death trap? We don’t get to know that, since Fischer-Baum restricted his sample to flubbers.
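To make that conditional probability concrete, here is a minimal sketch of how it could be estimated from a random sample that includes clean performances. Everything in it is an assumption for illustration: the function name `prob_clean_given_past`, the word position of the checkpoint, and the sample values, which are made up and are not Fischer-Baum's data.

```python
from typing import Optional, Sequence

# Assumed position of the last word of "were so gallantly streaming",
# counting words from the start of the anthem. Adjust as needed.
CHECKPOINT = 39


def prob_clean_given_past(first_flub: Sequence[Optional[int]], checkpoint: int) -> float:
    """Estimate P(no flub at all | no flub through word `checkpoint`).

    `first_flub` holds, for each performance, the word position of the
    first flub, or None if the singer never flubbed.
    """
    # Keep only performances that made it past the checkpoint unscathed.
    survivors = [f for f in first_flub if f is None or f > checkpoint]
    if not survivors:
        raise ValueError("no performance made it past the checkpoint")
    clean = sum(1 for f in survivors if f is None)
    return clean / len(survivors)


# Made-up sample: None means a flub-free performance.
sample = [None, 12, 38, None, 37, 52, None, 36, 70, None]
print(prob_clean_given_past(sample, CHECKPOINT))
```

With a flubbers-only sample, every entry would be a number, so this estimate would always come out as zero. That is exactly why the sampling scheme matters.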

I imagine that there are variables associated with flubbing that could be included in the analysis. For example, it would be interesting to test whether trained singers get through more of the anthem before flubbing than non-singers (like Roseanne Barr).

I really think this should be a survival analysis of time-to-event (or possibly words-to-flub) data. That way you could formally test my singer vs. non-singer hypothesis. After all, it worked for analyzing data from RuPaul’s Drag Race.
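As a rough sketch of what the words-to-flub version might look like, the code below computes a Kaplan-Meier survival curve in plain Python. A singer who finishes without flubbing is treated as censored at the last word. The anthem length, the group labels, and all data values are assumptions made up for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

ANTHEM_WORDS = 80  # assumed length of the sung verse; replace with the real count


def kaplan_meier(durations: Sequence[int], events: Sequence[bool]) -> List[Tuple[int, float]]:
    """Return [(t, S(t))] at each distinct flub position.

    durations[i] is the number of words sung before the first flub (or the
    full anthem length if there was none); events[i] is True if performance
    i flubbed and False if it was censored (finished cleanly).
    """
    flubs = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # flubbed or censored at t
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = flubs.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]  # everyone who left at t is no longer at risk
    return curve


# Made-up data for two hypothetical groups.
trained = ([39, ANTHEM_WORDS, ANTHEM_WORDS, 62, ANTHEM_WORDS],
           [True, False, False, True, False])
untrained = ([12, 38, 37, ANTHEM_WORDS, 20],
             [True, True, True, False, True])

for label, (durations, events) in [("trained", trained), ("untrained", untrained)]:
    print(label, kaplan_meier(durations, events))
```

This only describes the two curves. Formally testing the singer vs. non-singer hypothesis would take something like a log-rank test or a Cox model on top of it, which is not implemented here.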