Stat Trek

It's a well-known fact that I'm a big Star Trek fan, having spent way too many hours in childhood (and college) watching every episode and movie. Between Trek and my enjoyment of statistics and visualization, I guess you could call me a Data Nerd... oh man, I kill me.

Thus, it's only natural that I should combine these passions and bring you STAT TREK!

The Curse

The "Odd Number Curse" for Star Trek movies.

It has oft been noted that odd-numbered Trek movies do poorly, and that the even-numbered ones are the best. Generally I'm inclined to agree, as the even-numbered films include The Wrath of Khan, The Voyage Home, and First Contact. The observant among you will notice that the hugely popular Star Trek (2009) is the 11th film, an odd number, and has ostensibly "broken" the curse. Instead, I suggest that Nemesis (2002) was so bad, it broke the even-number charm.

Movie Stats

Budget, Runtime, and Words per Minute over time.

Here are various quantities as a function of time. I thought the budget for the 2009 film was interesting: it's massive compared to the earlier Trek movies, but fairly typical of big-budget blockbusters from the past 10 years. Also note the very low Words/Min for the first movie (1979)... they spent so many minutes slowly panning around the ship that it added a half hour to the damn movie. J.J. Abrams has opted for a wordier film.

Readability

Reading Grade Level over time.

Simply recreating the "Curse" and some stats plots would not be enough for a blog post, so I had to dig deeper to fully realize my nerdy Stat Trek dreams. I found transcripts for all 11 movies on the magical internet, and knew I had the data necessary for some fun analysis. I put each transcript through a readability analyzer, which uses formulas to determine things like reading grade level and reading ease. These are based on numbers of syllables per word, words per sentence, etc. Above I have plotted two similar equations for computing the reading grade level of each movie.
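For the curious, here is a minimal sketch of how a grade-level formula like this works. I'm using the Flesch-Kincaid grade level as the example; that choice is an assumption for illustration, not necessarily one of the two equations plotted above. The syllable counter is a crude vowel-group heuristic and the helper names are made up for this sketch, so expect the numbers to differ a bit from a real readability analyzer.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of consecutive vowels (incl. 'y')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a trailing 'e' is usually silent ("space"), except in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text):
    """Return (sentences, words, syllables) for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    n_sentences, n_words, n_syllables = text_stats(text)
    if n_sentences == 0 or n_words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 0.39 * n_words / n_sentences + 11.8 * n_syllables / n_words - 15.59


if __name__ == "__main__":
    sample = "Space, the final frontier. These are the voyages of the starship Enterprise."
    print(text_stats(sample))
    print(round(flesch_kincaid_grade(sample), 2))
```

Feeding a whole transcript through `flesch_kincaid_grade` would give one number per movie. Speaker labels and stage directions would need stripping first, which this sketch doesn't attempt.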

Of course, movie dialogue isn't written to be read the way prose is, so these scores should be taken with a grain of salt. I did find that there has been some scholarly work on the readability of subtitles, however. My inner 13-year-old is very glad that all the movies seem to be within my understanding range. Harry Potter scores somewhere around a Grade 5, for comparison.

Rating versus number of syllables per word for 11 Star Trek movies.

Reading grade level and "Reading Ease" both depend on the average sentence length (words per sentence) and the "word duration" (a term I just made up for syllables per word). Here I've compared the Rotten Tomatoes ratings against the average number of syllables per word. There does appear to be a sweet spot around 1.53 syllables per word.
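To show how those two averages turn into a score, here is a small sketch using the standard Flesch Reading Ease formula. Whether the analyzer I used applies exactly this formula is an assumption, and the 6 words per sentence in the example is a made-up value for illustration, not a number measured from the transcripts.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # Hypothetical sentence length; only the syllables-per-word value varies.
    for syllables_per_word in (1.43, 1.53, 1.63):
        score = flesch_reading_ease(6, syllables_per_word)
        print(syllables_per_word, round(score, 1))
```

Note how heavily the syllable term is weighted: shifting the average by just 0.1 syllables per word moves the score by about 8.5 points, so small differences between movies on that axis are not as trivial as they look.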

This has just been an excuse to make some simple charts with dorky data, but I think the readability score might be an interesting factor to study for a larger number of transcripts. With that in mind I'm already working on the next installment of the Stat Trek posts! Stay tuned...