Who's got the largest vocabulary in hip hop? A data scientist crunched the numbers to find out. [UPDATED]

Data nerds and rap fans alike will enjoy this fantastic post from designer-cum-data-scientist Matt Daniels [UPDATE: Daniels has stopped by in the comments (username: mfdaniels) to field your questions], in which he examines the lyrics of 85 hip hop artists and ranks them, in an interactive visualization, according to the size of their vocabulary. He writes:

Literary elites love to rep Shakespeare's vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words and arguably had the largest vocabulary, ever.

I decided to compare this data point against the most famous artists in hip hop. I used each artist's first 35,000 lyrics. That way, prolific artists, such as Jay-Z, could be compared to newer artists, such as Drake.

35,000 words covers 3-5 studio albums and EPs. I included mixtapes if the artist was just short of the 35,000 words. Quite a few rappers don't have enough official material to be included (e.g., Biggie, Kendrick Lamar). As a benchmark, I included data points for Shakespeare and Herman Melville, using the same approach (35,000 words across several plays for Shakespeare, first 35,000 of Moby Dick).
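For readers who want to try the idea themselves, here is a minimal sketch of the counting approach Daniels describes: take the first 35,000 words of a body of text and count how many distinct words appear. The tokenization rule (lowercasing, treating runs of letters and apostrophes as words) and the function names are our assumptions for illustration; Daniels' post does not spell out these details, and his actual pipeline may differ.

```python
import re

WORD_LIMIT = 35_000  # sample size stated in Daniels' post


def tokenize(text):
    # Assumption: lowercase everything and treat any run of letters or
    # apostrophes as one word. This is not necessarily Daniels' rule.
    return re.findall(r"[a-z']+", text.lower())


def unique_words(text, limit=WORD_LIMIT):
    # Keep only the first `limit` words, then count the distinct ones.
    tokens = tokenize(text)[:limit]
    return len(set(tokens))


sample = "The quick brown fox jumps over the lazy dog the end"
print(unique_words(sample))           # 9 distinct words out of 11
print(unique_words(sample, limit=4))  # 4: only the first four words count
```

Capping every text at the same number of words is what makes the comparison fair: without the cap, an artist with a longer catalog would have more chances to use new words.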

There's a lot of information to be gleaned from Daniels' analysis. The number of artists who rank higher than Shakespeare (15) and Melville (3) is fascinating in its own right, but even more impressive is the ascendancy of Wu-Tang Clan, not just collectively, but as individual artists. In fact, the only lyricist to rank higher than Wu-Tang's GZA was Aesop Rock, an artist Daniels had almost excluded from his analysis for fear that he was too obscure.

UPDATE: At Daniels' request, we've removed the ranked list of artists that we transposed from his visualization. He tells us he excluded it from his original analysis intentionally, because he

...wanted the reader to become entranced in the chart. A table of numbers dehumanizes the experience and makes it about comparing -20 DMX vs. +10 wu tang. Plus, it's really the relative positioning of the rappers that matter, not the raw number (due to the endless data analysis biases).

Daniels' point is well taken, and so we've removed the ranked list. But his objection also hints at some issues that we thought his original post left unaddressed. The most obvious one is that vocabulary is clearly an inadequate metric for success, profundity or impact when it comes to an artist's work. Artists themselves are, of course, keenly aware of this. Jay-Z seems an appropriate person to quote on the subject; on The Black Album track "Moment of Clarity," he contrasts his lyricism with that of Common and Talib Kweli (both of whom "rank" higher than him when it comes to the diversity of their vocabulary):

I dumbed down for my audience to double my dollars
They criticized me for it, yet they all yell "holla"
If skills sold, truth be told, I'd probably be
Lyrically Talib Kweli
Truthfully I wanna rhyme like Common Sense
But I did 5 mil - I ain't been rhyming like Common since

Check out the full analysis, complete with several visualizations, here.