Features

May 7, 2014

There’s a chart by Matt Daniels doing the rounds about relative vocabulary size in hip-hop versus Shakespeare. You’ve probably seen it on Facebook, or Mashable, or Buzzify/Uplisticle or wherever. It looks pretty neat, but it’s also irritating to look at. Something about the way it quantifies rappers – and compares them to Shakespeare and Moby Dick – seems wrong. Part of the chart’s purpose is to make a tendentious point about the lyrical poverty of Shakespeare and the converse richness of Aesop Rock. But a quick, non-exhaustive rundown of the mistaken assumptions behind it would show that it is in fact unfair both to hip-hop and to Shakespeare, and probably to Melville too. It would include points like these:

1. This chart doesn’t measure vocabulary. (And vocabulary doesn’t measure intelligence, by the way.) It doesn’t show that Drake has a smaller vocabulary than the Insane Clown Posse. What a chart like this actually measures is redundancy, a correlated but separate concept. Pure redundancy is repeating the same word over and over; zero redundancy is never repeating a word at all. Rappers with low redundancy rank higher on this chart. And a low redundancy score mostly measures how hard you work at using synonyms.

2. A wide vocabulary isn’t a necessary condition of great artistry. For the flipside of this argument, see the Lewis Carroll poem ‘Jabberwocky’.

3. A lot of truly great hip-hop relies on double and triple entendres packed into a single word or phrase. That necessarily reduces vocabulary size but increases complexity (whatever complexity is). See Kanye’s low score, for instance. The chart also overlooks all the literary effects that performance can get out of staging or voicing (see Nicki Minaj, or D12). Nicki’s “Barbie” persona isn’t going to show up in this analysis, but it gives her a whole other range of expressive options.

4. Hip-hop isn’t just about vocabulary, or even about vocabulary plus puns and personae. Speed, control and variation of flow are huge factors. You can tell a lot about a rapper’s intelligence, for example, from how they manipulate their flow. I think this is something Kendrick Lamar really excels at. Tech N9ne might be one of the fastest rappers, but there are some moments on ‘Fragile’ when he slows it right down and it’s amazingly effective. Again – that doesn’t show up here.

5. A few words about genre. Poetry comes in all kinds of forms, such as lyric, narrative or georgic, and we are lumping all of these together and comparing them to ‘drama’? Drama of every genre, too, from tragedy to history to romance. As a sidebar, note how unrepresentative, or rather how arbitrary, the choice of Shakespeare plays is. Please. Eminem is pretty much a narrative poet. (See ‘Stan’ or any number of the tracks in which Em kills Christopher Reeve or Paris Hilton.) So was The Streets.

But that Raekwon track on Cuban Linx II, ‘Baggin Crack’, would be georgic, the genre Virgil used to describe the work farmers did on the land to make their living throughout the year. The same goes for Kanye’s ‘Crack Music’ on Late Registration. Also on Cuban Linx II, ‘Ason Jones’ is a beautiful elegy for ODB. See also tributes to Eazy-E, or 2Pac, and so on.

And all this leaves aside love songs, or amatory lyrics. A major – and early – achievement of both conventional literary criticism and corpus-based digital humanities work was to show that different genres have different, though overlapping, lexical groups. Each of these genres draws on a loosely bounded standard palette of words to achieve its effects. People who stick to one genre, then, are going to have a higher level of redundancy than people who skip around between genres. More than anything else, what this chart shows might be genre versatility.

6. One final research issue that could make this a seriously flawed study is skip lists, or cancel lists (in language processing they are usually called stop lists). We don’t know what cancel lists Daniels used. Language processing programs can’t ‘read’ the way we do – they don’t know which words are padding around the real event and which words are the thing we’re interested in. So when you do this kind of math it’s normal to give the program a list of words to skip, usually including ‘and’, ‘the’, ‘but’, ‘for’, pronouns, and so on. If you’re using a Shakespeare play, you want to put the character names on there, plus ‘enter’, ‘exit’ and so on. Otherwise all that noise throws off your vocabulary count. If you’re using Rap Genius, it would make sense to add ‘chorus’, ‘verse’, ‘hook’, ‘bridge’ and so on. If Daniels didn’t use a skip list, then rappers with longer verses and fewer transitions between the building blocks of their tracks will, unfairly, come out looking better.

7. It’s nice to be able to see rappers broken down by region, but that doesn’t necessarily show anything except, perhaps, a weak correlation between Southern rap and a style of rapping with more redundancy and more wordplay. I don’t think it means much. It would be much more interesting to look at redundancy scores changing over time. That way we could get a real indication of influence: did an album with a low redundancy score come out and then get followed by a lot of artists imitating it? That might give us an idea of what was really influential (versus what we think is influential). Wouldn’t it be great to see whether Rakim really was one of the most influential rappers of all time, measured by the transformative effect of his lyrics’ redundancy scores? (This points to another problem with the chart: it is historically flat. It averages everyone out to a single score, rather than showing the variation in their work over many years.)
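To make points 1, 6 and 7 a bit more concrete, here is a minimal Python sketch of a redundancy score, assuming one simple definition: the share of words in a sample that repeat a word already used (1 minus distinct words over total words). Everything in it is illustrative and hypothetical: the function names, the regular-expression tokeniser, the optional fixed sample size and the contents of `SKIP_WORDS` are assumptions, not the chart’s actual method and not the skip list Daniels used (which, as noted above, we don’t know).

```python
import re

# Hypothetical skip list (point 6): a few function words and pronouns,
# lyric-site section labels and stage directions. For a Shakespeare play
# you would also add the character names. NOT the list Daniels used.
SKIP_WORDS = frozenset({
    "and", "the", "but", "for", "a", "of", "to",
    "i", "you", "he", "she", "it", "we", "they", "me", "my",
    "chorus", "verse", "hook", "bridge",
    "enter", "exit", "exeunt",
})


def tokenize(text):
    """Lower-case the text and return its words (letters, plus inner apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def redundancy(text, skip=frozenset(), sample_size=None):
    """Share of words that repeat an earlier word in the sample.

    0.0 means no word is ever repeated; the score approaches 1.0 as the
    same word is repeated over and over (point 1). Words in `skip` are
    dropped before counting. `sample_size` truncates every text to the
    same number of words, because longer texts repeat more just by
    being longer.
    """
    words = [w for w in tokenize(text) if w not in skip]
    if sample_size is not None:
        words = words[:sample_size]
    if not words:
        return 0.0
    return 1 - len(set(words)) / len(words)


def redundancy_by_year(albums, skip=frozenset(), sample_size=None):
    """Score each (year, lyrics) pair separately and sort by year (point 7),
    instead of averaging a whole career into a single number."""
    return sorted(
        (year, redundancy(lyrics, skip, sample_size)) for year, lyrics in albums
    )


if __name__ == "__main__":
    # Invented toy "lyric" with Rap Genius-style section labels left in.
    lyric = ("Verse 1 the rain falls Chorus we rise "
             "Verse 2 the sun burns Chorus we rise")
    print(round(redundancy(lyric), 3))              # 0.357 (labels counted)
    print(round(redundancy(lyric, SKIP_WORDS), 3))  # 0.167 (labels skipped)
```

On the toy lyric, the repeated ‘verse’ and ‘chorus’ labels more than double the score, which is exactly the distortion point 6 worries about. Note also that with a fixed `sample_size` (and every text at least that long), ranking by this score is just the reverse of ranking by the number of distinct words, which is why a ‘vocabulary’ ranking like this one is really a redundancy ranking.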

It’s not that I mind if people want to use the powerful tools of corpus analysis to criticise DMX or Shakespeare. This is an exciting visualisation that has got a lot of people talking. But wouldn’t it be more interesting to see how DMX, or Shakespeare, changed over time, and who they influenced, and how?