Less is more

Derek pointed me to the style.org site which also parses political speeches. Their preferred graphic is not the tag cloud but a labeled bar chart.

From top to bottom, each bar represents a sentence; the length of each bar is the length of the corresponding sentence. Further, the user can specify word pairs for comparison. Here the red bars are sentences containing the word "freedom"; the blue bars, "security".
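To make the encoding concrete, here is a rough sketch of the data behind such a chart: one bar per sentence, sized by sentence length and tagged with whichever search words it contains. This is not style.org's code. The function name `sentence_bars`, the naive sentence splitter, measuring length in words, and the sample text are all my own assumptions for illustration.

```python
import re

def sentence_bars(text, words):
    """Return one (length_in_words, matched_words) tuple per sentence."""
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    bars = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Keep the search words that appear as whole tokens in this sentence.
        hits = [w for w in words if w in tokens]
        bars.append((len(tokens), hits))
    return bars

# Made-up sample text, not taken from any real speech.
speech = "We value freedom. Security matters too, and so does freedom! Thank you."
for length, hits in sentence_bars(speech, ["freedom", "security"]):
    # One text "bar" per sentence, followed by the words that would color it.
    print("#" * length, ",".join(hits) or "-")
```

Note that the bar length depends only on how long the sentence is, not on how often or when the word is mentioned.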

It's a good illustration of the "small multiples" principle in constructing comparative graphics.

However, the choice of dimensions is perplexing. I'd be much more interested in the timing of mentions of those words, rather than which sentence they appeared in. I also find the length of each sentence to be irrelevant.

Here's one concept that brings out the timing of the mentions better. It uses less space and deliberately gives up some of the data (the sentence structure).
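For anyone who wants to try the idea on their own text, a minimal sketch follows. It marks where each mention falls along a single line and ignores sentence boundaries entirely. I am assuming that word position is a reasonable stand-in for timing; the function names `mention_indices` and `timeline`, the strip width, and the sample text are hypothetical.

```python
import re

def mention_indices(text, word):
    """Return (token indices where `word` occurs, total token count)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [i for i, t in enumerate(tokens) if t == word], len(tokens)

def timeline(text, word, width=40):
    """Draw a one-line strip with a '|' at the position of each mention."""
    hits, total = mention_indices(text, word)
    line = ["-"] * width
    for i in hits:
        # Scale the token index to the strip width; i < total keeps this in range.
        line[i * width // total] = "|"
    return "".join(line)

# Made-up sample text, not taken from any real speech.
speech = "freedom first then security and freedom again at last"
for word in ["freedom", "security"]:
    print(f"{word:>8} {timeline(speech, word)}")
```

Stacking one strip per word (or per speaker) gives a compact comparison in which a long sentence carries no extra visual weight.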

Comments

I'm going to have to disagree with you on this one. While I think your "point-line" graphics might be fine for "sparkline"-type applications, revealing the overall text structure has its uses.

A variant on this style.org model is to highlight individual words and n-grams (phrases of length "n"), again embedding them within a larger text structure.

We've found (there are several projects on such approaches here at UM) that this can reveal very interesting patterns within and across texts, much of which I fear would be lost by minimizing the text structure itself.

Ken: My comment only applies to this particular case: Clinton's speech got an extra-long red bar because he mentioned the word "freedom" in a sentence that happened to be one of the longest. There are many examples where text structure would be very useful. If you have examples, do point us to them!