Someone recently asked on Twitter about people's preferences for word cloud generators in R.
I replied that I thought the "null" word cloud generator was best. By this I mean that I think the word cloud is a bad visualization method.
Why? Here is one article with a good perspective, but you can also search for examples and see what insights you can get from word clouds; I think they usually obscure the insights. If you are trying to understand raw text, then you really want to do proper text mining rather than just counting word frequencies. And if you just want to look at term frequencies, the word cloud is a very fuzzy way to go about it.
So the natural follow-up question is how to plot phrase/word frequency data.
Here is an example of the kind of thing that I usually do. This is only for raw term frequency data (you will need to tabulate it yourself first, which is easy). For real text mining analysis you can always use packages from the CRAN Task View.
library(languageR)
# get english word freq data
data(english)
df <- english[,c("Word","WrittenFrequency")]
#reorder by freq for plotting
df <- df[order(-df$WrittenFrequency),]
df$Word <- reorder(df$Word,1:NROW(df))
#get the top 75 words
df <- head(df,75)
library(ggplot2)
# frequency label is on the y axis
# x axis is the frequency scale (log data in this example)
# word name is shown in the facet label
p <- ggplot(df,aes(x=WrittenFrequency,y=WrittenFrequency))
p <- p + geom_point(size=5)
# horizontal facet labels, no y-axis title
p + facet_grid(Word ~ ., scales = "free") + theme(strip.text.y = element_text(angle = 0), axis.title.y = element_blank())
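The example above starts from frequencies that are already tabulated. If you are starting from raw text, here is a minimal sketch of the tabulation step in base R. The sample `text` vector and the `freq` data frame are made up for illustration, and the tokenizer is deliberately crude (lower-case, split on anything that is not a letter or apostrophe), so treat it as a starting point rather than real text mining.

```r
# Hypothetical input: any character vector of raw text would do
text <- c("the cat sat on the mat", "The dog sat on the log")

# lower-case, split on runs of non-letter characters, drop empty tokens
words <- unlist(strsplit(tolower(text), "[^a-z']+"))
words <- words[nzchar(words)]

# count each word and sort by descending frequency
counts <- sort(table(words), decreasing = TRUE)
freq <- data.frame(Word = names(counts),
                   Frequency = as.vector(counts),
                   stringsAsFactors = FALSE)
head(freq)
```

A data frame like `freq` has the same shape as `df`, so after the same reorder step it could go through the same plotting code with `Frequency` in place of `WrittenFrequency`.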
There are lots of things you can do to make it fancier and prettier. Does anyone have something better?