Visualizing books using Zemanta and Wordle

Most of my readers are probably already aware of Wordle, a Java applet that produces neat visualizations of tags. Given that Zemanta recently released an early alpha preview of its API, I was looking for a fun project to showcase some of it.

So for this experiment I’m going to try to visualize some popular classic books, using text files from Project Gutenberg. Technical details are at the bottom of the post.

Jane Austen – Pride and Prejudice

[Images: Pride and Prejudice through words and tags; Pride and Prejudice through words; Pride and Prejudice as tags]

Herman Melville – Moby-Dick

[Images: Moby Dick through words; Moby Dick through tags]

George Orwell – 1984

[Images: 1984 through words; 1984 through tags]

Technical details

The whole process is done using a simple Python script. The script reads in the text file, breaks it into chunks of 360 words, which is roughly one A5 page, and sends each chunk to the Zemanta API. It repeats this process for the first 30 thousand words of the book. The limit is arbitrary; I just didn’t want to run the script for too long.
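The chunking part can be sketched roughly like this. It is a minimal illustration, not the original script: the function names are made up for this example, and `fetch_tags` is a hypothetical stand-in for the actual Zemanta API call, which is not shown here. A dummy tagger is plugged in so the sketch runs offline.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List

CHUNK_WORDS = 360    # roughly one A5 page, as described in the post
MAX_WORDS = 30_000   # arbitrary cap on how much of the book is processed


def chunk_words(text: str, chunk_size: int = CHUNK_WORDS,
                max_words: int = MAX_WORDS) -> Iterator[str]:
    """Yield consecutive chunks of at most chunk_size words, stopping after max_words."""
    words = text.split()[:max_words]
    for start in range(0, len(words), chunk_size):
        yield " ".join(words[start:start + chunk_size])


def count_tags(chunks: Iterable[str],
               fetch_tags: Callable[[str], List[str]]) -> Counter:
    """Pass every chunk to fetch_tags and tally the returned tags.

    fetch_tags is a hypothetical placeholder for the real API request.
    """
    counts = Counter()
    for chunk in chunks:
        counts.update(fetch_tags(chunk))
    return counts


if __name__ == "__main__":
    # Dummy tagger (an assumption for this demo): treat capitalized words as "tags".
    def fake_fetch_tags(chunk: str) -> List[str]:
        return [w.strip(".,;:!?\"'") for w in chunk.split() if w[:1].isupper()]

    sample = "Elizabeth walked to Netherfield with Jane. " * 200
    tags = count_tags(chunk_words(sample), fake_fetch_tags)
    print(tags.most_common(3))
```

Passing the tagger in as a function keeps the chunking logic testable without network access; a real run would swap `fake_fetch_tags` for whatever function wraps the API request.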

Afterward, I manually pasted the resulting text into Wordle and played with the randomize function and the settings until I liked the image. You can also take a look at my full Wordle gallery.

Lessons learned

I’m especially happy with how 1984 turned out. For this kind of visualization it’s important to choose the source carefully, because a well-chosen text gives much more powerful results. I’ll probably continue experimenting with the 1984 text.