This book seems almost written for me. It starts off with my favourite platform: Twitter. Next it covers microformats, which I was just getting interested in. And finally, working with similarity and clustering gave me a slew of new ideas.

What is interesting to note is that this book references a lot of outside material. This means that it points to more good stuff than fits between the covers, but also that some of the things you want to know cannot be found between the covers.

All code used in the book (and more) can be found on GitHub, which saves a lot of typing when you want to play around. The downside is that some longer examples and utilities are only on GitHub, which is less than ideal when you are reading away from the internet or the computer.

The code used in the book is written in Python, presumably because of its readable syntax and library support, especially NLTK.

The book also uses Redis and CouchDB extensively, which in my opinion is hard to justify at this small scale. Later in the book, pickle is used most of the time.
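To show why plain pickling feels sufficient to me at this scale, here is a minimal sketch of my own (not code from the book). The sample data, the `save` and `load` helpers, and the file name are all hypothetical; the only real API used is Python's standard `pickle` module.

```python
import os
import pickle
import tempfile

# Hypothetical sample data standing in for results collected in an earlier step.
tweets = [
    {"user": "alice", "text": "Reading about microformats"},
    {"user": "bob", "text": "Clustering with NLTK"},
]


def save(obj, path):
    """Serialize any picklable Python object to a file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load(path):
    """Read back an object previously written with save()."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Write to a temporary directory so the sketch leaves no files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "tweets.pickle")
save(tweets, path)
restored = load(path)
```

For a few thousand records that fit in memory, this needs no server and no extra dependencies. A real database starts to pay off once you need querying, concurrent access, or data that no longer fits in memory. Note that pickle files should only be loaded from sources you trust.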

This book covers a lot of ground in a broad subject, and gives you the tools to explore topics in depth yourself. Definitely recommended reading.