Pete Cliff tries to remember A-level mathematics as he dives into the fascinating world of machine learning and statistics and how to apply these techniques to Web-accessible datasets.

When I was handed this book for review, a colleague of mine said "rather you than me", and there is no doubt that Programming Collective Intelligence is not a book for everyone. However, if phrases like 'Bayesian filtering', 'Support-vector machines', 'Collaborative filtering' and 'Methods of clustering' do not deter you or, better, engage your interest, then this work is well worth a look.

One of the pleasing things to notice when picking up the book is its relatively small size: Toby Segaran has managed to condense a series of complex techniques into just twelve chapters of concise and interesting writing. The writing is packed full of information, without wasting any words, and sometimes this can be hard (but good) going. The author suggests at the start that 'some knowledge of trigonometry and basic statistics' will be helpful in understanding the algorithms. However, he does a good job of explaining the techniques, and even if you do not fully understand what is going on (keep at it and you will!), you are still left with some useful, working code to add to your applications.

As regards not fully understanding: the writing style, out of necessity, can leave the reader asking 'why?' or 'how?' when a complex mathematical idea is presented as a solution without its rationale. However, there is enough in the text to suggest further avenues of investigation, and working through the exercises at the end of each chapter can help. Moreover, time spent picking apart the Python code repays the effort.

But here we broach an issue that calls for a word of warning. In many cases the explanation of the algorithm is tied closely to the Python code (available from the author's blog [1]), so lack of familiarity with Python might present a problem. That said, if the reader is familiar with other programming languages and has a good grasp of programming principles, then the code samples are understandable with a little work. There are even efforts on the Web to port the code samples to other languages [2], and I would recommend this approach to anyone who really wants to understand the algorithms presented. I have long been dismissive of Python, but through this book even my hard heart began to soften at the facility and usefulness of the language!

The chapters build steadily in complexity: Chapter 1 introduces 'Collective Intelligence' and then we are off on a functional tour of algorithms for achieving specific goals:

Making Recommendations (Chapter 2)

Discovering Groups (Chapter 3)

Searching and Ranking (Chapter 4)

Optimization (Chapter 5)

Document Filtering (Chapter 6)

Modelling with Decision Trees (Chapter 7)

Building Price Models (Chapter 8)

Kernel Methods and SVMs (Support Vector Machines) (Chapter 9)

and it should be clear from this list that this book has significant relevance to our sector.

One of the nicest things about the way the chapters progress is that they start with the theory, perhaps working on a sample dataset, and then present the reader with a real-world example. For example, 'Making Recommendations' (Chapter 2) demonstrates use of the del.icio.us API while Chapter 8 shows how to draw data from eBay.
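To give a flavour of the kind of code involved, here is a minimal sketch of a similarity score of the sort used in collaborative filtering. It is not taken from the book: the function names, the sample ratings and the choice of a Euclidean-distance-based score are all illustrative assumptions.

```python
from math import sqrt

# Hypothetical sample data (not from the book): person -> {item: rating}.
ratings = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 4.0, "book_b": 3.0, "book_d": 2.0},
    "carol": {"book_c": 1.0, "book_d": 5.0},
}


def similarity(prefs, person1, person2):
    """Score two people by how closely their ratings of shared items agree.

    Returns 1.0 for identical ratings on the shared items, values closer
    to 0 as the ratings diverge, and 0.0 when no items are shared.
    """
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0.0
    # Euclidean distance over the shared items only.
    distance = sqrt(
        sum((prefs[person1][item] - prefs[person2][item]) ** 2 for item in shared)
    )
    # Map distance (0 and upwards) onto a score where higher means more alike.
    return 1.0 / (1.0 + distance)


def most_similar(prefs, person):
    """Rank everyone else by similarity to `person`, most similar first."""
    scores = [
        (similarity(prefs, person, other), other)
        for other in prefs
        if other != person
    ]
    return sorted(scores, reverse=True)
```

Calling most_similar(ratings, "alice") ranks the other people by how closely their tastes match Alice's. A recommender could then weight each neighbour's ratings by that score to suggest items Alice has not yet rated.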

The two chapters before the last change tack, with Chapter 10 examining ways of discovering features in a dataset (which then form the basis of classification) and Chapter 11 taking you into the fascinating (and astonishing) world of genetic programming. The final chapter is very useful, as it summarises the algorithms discussed in the rest of the book and is designed as the starting point for any new problem the reader might encounter to which these 'Collective Intelligence' techniques can be applied.

The book closes with two useful appendices. Appendix A summarises the Python libraries used throughout the book, with simple installation instructions that are certainly enough to enable the reader to make use of the code samples. Appendix B is a useful summary of the mathematical formulas used in the book, but may best be avoided if strange mathematical symbols scare you.

Programming Collective Intelligence not only delivers, but manages to deal with a dense subject in an interesting way, providing a successful mix of theory and practical application thanks to the consistent use of real-world examples. In short, if its subject lies within your field of interest, this work is an excellent addition to your professional bookshelf and may even convince you (as it did me) to take another look at Python!