If you’re an experienced programmer interested in crunching data, this book will get you started with machine learning—a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation.

Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you’ll learn how to analyze sample datasets and write simple machine learning algorithms. Machine Learning for Hackers is ideal for programmers from any background, including business, government, and academic research.

Develop a naïve Bayesian classifier to determine if an email is spam, based only on its text

Use linear regression to predict the number of page views for the top 1,000 websites

Drew Conway

Drew Conway is a PhD candidate in Politics at NYU. He studies international relations, conflict, and terrorism using the tools of mathematics, statistics, and computer science in an attempt to gain a deeper understanding of these phenomena. His academic curiosity is informed by his years as an analyst in the U.S. intelligence and defense communities.

John Myles White

John Myles White is a PhD candidate in Psychology at Princeton. He studies pattern recognition, decision-making, and economic behavior using behavioral methods and fMRI. He is particularly interested in anomalies of value assessment.

The animal on the cover of Machine Learning for Hackers is a griffon vulture (family Accipitridae). These considerably large birds hail from the warmer areas of the Old World, namely around the Mediterranean.

These birds hatch naked with a white head, broad wings, and short tail feathers. Adult griffon vultures, ranging in size from 37–43 inches long with an average wingspan of 7.5–9.2 feet, are generally a yellowish-brown with variations of black quill feathers and white down surrounding the neck. The griffon vulture is a scavenger, feeding only on prey that is already deceased.

The oldest recorded griffon vulture lived to be 41.4 years in captivity. They breed in the mountains of southern Europe, northern Africa, and Asia, laying one egg at a time.

The cover image is from Wood’s Animate Creation. The cover font is Adobe ITC Garamond. The text font is Linotype Birka; the heading font is Adobe Myriad Condensed; and the code font is LucasFont’s TheSansMonoCondensed.

I'm reading this as an experienced programmer and hobbyist AI/ML/Statistics guru, and there's just too much that's missing for me to recommend this book. It reads less like "Machine Learning for Hackers" and more like "Statistics for People Who Want to Use R Without Understanding the Fundamentals." I found myself excited at the beginning of each chapter and disappointed by how little actual detail or information was provided beyond "type these commands to get numbers and hope for a good number."

Chapter 12 ("Model Comparison") is a great example of this. While talking about SVMs, this is a snippet of what's provided:

"As you can see from looking at Figure 12-6, the rather complicated decision boundary chosen by the sigmoid kernel wraps around as we change the value of gamma. To really get a better intuition for what's happening, we recommend that you experiment with many more values of gamma than the four we've just shown you."

... Really? To get a better feel for what's happening, I should just try more values for gamma? There is no mention of what, fundamentally, gamma is. The reader is supposed to just try different values and not worry about any details. This is one example, but it is a good example of how disappointed I was near the end of many chapters.

I understand this book is targeted at beginners, but the number of times the authors gloss over (or cleverly avoid) actually explaining an incredibly fundamental piece of a chapter leaves the reader wondering if they genuinely understand the material themselves.

I'm giving it two stars because it's easy to read, there are decent suggestions as to which R packages are useful for statistical analysis, and because I enjoyed the overfitting examples and graphical depictions early on in the book.

I've long been fascinated by artificial intelligence and wanted to get started without knowing where to begin. That is why I picked up this book, thinking it would be a good starting point.

Truth be told, this was a good book and gave some insight, but it was not what I was looking for. For beginners in AI, this is not the starting point.

What the book did give me was a brush-up on statistics and prediction and an introduction to R. Over the course of the book, the authors build up knowledge of how to use predictions, estimates, clustering, and similar techniques to make a machine learn what to do next based on previous events. The theory is then backed up by practical examples in R. The chapter I liked most was about building a simple recommendation engine for who to follow on Twitter based on your current profile. That example works through some graph theory combined with clustering models, with graphics summarizing the chapter's main points.

Unfortunately, in the end I still felt left in the dark, not knowing where to go from here. R seems like a really strong language for performing many types of statistical analysis, but I have yet to see how I should use it in a mainstream application. This is probably due to a lack of knowledge on my side, but it underlines my point about this not being a "beginners" book on machine learning.

To summarize, the authors present basic statistical models that can be used to aid machine learning, combined with practical examples. But you need a higher baseline of previous knowledge about machine learning, and ideas about how to utilize it, in order to fully enjoy this book.

Machine Learning for Hackers gets you started using R for machine learning. The book does a good job of telling you how to install R and where to find help. There are lots of examples of how to explore data using ggplot2. Other packages covered include plyr, which the authors liken to MapReduce; tm, the text mining package; glmnet and its lambda parameter; and the class package, which provides the k-nearest neighbors algorithm. Polynomial regression is also covered.

A book that relies heavily on the results of code to illustrate concepts takes on a BIG risk of that code being or becoming broken. The UFO example should refer to Unidentified Faulty Objects. I used the online code to work through a few more steps but still ended up with errors, errors that should not occur when cutting and pasting.

It goes through the very basics of statistics to build the knowledge needed for the machine learning algorithms. On the other hand, it doesn't explain the blocks of code it shows in depth, leaving the reader to work out the particularities of the R programming language. R makes it easy to process data in a few lines of code; on the downside, it's a very different language, so it takes some effort to understand. For beginners in R, it's very valuable to look up the functions used in order to understand the algorithms. This book is truly made for hackers, as it requires only a low level of statistics knowledge and a high level of curiosity to play with code. It also uses real-world data in its examples, making it even more attractive and fun.

I had high hopes for this book after the first few chapters. The emphasis in the early chapters on cleaning data rings true to anyone who has ever had to deal with a body of real-world data.

But after that it fell into a repetitive pattern: state a problem, give a nontechnical description of a machine learning algorithm, and explain how to call the appropriate ML library in R. With no math and little description of most algorithms, if you want to do something besides use R's built-in libraries, this book isn't so helpful.

The writing style is lively and enjoyable, and the authors picked interesting real-world examples. They probably could write a really good book on machine learning, but this one isn't it.

The general structure of each section is to first introduce a new concept, then demonstrate it by applying the concept to a trivial data set. Next, the technique is applied to a real data set. This structure is a great way to understand a technique.

The complete process of first massaging the data and then determining the technique to apply is covered. Occasionally the author makes a wrong turn and the analysis fails. The demonstration of failure, why it occurs and what to do about it is a great feature of the book.

The book is almost completely lacking in any of the mathematics or workings of the underlying algorithms being used, which may be considered a good or bad thing. Sometimes the book felt more like a tutorial on using R's various machine learning packages, rather than learning about machine learning itself.

If you aren't familiar with R or machine learning, this book presents a significant learning curve. Unfortunately, R's syntax can be quite opaque, even to experienced programmers. Indeed, due to the heavy R component in this book, a better title may have been "Machine Learning with R".

I'm not sure you can "hack" machine learning without properly understanding the underlying concepts, but with this book you can undoubtedly try.

The book presents a relatively quick, somewhat cursory overview of Machine Learning. It provides a good starting point for further study.

If you have enough time on the weekend and truly want to learn some interesting and futuristic concepts in computing, do read this book and then work through the examples. If you are a serious developer and coding is your passion, this book will take you up a level and spark innovative ideas for your products. For academics, this should be one of the texts in your course. A very good book from O'Reilly by authors with actual field experience.