This chapter introduces decision trees and rule systems with the algorithms C5.0, 1R and RIPPER.

Topics covered include:

Understanding decision trees

Example – identifying risky bank loans using C5.0 decision trees

Understanding classification rules

Example – identifying poisonous mushrooms with rule learners

I like that C5.0 is covered, as it was proprietary for a long time and has only recently been released as open source and made available in R. I am surprised that CART, the hello world of decision tree algorithms, was not covered.

Chapter 6: Forecasting Numeric Data – Regression Methods

This chapter is all about regression, with demonstrations of linear regression, CART and M5P.

Topics covered include:

Understanding Regression

Example – predicting medical expenses using linear regression

Understanding regression trees and model trees

Example – estimating the quality of wines with regression trees and model trees

It is good to see the classics, linear regression and CART, covered here. M5P is also a nice touch.

It’s not a topic I like much, nor an algorithm I have ever had to use on a project. I’d drop this chapter.

Chapter 9: Finding Groups of Data – Clustering with k-means

This chapter introduces the k-means clustering algorithm and demonstrates it on data.

Topics covered include:

Understanding clustering

Example – finding teen market segments using k-means clustering

Another esoteric topic that I would probably drop. Clustering is interesting, but unsupervised learning algorithms are often really hard to use well in practice. Here are some clusters; now what?

Chapter 10: Evaluating Model Performance

This chapter presents methods for evaluating model skill.

Topics covered include:

Measuring performance for classification

Evaluating future performance

I like that performance measures and resampling methods are covered. Many texts skip them. I also like that a lot of time is spent on the finer points of classification accuracy (e.g. touching on Kappa and F1 scores).
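To make the point about Kappa and F1 concrete, here is a minimal sketch in Python (the book itself works in R) that computes accuracy, Cohen's kappa and F1 from a binary confusion matrix. The counts are hypothetical. They were chosen to show why accuracy alone can mislead when one class dominates.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa, F1) for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Chance agreement: product of the predicted and actual marginal
    # proportions, summed over the positive and negative classes.
    p_pred_pos = (tp + fp) / n
    p_true_pos = (tp + fn) / n
    p_expected = p_pred_pos * p_true_pos + (1 - p_pred_pos) * (1 - p_true_pos)
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected < 1 else 0.0
    # F1 is the harmonic mean of precision and recall (0 if undefined).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, kappa, f1

# Hypothetical counts: 100 cases, 20 of them actually positive.
model = classification_metrics(tp=10, fp=5, fn=10, tn=75)
# A trivial baseline that always predicts "negative".
baseline = classification_metrics(tp=0, fp=0, fn=20, tn=80)

for name, (acc, kappa, f1) in [("model", model), ("baseline", baseline)]:
    print(f"{name}: accuracy={acc:.2f} kappa={kappa:.2f} f1={f1:.2f}")
```

With these counts, the model reaches 85% accuracy but a kappa of only about 0.48. The do-nothing baseline still scores 80% accuracy, with a kappa of 0. This is the kind of gap the chapter's extra measures are meant to expose.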

Chapter 11: Improving Model Performance

This chapter introduces techniques that you can use to improve the accuracy of your models, namely algorithm tuning and ensembles.

Topics covered include:

Tuning stock models for better performance

Improving model performance with meta-learning

Good but too brief. Algorithm tuning and ensembles are a big part of building accurate models in modern machine learning. The length may be suitable given that this is an introductory text, but more time should be given to the caret package.

If you’re not using caret for machine learning in R, you’re doing it wrong.
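The review does not reproduce the book's code, so what follows is only a minimal sketch, in Python rather than R, of the resampling-based tuning that caret's train() automates. The idea is to score each candidate value of a parameter with cross-validation and keep the best. Here the parameter is the number of neighbors k in k-nearest neighbors. The toy dataset, the grid of k values and the helper names (knn_predict, cv_accuracy, tune_k) are hypothetical illustrations, not caret's API.

```python
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k, folds=5):
    """Mean k-NN accuracy over `folds` cross-validation folds."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # rows whose index i satisfies i % folds == f
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds

def tune_k(data, grid=(1, 3, 5, 7, 9), folds=5):
    """Return the k with the best cross-validated accuracy, plus all scores."""
    scores = {k: cv_accuracy(data, k, folds) for k in grid}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy data: two overlapping 2-D Gaussian blobs labelled "a" and "b".
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
data += [((rng.gauss(1.5, 1), rng.gauss(1.5, 1)), "b") for _ in range(50)]
rng.shuffle(data)

best, scores = tune_k(data)
print("CV accuracy by k:", scores)
print("best k:", best)
```

The folds here are simple interleaved slices of shuffled data and are not stratified by class. A package like caret handles resampling schemes, grids and model fitting for you, which is much of the reason to use it.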

Chapter 12: Specialized Machine Learning Topics

This chapter contains a mess of other topics, including:

Working with proprietary files and databases

Working with online data and services

Working with domain-specific data

Improving performance of R

The topics are very specialized. Perhaps only the last, on “improving performance of R”, is really actionable for your machine learning projects.

Machine Learning Algorithms

The book covers a number of different machine learning algorithms. This section lists all of the algorithms covered and in which chapter they can be found.

I note that page 21 of the book does provide a look-up table of algorithms to chapters, but it is too high-level and glosses over the actual names of the algorithms used.

k-nearest neighbors (chapter 3)

Naive Bayes (chapter 4)

C5.0 (chapter 5)

1R (chapter 5)

RIPPER (chapter 5)

Linear Regression (chapter 6)

Classification and Regression Trees (chapter 6)

M5P (chapter 6)

Artificial Neural Networks (chapter 7)

Support Vector Machines (chapter 7)

Apriori (chapter 8)

k-means (chapter 9)

Bagged CART (chapter 11)

AdaBoost (chapter 11)

Random Forest (chapter 11)

What Do I Think Of This Book?

I like the book as an introduction to how to do machine learning on the R platform.

You must know how to program. You must know a little bit of R. You must have some sense of how to drive a machine learning project from beginning to end. This book will not cover these topics, but it will show you how to complete common machine learning tasks using R.

Set your expectations accordingly:

This is a practical book with worked examples and high-level algorithm descriptions.

This is not a machine learning textbook with theory, proof and lots of equations.

Pros

I like the structured examples and how each algorithm is demonstrated on a different dataset.

I like that the datasets are small, in-memory examples, perhaps all taken from the UCI Machine Learning Repository.

I like that references to research papers are provided where appropriate for further reading.

I like the boxes that summarize usage information for algorithms and other key techniques.

I like that it is practically focused: the how of machine learning, not the deep why.

Cons

I don’t like that it is so algorithm-focused. It follows the general structure of most “applied books” and dumps a lot of algorithms on you rather than walking through the extended project lifecycle.

I don’t like that there are no end-to-end examples (from problem definition, through model selection, to presentation of results). The formal structure of the examples is good, but I think I’d prefer a deep case-study chapter.

I cannot download the code and datasets from a GitHub repository or as a zip. I have to sign up and go through their process.

Some chapters feel like they are only included because similar chapters exist in other machine learning books (clustering and association rules). These may be machine learning methods, but they are not used nearly as often as core predictive modeling methods (IMHO).

Perhaps a little too much filler. I like less talk, more action. If I wanted long algorithm descriptions, I’d read an algorithms textbook. Tell me the broad strokes and let’s get to it.

Final Word

If you are looking for a good applied book for machine learning with R, this is it. I like it for beginners who know a little machine learning and/or a little R and want to practice machine learning on the R platform.

Even though I think O’Reilly’s applied books are generally better than Packt’s, I don’t see an offering from O’Reilly that can compete.