R code to accompany Real-World Machine Learning (Chapter 3)

Abstract

The rwml-R GitHub repo has been updated with R code to accompany Chapter 3 of the book “Real-World Machine Learning” by Henrik Brink, Joseph W. Richards, and Mark Fetherolf.

Survivors on the Titanic

The Titanic Passengers dataset is used to illustrate several steps in
preparing data for modeling, including the conversion of factor variables
to dummy variables. For example, the repo provides the code that produces
the following table of processed data:

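| Survived.yes | Pclass | Sex.male | Age | SibSp | Parch | Embarked.Q | Embarked.S | sqrtFare |
|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1 | 22 | 1 | 0 | 0 | 1 | 2.692582 |
| 1 | 1 | 0 | 38 | 1 | 0 | 0 | 0 | 8.442944 |
| 1 | 3 | 0 | 26 | 0 | 0 | 0 | 1 | 2.815138 |
| 1 | 1 | 0 | 35 | 1 | 0 | 0 | 1 | 7.286975 |
| 0 | 3 | 1 | 35 | 0 | 0 | 0 | 1 | 2.837252 |
| 0 | 3 | 1 | -1 | 0 | 0 | 1 | 0 | 2.908316 |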
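To make the dummy-variable step concrete, here is a minimal base R sketch that rebuilds the six rows shown above. It is not the code from the repo: the raw data frame is back-derived from the table (Fare is recovered by squaring sqrtFare), the use of `model.matrix()` is my own choice, and treating -1 as a placeholder for a missing age is an assumption.

```r
# Toy raw data, back-derived from the six processed rows in the table.
# Assumptions: factor level names, and that Age = -1 marks a missing value.
raw <- data.frame(
  Survived = factor(c("no", "yes", "yes", "yes", "no", "no"),
                    levels = c("no", "yes")),
  Pclass   = c(3, 1, 3, 1, 3, 3),
  Sex      = factor(c("male", "female", "female", "female", "male", "male")),
  Age      = c(22, 38, 26, 35, 35, NA),
  SibSp    = c(1, 1, 0, 1, 0, 0),
  Parch    = c(0, 0, 0, 0, 0, 0),
  Embarked = factor(c("S", "C", "S", "S", "S", "Q"), levels = c("C", "Q", "S")),
  Fare     = c(2.692582, 8.442944, 2.815138, 7.286975, 2.837252, 2.908316)^2
)

# Replace the missing age with a placeholder before building the matrix;
# model.matrix() would otherwise drop rows that contain NA.
raw$Age[is.na(raw$Age)] <- -1

# Swap Fare for its square root.
raw$sqrtFare <- sqrt(raw$Fare)
raw$Fare <- NULL

# Expand factors into 0/1 dummy columns (the first level of each factor is
# the reference level) and drop the intercept column.
processed <- as.data.frame(model.matrix(~ ., data = raw))[, -1]

# model.matrix() names dummies like "Sexmale"; add the dot used in the table.
names(processed) <- sub("^(Survived|Sex|Embarked)", "\\1.", names(processed))

print(processed)
```

Because the first level of each factor serves as the reference, a passenger who embarked at "C" has zeros in both Embarked.Q and Embarked.S, as in the second row of the table.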

I also go “off-script” a bit (doing some things not covered in the book) and
demonstrate some useful visualization, modeling, and performance-measuring
techniques available in the caret and AppliedPredictiveModeling packages.
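As a rough illustration of what performance measuring means for a classifier like this one, the sketch below builds a confusion matrix and computes accuracy in base R. The `truth` vector reuses the Survived column from the table; the `pred` vector is hypothetical and is not output from any model in the repo. caret's `confusionMatrix()` reports the same kind of table along with further statistics.

```r
# Actual outcomes for the six passengers in the table above.
truth <- factor(c("no", "yes", "yes", "yes", "no", "no"),
                levels = c("no", "yes"))

# Hypothetical predictions, made up purely for illustration.
pred <- factor(c("no", "yes", "no", "yes", "no", "yes"),
               levels = c("no", "yes"))

# Rows are predicted classes, columns are actual classes.
cm <- table(Predicted = pred, Actual = truth)

# Accuracy is the share of cases on the diagonal of the confusion matrix.
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
print(accuracy)
```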

MNIST database of handwritten digits

A k-nearest neighbors classifier (from the kknn package) is used to
predict the numbers represented in the MNIST database of handwritten digits.
The accompanying code also displays examples of the types of digits present
in the dataset.
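For readers who have not used kknn before, here is a minimal sketch of the fit-and-predict pattern. It is an illustration under stated assumptions, not the code from the repo: the built-in iris data stands in for MNIST so the example runs without a download, and the hold-out scheme, `k = 7`, and the rectangular (unweighted) kernel are arbitrary choices of mine.

```r
library(kknn)

# Stand-in data: iris instead of MNIST (assumption made for illustration).
# Hold out every fifth row as a small test set.
test_idx <- seq(5, nrow(iris), by = 5)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# kknn() fits and predicts in one call: it takes the training and test
# sets together and classifies each test row by a vote among its k
# nearest training rows.
fit <- kknn(Species ~ ., train = train, test = test,
            k = 7, kernel = "rectangular")

# Predicted classes for the test rows.
pred <- fitted(fit)

# Share of test rows classified correctly, plus the confusion matrix.
accuracy <- mean(pred == test$Species)
print(table(Predicted = pred, Actual = test$Species))
print(accuracy)
```

With MNIST, each row would hold the pixel intensities of one image and the response would be the digit label, but the call pattern stays the same.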

Auto MPG dataset

As an example of a linear regression analysis, the Auto MPG dataset introduced
in Chapter 2 resurfaces, and fuel economy is predicted from origin, year of
production, and performance characteristics such as horsepower and engine
displacement.
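The general shape of such a regression can be sketched with base R's `lm()`. This is only an illustration: the built-in mtcars data is used as a stand-in for the Auto MPG dataset (it has mpg, horsepower, and displacement but no origin or model year), and the `new_car` values are hypothetical.

```r
# Stand-in data: mtcars instead of Auto MPG (assumption for illustration).
# Model fuel economy as a linear function of horsepower and displacement.
fit <- lm(mpg ~ hp + disp, data = mtcars)

# Estimated intercept and slopes.
print(coef(fit))

# Predict fuel economy for a hypothetical car.
new_car <- data.frame(hp = 110, disp = 160)
predicted_mpg <- predict(fit, newdata = new_car)
print(predicted_mpg)

# Proportion of variance in mpg explained by the model.
r_squared <- summary(fit)$r.squared
print(r_squared)
```

A categorical predictor such as origin would enter the formula as a factor, which `lm()` expands into dummy variables automatically.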

As always, feedback is welcome

As always, I’d love to hear from you if you find the project helpful or if you
have any suggestions. Please leave a comment below or use the Tweet button.
Also, feel free to fork the rwml-R repo
and submit a pull request if you wish to contribute.