Algorithms To Spot Check in R

I would recommend exploring many of them, especially if making accurate predictions on your dataset is important and you have the time.

Often you don’t have the time, so you need to know the few algorithms that you absolutely must test on your problem.

In this section you will discover the linear and nonlinear algorithms you should spot check on your problem in R. This excludes ensemble algorithms such as boosting and bagging, which can come later once you have a baseline.

Each algorithm will be presented from two perspectives:

The package and function used to train and make predictions for the algorithm.

The caret wrapper for the algorithm.

You need to know which package and function to use for a given algorithm. This is needed when:

You are researching the algorithm parameters and how to get the most from the algorithm.

You have discovered the best algorithm to use and need to prepare a final model.

You need to know how to use each algorithm with caret, so that you can efficiently evaluate the accuracy of the algorithm on unseen data using the preprocessing, algorithm evaluation and tuning capabilities of caret.

Two standard datasets are used to demonstrate the algorithms:

Pima Indians Diabetes dataset for classification (PimaIndiansDiabetes from the mlbench library).

Boston Housing dataset for regression (BostonHousing from the mlbench library).

The algorithms are organized into two groups:

Linear Algorithms that are simpler methods that have a strong bias but are fast to train.

Nonlinear Algorithms that are more complex methods that have a large variance but are often more accurate.

Each recipe presented in this section is complete and will produce a result, so that you can copy and paste it into your current or next machine learning project.

Let’s get started.

Linear Algorithms

These are methods that make large assumptions about the form of the function being modeled. As such, they have a high bias but are often fast to train.

The final models are also often easy (or easier) to interpret, making them desirable as final models. If a linear algorithm produces results that are suitably accurate, you may not need to move on to nonlinear methods.

1. Linear Regression

The lm() function is in the stats library and creates a linear regression model using ordinary least squares.
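A minimal recipe in the same style as the later examples, fitting an ordinary least squares model to the Boston Housing data (the choice of predicting medv from all other attributes is illustrative):

```r
# load the library that provides the dataset
library(mlbench)
# load the dataset
data(BostonHousing)
# fit model: medv as the target, all other attributes as predictors
fit <- lm(medv~., data=BostonHousing)
# summarize the fit
print(fit)
# make predictions
predictions <- predict(fit, BostonHousing)
# summarize accuracy
mse <- mean((BostonHousing$medv - predictions)^2)
print(mse)
```

Note that lm() is loaded by default with R (the stats package), so only mlbench needs to be attached for the dataset.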

Nonlinear Algorithms

These are machine learning algorithms that make fewer assumptions about the function being modeled. As such, they have a higher variance but often result in higher accuracy. Their increased flexibility can also make them slower to train or increase their memory requirements.

1. k-Nearest Neighbors

The knn3() function is in the caret library and does not create a model; rather, it makes predictions from the training set directly. It can be used for classification or regression.

Classification Example:

# knn direct classification
# load the libraries
library(caret)
library(mlbench)
# load the dataset
data(PimaIndiansDiabetes)
# fit model
fit <- knn3(diabetes~., data=PimaIndiansDiabetes, k=3)
# summarize the fit
print(fit)
# make predictions
predictions <- predict(fit, PimaIndiansDiabetes[,1:8], type="class")
# summarize accuracy
table(predictions, PimaIndiansDiabetes$diabetes)

Regression Example:

# knn direct regression
# load the libraries
library(caret)
library(mlbench)
# load the dataset
data(BostonHousing)
BostonHousing$chas <- as.numeric(as.character(BostonHousing$chas))
x <- as.matrix(BostonHousing[,1:13])
y <- as.matrix(BostonHousing[,14])
# fit model
fit <- knnreg(x, y, k=3)
# summarize the fit
print(fit)
# make predictions
predictions <- predict(fit, x)
# summarize accuracy
mse <- mean((BostonHousing$medv - predictions)^2)
print(mse)

The knn implementation can also be used within the caret train() function for classification.
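A sketch of that usage on the same Pima Indians Diabetes data; the 10-fold cross-validation resampling scheme and the centering/scaling pre-processing are illustrative choices, not requirements:

```r
# load the libraries
library(caret)
library(mlbench)
# load the dataset
data(PimaIndiansDiabetes)
# define the resampling scheme (10-fold cross-validation, an illustrative choice)
control <- trainControl(method="cv", number=10)
# train a knn model with caret, standardizing the inputs
fit <- train(diabetes~., data=PimaIndiansDiabetes, method="knn",
    trControl=control, preProcess=c("center", "scale"))
# summarize the fit, including the cross-validated accuracy for each k tried
print(fit)
```

Because knn is distance-based, centering and scaling the attributes via preProcess usually helps; train() also tunes k automatically over a small default grid.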
