How to Build an Ensemble Of Machine Learning Algorithms in R (ready to use boosting, bagging and stacking)

In this post you will discover how you can create three of the most powerful types of ensembles in R.

This case study will step you through Boosting, Bagging and Stacking and show you how you can continue to ratchet up the accuracy of the models on your own datasets.

Let’s get started.

Build an Ensemble Of Machine Learning Algorithms in RPhoto by Barbara Hobbs, some rights reserved.

Increase The Accuracy Of Your Models

It can take time to find well performing machine learning algorithms for your dataset. This is because of the trial and error nature of applied machine learning.

Once you have a shortlist of accurate models, you can use algorithm tuning to get the most from each algorithm.

Another approach that you can use to increase accuracy on your dataset is to combine the predictions of multiple different models together.

This is called an ensemble prediction.

Combine Model Predictions Into Ensemble Predictions

The three most popular methods for combining the predictions from different models are:

Bagging. Building multiple models (typically of the same type) from different subsamples of the training dataset.

Boosting. Building multiple models (typically of the same type) each of which learns to fix the prediction errors of a prior model in the chain.

Stacking. Building multiple models (typically of differing types) and supervisor model that learns how to best combine the predictions of the primary models.

This post will not explain each of these methods. It assumes you are generally familiar with machine learning algorithms and ensemble methods and that you are looking for information on how to create ensembles with R.

Need more Help with R for Machine Learning?

Take my free 14-day email course and discover how to use R on your project (with sample code).

Ensemble Machine Learning in R

There are three main techniques that you can create an ensemble of machine learning algorithms in R: Boosting, Bagging and Stacking. In this section, we will look at each in turn.

Before we start building ensembles, let’s define our test set-up.

Test Dataset

All of the examples of ensemble predictions in this case study will use the ionosphere dataset.

This is a dataset available from the UCI Machine Learning Repository. This dataset describes high-frequency antenna returns from high energy particles in the atmosphere and whether the return shows structure or not. The problem is a binary classification that contains 351 instances and 35 numerical attributes.

Let’s load the libraries and the dataset.

1

2

3

4

5

6

7

8

9

10

# Load libraries

library(mlbench)

library(caret)

library(caretEnsemble)

# Load the dataset

data(Ionosphere)

dataset<-Ionosphere

dataset<-dataset[,-2]

dataset$V1<-as.numeric(as.character(dataset$V1))

Note that the first attribute was a factor (0,1) and has been transformed to be numeric for consistency with all of the other numeric attributes. Also note that the second attribute is a constant and has been removed.

1. Boosting Algorithms

We can look at two of the most popular boosting machine learning algorithms:

C5.0

Stochastic Gradient Boosting

Below is an example of the C5.0 and Stochastic Gradient Boosting (using the Gradient Boosting Modeling implementation) algorithms in R. Both algorithms include parameters that are not tuned in this example.

We can see that the SVM creates the most accurate model with an accuracy of 94.66%.

1

2

3

4

5

6

7

8

9

10

Models: lda, rpart, glm, knn, svmRadial

Number of resamples: 30

Accuracy

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

lda 0.7714 0.8286 0.8611 0.8645 0.9060 0.9429 0

rpart 0.7714 0.8540 0.8873 0.8803 0.9143 0.9714 0

glm 0.7778 0.8286 0.8873 0.8803 0.9167 0.9722 0

knn 0.7647 0.8056 0.8431 0.8451 0.8857 0.9167 0

svmRadial 0.8824 0.9143 0.9429 0.9466 0.9722 1.0000 0

Comparison of Sub-Models for Stacking Ensemble in R

When we combine the predictions of different models using stacking, it is desirable that the predictions made by the sub-models have low correlation. This would suggest that the models are skillful but in different ways, allowing a new classifier to figure out how to get the best from each model for an improved score.

If the predictions for the sub-models were highly corrected (>0.75) then they would be making the same or very similar predictions most of the time reducing the benefit of combining the predictions.

1

2

3

# correlation between results

modelCor(results)

splom(results)

We can see that all pairs of predictions have generally low correlation. The two methods with the highest correlation between their predictions are Logistic Regression (GLM) and kNN at 0.517 correlation which is not considered high (>0.75).

1

2

3

4

5

6

lda rpart glm knn svmRadial

lda 1.0000000 0.2515454 0.2970731 0.5013524 0.1126050

rpart 0.2515454 1.0000000 0.1749923 0.2823324 0.3465532

glm 0.2970731 0.1749923 1.0000000 0.5172239 0.3788275

knn 0.5013524 0.2823324 0.5172239 1.0000000 0.3512242

svmRadial 0.1126050 0.3465532 0.3788275 0.3512242 1.0000000

Correlations Between Predictions Made By Sub-Models in Stacking Ensemble

Let’s combine the predictions of the classifiers using a simple linear model.

We can see that we have lifted the accuracy to 94.99% which is a small improvement over using SVM alone. This is also an improvement over using random forest alone on the dataset, as observed above.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

A glm ensemble of 2 base models: lda, rpart, glm, knn, svmRadial

Ensemble results:

Generalized Linear Model

1053 samples

5 predictor

2 classes: 'bad', 'good'

No pre-processing

Resampling: Cross-Validated (10 fold, repeated 3 times)

Summary of sample sizes: 948, 947, 948, 947, 949, 948, ...

Resampling results

Accuracy Kappa Accuracy SD Kappa SD

0.949996 0.891494 0.02121303 0.04600482

We can also use more sophisticated algorithms to combine predictions in an effort to tease out when best to use the different methods. In this case, we can use the random forest algorithm to combine the predictions.

We can see that this has lifted the accuracy to 96.26% an impressive improvement on SVM alone.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

A rf ensemble of 2 base models: lda, rpart, glm, knn, svmRadial

Ensemble results:

Random Forest

1053 samples

5 predictor

2 classes: 'bad', 'good'

No pre-processing

Resampling: Cross-Validated (10 fold, repeated 3 times)

Summary of sample sizes: 948, 947, 948, 947, 949, 948, ...

Resampling results across tuning parameters:

mtry Accuracy Kappa Accuracy SD Kappa SD

2 0.9626439 0.9179410 0.01777927 0.03936882

3 0.9623205 0.9172689 0.01858314 0.04115226

5 0.9591459 0.9106736 0.01938769 0.04260672

Accuracy was used to select the optimal model using the largest value.

The final value used for the model was mtry = 2.

You Can Build Ensembles in R

You do not need to be an R programmer. You can copy and paste the sample code from this blog post to get started. Study the functions used in the examples using the built-in help in R.

You do not need to be a machine learning expert. Creating ensembles can be very complex if you are doing it from scratch. The caret and the caretEnsemble package allow you start creating and experimenting with ensembles even if you don’t have a deep understanding of how they work. Read-up on each type of ensemble to get more out of them at a later time.

You do not need to collect your own data. The data used in this case study was from the mlbench package. You can use standard machine learning dataset like this to learn, use and experiment with machine learning algorithms.

You do not need to write your own ensemble code. Some of the most powerful algorithms for creating ensembles is provided by R, ready to run. Use the examples in this post to get started right now. You can always adapt it to your specific cases or try out new ideas with custom code at a later time.

Summary

In this post you discovered that you can use ensembles of machine learning algorithms to improve the accuracy of your models.

You discovered three types of ensembles of machine learning algorithms that you can build in R:

Boosting

Bagging

Stacking

You can use the code in this case study as a template on your current or next machine learning project in R.

Next Step

Did you work through the case study?

Start your R interactive environment.

Type or copy-paste all of the code in this case study.

Take the time to understand each part of the case study using the help for R functions.

Do you have any questions about this case study or using ensembles in R? Leave a comment and ask and I will do my best to answer.

123 Responses to How to Build an Ensemble Of Machine Learning Algorithms in R (ready to use boosting, bagging and stacking)

One question: In the stacked random forest and GLM ensemble models, how are the hyperparameters specified for each model (e.g., which value of k for k Nearest Neighbor, how many trees in the random forest)? I’m assuming default values are used, but can these values be tweaked within the caretEnsemble package?

thanks for this tutorial. I have a question regarding the validation of the models. Should I always use the same seeds (folds) for the whole evaluation process (i.e. tuning the individual models, stacking the models etc.). I think it´s really hard to get valid numbers for the hyperparameters for the individual models, good performance of the individual models and uncorrelated models at the same time. For example, if I use 10 * 10 cross-validation with the same seeds (folds) for all models their performance is quite stable but the intercorrelation of their results tends to be quiet high.

I started following your posts recently and so far this is one of the best post I came across, not just in this website but among all the ML websites I have visited. Really appreciate your efforts to explain such a useful topic in a simple & clear manner. I will definitely try out this technique soon. Thanks again..

The environment is fine as i have tried the other codes of your to as well and those run fine……

Can I send you the dataset i m using on your mail id and you can help me with the code and let me know what mistake i m doing….just to give you a heads up i have also converted all the variables to as.numeric before applying the algorithm but i m still facing the same issue.

Hi Mudit,
I see you are participating in AV Loan default prediction competition. I am also participating in the same in the learning mode and my score (so far) is same as yours. At present i am working on stacking using caretEnsemble, I have not come across the error that you are seeing. Would you be able to provide more info (or) share code.

I need to do combine 6 different predictive models , this is a 7 class problem.
Each model predicts output probability for being in all the seven classes (lib SVMs using -b 1 option) and I need to combine them to get a better model. I just have to use the predictions of these models to train the ensemble.
Can you please tell me how to do the ensemble learning for multiclass problem and how this can be done in matlab ? Please reply as soon as possible. I shall be thankful. Thanks !

Hi Jason,
I have built many models with different pre-processing done on each model and the model itself is unique tuned by its own tune parameters and i want to stack them now. how do i get them into caretStack, because caretStack requires caretList to be its input. but i have generated models using caret train function. hope you got what i am saying. Appreciate your response.

How did you do it? I am currently working on one such problem. Could you advise me on how to go about it. Using a caretlist with different preprocessing, and different variables for each model to build a stacked ensemble.

Hi Jason,
I am currently trying to construct an ensembles of classifiers using the stacking technique. I am currently using the Naive Bayes and j48 classifier as the base leaarners with Random Forest as the meta learner.Where can I find the correlation values for predictions in Weka?
Thanks

what changes shd be made if i deal with a categorical featurespace with large no of levels. on execution if this code shows an error “Error in train.default(x, y, weights = w, …) : Stopping
In addition: There were 33 warnings (use warnings() to see them)”..

Yes, you can create an ensemble by combining the predictions from these models. You can use voting (mode), take the average or use another model to learn how to best combine the predictions (called stacked generalization or stacking).

Do you know where I can find code to implement Stacked Models in R? I am interested in tuning my xgboost models and stacking them with other optimized models. Any pointers towards examples/implementations would be awesome!

I don’t remember the specifics of the paper, but it was along the lines of better performance when predictions between weak learners are uncorrelated (or maybe it was the errors being uncorrelated). This may have been in PAC learning theory – it has been a while, sorry.

You could create 10 neural nets and take the mean prediction – this would probably outperform a stacking approach.

You can put other methods inside bagging (I don’t remember the package name off hand), but bagging works so much better with high variance methods – like unpruned trees or low-k in KNN with subsampled training data, etc.

At the end of the day, try different methods and see what gives the best results on your problem.

Hello Jason, first l would like to thank you for such a nice article of ensemble methods. I have a question that may be naive but i m confused,
My question is that since ensemble methods are used on a base classifer model (naive base classifier, SVM etc.) to improve accuracy of the base model but how to choose on which classifier we should apply ensemble methods.
How do we know that wihch classifier is best for applying ensemble methods?

Hi Jason, really inspired by your post. However, I have one question regarding the output of the stacking model.
“A rf ensemble of 2 base models: lda, rpart, glm, knn, svmRadial”, why there are only 2 base models? 2 is selected by the accuracy?

Thanks Jason, it is a really clear and sound example. I really appreciated. I am starting in this arena, and it is pretty impressive the method.

I found references to a package named SuperLearner in R. I wonder if this package also work for this sort of examples or it has restrictions. I have a quick look at the documentation and I see that mentioned parametric models. Do you anything about SuperLearner?

Just wow Jason. Thank you so much for this very useful tutorial on ensemble methods. I have bought many a book on Machine Learning in R over the last 5 years and I think this is the best summary of how you can use multiple machine learning methods together to enable you to select the best option and the method which is most fit for purpose.

i used knn
and neural net to predict by giving any one code value and in response i got respective values which i trained the machine, my question is is it possible to predict the unknown data on which it has not trained
for example
upto glucose values i trained the machine for 350
if i have given 600 then why it is not giving the relative output it is showng the last i.e nearest neighbour value as a predicted value in neural networks..

Hi jason, thanks for this great work.
i have a question, if i have a data set contains of android code and i need to apply ensemble method on this data set to improve the accuracy how i can apply ensemble techniques can i use weka tool or if u have another way to apply this work.

I am planning to stack different algorithms like Artificial Neural Network, SVM, KNN, C5.0, Random Forest in my research work for predicting diabetes disease…Whether this is feasible?? or whether the library above works only with certain algorithms?

thank you for your advice! And I tried to use stacking,but the following error occurred

Error: wrong model type for regression
In addition: Warning messages:
1: In trControlCheck(x = trControl, y = target) :
x$savePredictions == TRUE is depreciated. Setting to ‘final’ instead.
2: In trControlCheck(x = trControl, y = target) :
indexes not defined in trControl. Attempting to set them ourselves, so each model in the ensemble will have the same resampling indexes.

Good evening mr. Jason.
I want to asking , what ensemble learning bagging-boosting-random forest for logistic regression and linear discriminant. Any specific syntac bagging-boosting-random forest for that classifier?

I am trying different algorithms on classifying cancer as either Malignant or Benign. When I compare these models (LDA, NN, KNN, RF, NB…) using modelCor function in caret package, I get negative correlation coefficient form some models. Is this something possible and how can they (negatively correlated) models be interpreted?

Thanks for the tutorial. It was very much helpful. Can you please suggest how can I plot the ROC curve for a stacked model? Following your example, I have generated stack.svm, now please tell me how can I draw a ROC curve for it?

> models <- caretList(Stroke~., data=d11, trControl=control, methodList=algorithmList)
Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1 . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
In addition: Warning messages:
1: In trControlCheck(x = trControl, y = target) :
x$savePredictions == TRUE is depreciated. Setting to 'final' instead.
2: In trControlCheck(x = trControl, y = target) :
indexes not defined in trControl. Attempting to set them ourselves, so each model in the ensemble will have the same resampling indexes.