Update March/2018: Added alternate link to download the dataset as the original appears to have been taken down.

XGBoost With Python Mini-Course
Photo by Teresa Boardman, some rights reserved.

(Tip: you might want to print or bookmark this page so that you can refer back to it later.)

Who Is This Mini-Course For?

Before we get started, let’s make sure you are in the right place. The list below provides some general guidelines as to who this course was designed for.

Don’t panic if you don’t match these points exactly; you might just need to brush up in one area or another to keep up.

Developers who know how to write a little code. This means that it is not a big deal for you to get things done with Python and that you know how to set up the SciPy ecosystem on your workstation (a prerequisite). It does not mean you’re a wizard coder, but it does mean you’re not afraid to install packages and write scripts.

Developers who know a little machine learning. This means you know about the basics of machine learning like cross validation, some algorithms and the bias-variance trade-off. It does not mean that you are a machine learning PhD, just that you know the landmarks or know where to look them up.

This mini-course is not a textbook on XGBoost. There will be no equations.

It will take you from a developer that knows a little machine learning in Python to a developer who can get results and bring the power of XGBoost to your own projects.

Mini-Course Overview (what to expect)

Each lesson was designed to take the average developer about 30 minutes. You might finish some much sooner, and for others you may choose to go deeper and spend more time.

You can complete each part as quickly or as slowly as you like. A comfortable schedule may be to complete one lesson per day over a one week period. Highly recommended.

The topics you will cover over the next 7 lessons are as follows:

Lesson 01: Introduction to Gradient Boosting.

Lesson 02: Introduction to XGBoost.

Lesson 03: Develop Your First XGBoost Model.

Lesson 04: Monitor Performance and Early Stopping.

Lesson 05: Feature Importance with XGBoost.

Lesson 06: How to Configure Gradient Boosting.

Lesson 07: XGBoost Hyperparameter Tuning.

This is going to be a lot of fun.

You’re going to have to do some work though: a little reading, a little research and a little programming. You want to learn about XGBoost, right?

(Tip: Help with these lessons can be found on this blog; use the search feature.)

Any questions at all, please post in the comments below.

Share your results in the comments.

Hang in there, don’t give up!

Lesson 01: Introduction to Gradient Boosting

Gradient boosting is one of the most powerful techniques for building predictive models.

Boosting grew out of the question of whether a weak learner can be modified to become better. The first realization of boosting that saw great success in application was Adaptive Boosting, or AdaBoost for short. The weak learners in AdaBoost are decision trees with a single split, called decision stumps for their shortness.

AdaBoost and related algorithms were recast in a statistical framework and became known as Gradient Boosting Machines. The statistical framework cast boosting as a numerical optimization problem where the objective is to minimize the loss of the model by adding weak learners using a gradient descent-like procedure, hence the name.

The Gradient Boosting algorithm involves three elements:

A loss function to be optimized, such as cross entropy for classification or mean squared error for regression problems.

A weak learner to make predictions, such as a greedily constructed decision tree.

An additive model, used to add weak learners to minimize the loss function.

New weak learners are added to the model in an effort to correct the residual errors of all previous trees. The result is a powerful predictive modeling algorithm, perhaps more powerful than random forest.
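To make the three elements concrete, here is a minimal toy sketch in plain Python. It is not how XGBoost is implemented; it assumes a single input feature, squared error as the loss, decision stumps as the weak learners, and made-up data. The function names (fit_stump, gradient_boost) are hypothetical and chosen only for this illustration.

```python
# Toy gradient boosting for regression (illustrative sketch, not XGBoost itself).
# Assumptions: one input feature, squared-error loss, decision stumps as weak learners.

def fit_stump(xs, residuals):
    """Weak learner: greedily pick the single split that best fits the residuals."""
    best = None
    # Skip the largest value so both sides of the split are never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def mse(ys, preds):
    """Loss function: mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Additive model: start from the mean, then add stumps one at a time."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    losses = [mse(ys, preds)]
    for _ in range(n_rounds):
        # For squared error, the negative gradient is proportional to the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        losses.append(mse(ys, preds))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, losses


# Made-up data, used only to exercise the sketch.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 3.9, 8.2, 8.8, 9.9, 11.1]

predict, losses = gradient_boost(xs, ys)
print("training MSE before boosting:", round(losses[0], 3))
print("training MSE after boosting: ", round(losses[-1], 3))
```

Each round fits a new stump to what the current model still gets wrong, so the training loss should not increase from one round to the next.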

In the next lesson we will take a closer look at the XGBoost implementation of gradient boosting.

Lesson 02: Introduction to XGBoost

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.

XGBoost stands for eXtreme Gradient Boosting.

It was developed by Tianqi Chen and is laser focused on computational speed and model performance. As such, there are few frills.

In addition to supporting all key variations of the technique, the real interest is the speed provided by the careful engineering of the implementation, including:

Parallelization of tree construction using all of your CPU cores during training.

Distributed Computing for training very large models using a cluster of machines.

Out-of-Core Computing for very large datasets that don’t fit into memory.

Cache Optimization of data structures and algorithms to make best use of hardware.

Traditionally, gradient boosting implementations are slow because of the sequential nature in which each tree must be constructed and added to the model.

The focus on performance in the development of XGBoost has resulted in one of the best predictive modeling algorithms, one that can harness the full capability of your hardware platform or of very large computers you might rent in the cloud.

As such, XGBoost has been a cornerstone of competitive machine learning, being both the technique used to win and the technique recommended by winners. For example, here is what some recent Kaggle competition winners have said:

As the winner of an increasing amount of Kaggle competitions, XGBoost showed us again to be a great all-round algorithm worth having in your toolbox.

Lesson 03: Develop Your First XGBoost Model

XGBoost models can be used directly in the scikit-learn framework using the wrapper classes, XGBClassifier for classification and XGBRegressor for regression problems.

This is the recommended way to use XGBoost in Python.

Download the Pima Indians onset of diabetes dataset from the UCI Machine Learning repository (update: download from here). It is a good test dataset for binary classification as all input variables are numeric, meaning the problem can be modeled directly with no data preparation.

We can train an XGBoost model for classification by constructing it and calling the model.fit() function:

from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(X_train, y_train)

This model can then be used to make predictions by calling the model.predict() function on new data.

24 Responses to 7 Step Mini-Course to Get Started with XGBoost in Python

Questions on:
1. If the system is overlearning, decrease the learning rate and/or increase the number of trees.
2. If the system is underlearning, speed the learning up to be more aggressive by increasing the learning rate and/or decreasing the number of trees.

I think that if the system is overlearning, it is overfitting, and I think it should increase the learning rate, then decrease the depth of the trees and decrease the number of trees.

In lesson 4, the first block shows eval_metric="error" and then eval_metric="logloss". When using "error", I got the minimum at iteration [89] validation_0-error:0.204724. When using "logloss", I got the minimum at iteration [32] validation_0-logloss:0.487297. Why does the minimum for these two eval_metric settings occur at different iterations? Which one should we follow? Thank you.
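One possible reason the two metrics bottom out at different iterations is that they measure different things: classification error only looks at which side of 0.5 each predicted probability falls on, while log loss also penalizes how confident a wrong prediction is. The sketch below uses made-up labels and made-up probabilities for two hypothetical boosting rounds; it is plain Python and does not call XGBoost.

```python
# Sketch: classification error and log loss can disagree about which round is best.
# All numbers below are invented for illustration.
import math


def error_rate(y_true, probs, threshold=0.5):
    """Fraction of rows whose thresholded prediction does not match the label."""
    wrong = sum(1 for y, p in zip(y_true, probs) if (p > threshold) != (y == 1))
    return wrong / len(y_true)


def log_loss(y_true, probs):
    """Average negative log-likelihood of the true labels."""
    total = sum(math.log(p) if y == 1 else math.log(1.0 - p)
                for y, p in zip(y_true, probs))
    return -total / len(y_true)


y_true = [1, 1, 0, 0]

# Hypothetical predicted probabilities of class 1 at two boosting rounds.
early_round = [0.60, 0.55, 0.45, 0.55]  # cautious, one row on the wrong side of 0.5
later_round = [0.95, 0.90, 0.10, 0.98]  # confident, same row wrong but much more so

for name, probs in [("early", early_round), ("later", later_round)]:
    print(name, "error=%.2f" % error_rate(y_true, probs),
          "logloss=%.3f" % log_loss(y_true, probs))
```

Here both rounds make the same single mistake, so the error is identical, but the later round is punished by log loss for being confidently wrong. Which metric to follow depends on whether you care about the predicted labels or the predicted probabilities.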

Thank you for all these great tutorials! They are fantastic. Thank you for passing on this knowledge in a very easily digestible manner. 🙂

I am running on a Windows 10 system using Anaconda Python 3.6 using Spyder3 as an editor. I keep getting this error:

ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information

I looked it up on stackoverflow but I’m still not clear on how to fix this. I tried putting everything in an "if __name__ == '__main__':" block but I still get this error. Two questions:

1: Is there a way to get this code to work on a non-GPU system? My system is an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz 3.60 GHz, 16.0 GB of RAM, 64-bit Operating system, x64 based processor

2: If I ran this on a Windows 10 system with an ND-series GPU, would it fix this problem?

I was playing around with the parameters a little and finally got it to run by setting n_jobs = 1 instead of n_jobs = -1 in the call to GridSearchCV. From what I read online, it is because Windows doesn’t have a fork function and so it doesn’t know how to handle the process requested when n_jobs is set to -1.

Thanks again. I’ve learned so much! There’s nothing like learning through example and just getting in there and getting your hands dirty.

Hi Jason, I’m using an XGBRegressor model to predict the severity of insurance claims.
However, the prediction is nearly the same for every record.
I assume this is overfitting. What do you recommend to resolve the issue?

I have the XGBOOST book and it is fantastic. It has helped immensely to classify our sick member population. However, I am stuck with a feature name mismatch error when I use OneHotEncoder(). The problem doesn’t happen when I train and test on the dataset. It only happens when I try to use the model to predict on a new dataset that the model hasn’t seen. I posted the question here: https://stackoverflow.com/questions/51860759/xgboost-feature-name-error-python

I am hoping that you could also give some ideas to overcome this issue.

Thank you so much for this awesome answer. I was able to solve the issue and also mentioned this website on stackoverflow.com.

For folks who are in the same situation as me, you need to import joblib from sklearn.externals. After you fit the OneHotEncoder, you need to save the fitted encoder by using joblib.dump(enc, "your_encoder.pkl").
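A minimal sketch of that approach is shown below. Note that sklearn.externals.joblib was removed from recent scikit-learn releases, so this version imports the standalone joblib package instead. The category values are made up, the file is written to a temporary directory rather than your project folder, and handle_unknown="ignore" is an added assumption so that unseen categories do not raise an error.

```python
# Sketch: fit a OneHotEncoder once, save it, and reuse the same fitted encoder later.
import os
import tempfile

import joblib
from sklearn.preprocessing import OneHotEncoder

# Made-up categorical data for illustration.
train_rows = [["red"], ["green"], ["blue"]]
new_rows = [["green"], ["purple"]]  # "purple" was never seen during fitting

# handle_unknown="ignore" encodes unseen categories as all zeros (an assumption here).
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_rows)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "your_encoder.pkl")
    joblib.dump(encoder, path)   # save the fitted encoder
    loaded = joblib.load(path)   # later, load it instead of fitting a new one

# The loaded encoder produces the same columns as during training.
encoded_new = loaded.transform(new_rows).toarray()
print(encoded_new.shape)
```

Reusing the fitted encoder keeps the number and order of columns identical between training and prediction, which is what the model expects.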

After tuning the parameters in the last step, let’s say we found the best parameters among the candidates. Do we have to train the model again with these parameters? If so, how do we feed the best parameters to the classifier? Thanks.