In the previous article, we talked about AdaBoost, which combines the outputs of weak learners into a weighted sum that represents the final output of the boosted classifier. If you are not familiar with AdaBoost or the additive model, we highly recommend reading that article first.

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.
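Before deriving the method by hand, the following sketch shows what the technique looks like through an off-the-shelf implementation. It assumes scikit-learn is available; the dataset and parameter values are purely illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# A small synthetic regression problem, just for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# 100 stages; each stage fits a shallow regression tree to the negative
# gradient of the loss evaluated at the current model's predictions
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X, y)
print(model.predict(X[:3]))
```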

Boosting Tree

The boosting tree is based on the additive model, which can be represented as follows:

\(\begin{align}f_M(x) = \sum_{m=1}^{M} T(x; \theta_m)\end{align}\)

where \(T(x; \theta_m)\) stands for a decision tree, \(M\) is the number of decision trees, and \(\theta_m\) denotes the parameters of the \(m\)-th decision tree.
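As a quick illustration of the additive form, the ensemble's prediction is simply the sum of the individual trees' outputs. The sketch below assumes the trees are already-fitted scikit-learn regressors; the names additive_predict and trees are placeholders, not part of the formulation.

```python
import numpy as np

def additive_predict(trees, X):
    # f_M(x) = sum_{m=1}^{M} T(x; theta_m): add up every tree's prediction
    return np.sum([tree.predict(X) for tree in trees], axis=0)
```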

Assume that the initial boosting tree is \(f_0(x) = 0\); then the \(m\)-th step of the model is

\(\begin{align}f_m(x) = f_{m-1}(x) + T(x; \theta_m)\end{align}\)

where \(f_{m-1}(x)\) is the current model. The parameters of the next decision tree \(\theta_m\) can be determined by minimizing the following cost function:

\(\begin{align}\hat{\theta}_m = \arg\min_{\theta_m} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + T(x_i; \theta_m)\big)\end{align}\)

where \(L\) is the loss function and \(N\) is the number of training samples. For regression, a decision tree partitions the input space into \(J\) disjoint regions and assigns a constant to each one:

\(\begin{align}T(x; \theta) = \sum_{j=1}^{J} c_j I(x \in R_j)\end{align}\)

where \(c_j\) is the constant weight of region \(R_j\), \(\theta = \{(R_1, c_1), (R_2, c_2), \dots, (R_J, c_J)\}\) stands for the partition of the input space, and \(J\) is the number of leaf nodes of the regression tree.
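To make the stage-wise procedure concrete, here is a minimal sketch under the assumption of squared-error loss \(L(y, f(x)) = (y - f(x))^2\), for which minimizing the cost above reduces to fitting each new tree to the residuals of the current model. The helper name fit_boosting_tree and the use of scikit-learn's DecisionTreeRegressor as the base learner are illustrative choices, not part of the formulation above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting_tree(X, y, M=100, max_depth=2):
    """Forward stage-wise boosting tree for squared-error loss.

    With L(y, f(x)) = (y - f(x))^2, minimizing the cost function over the
    next tree T(x; theta_m) is equivalent to fitting a regression tree to
    the residuals y - f_{m-1}(x).
    """
    y = np.asarray(y, dtype=float)
    f = np.zeros_like(y)                       # f_0(x) = 0
    trees = []
    for m in range(M):
        residual = y - f                       # what the current model still misses
        tree = DecisionTreeRegressor(max_depth=max_depth)  # J leaf regions R_j with constants c_j
        tree.fit(X, residual)                  # T(x; theta_m)
        f = f + tree.predict(X)                # f_m(x) = f_{m-1}(x) + T(x; theta_m)
        trees.append(tree)
    return trees
```

Prediction with the fitted ensemble is then the additive sum \(f_M(x)\) from the earlier sketch, e.g. additive_predict(trees, X_new).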