Notes on Gradient Boosting vs. TreeBoost:
- This implementation performs Stochastic Gradient Boosting, not TreeBoost.
- Both algorithms learn tree ensembles by minimizing loss functions.
- TreeBoost (Friedman, 1999) additionally modifies the outputs at tree leaf nodes
based on the loss function, whereas the original gradient boosting method does not.
- When the loss is SquaredError, the two methods give the same result; for other
loss functions, their results can differ.
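
The difference at a single leaf can be illustrated with a minimal sketch in plain Scala (no Spark). It is not this implementation's code: the residual values are made up, all names are hypothetical, and squared error is assumed to be scaled as (y - F)^2 / 2 so that the negative gradient equals the residual. A tree fit to the negative gradients predicts their mean in a leaf, whereas TreeBoost replaces that value with the constant that minimizes the loss over the leaf.

```scala
object LeafValueSketch {
  // Hypothetical residuals y - F(x) of the training points that land in one leaf.
  val residuals: Seq[Double] = Seq(-1.0, 0.5, 4.0)

  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  def median(xs: Seq[Double]): Double = {
    val sorted = xs.sorted
    val n = sorted.size
    if (n % 2 == 1) sorted(n / 2) else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  // Squared error (assumed scaling (y - F)^2 / 2): the negative gradient is the residual.
  // Leaf value of a tree fit to the negative gradients: their mean.
  val gradientLeafSquared: Double = mean(residuals)
  // Loss-minimizing constant for the leaf: also the mean of the residuals.
  val treeBoostLeafSquared: Double = mean(residuals)

  // Absolute error |y - F|: the negative gradient is sign(residual).
  // Leaf value of a tree fit to the negative gradients: mean of the signs.
  val gradientLeafAbsolute: Double = mean(residuals.map(r => math.signum(r)))
  // Loss-minimizing constant for the leaf: the median of the residuals.
  val treeBoostLeafAbsolute: Double = median(residuals)

  def main(args: Array[String]): Unit = {
    println(s"squared error:  gradient leaf = $gradientLeafSquared, TreeBoost leaf = $treeBoostLeafSquared")
    println(s"absolute error: gradient leaf = $gradientLeafAbsolute, TreeBoost leaf = $treeBoostLeafAbsolute")
  }
}
```

Under squared error the two leaf values coincide; under absolute error the mean of the signs and the median of the residuals are generally different numbers.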

validationInput - Validation dataset.
This dataset should be disjoint from the training dataset,
but it should follow the same distribution.
For example, both datasets could be created by splitting an original dataset
with org.apache.spark.rdd.RDD.randomSplit().
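
A split of this kind might look like the sketch below. It assumes the spark.mllib API (GradientBoostedTrees.runWithValidation and BoostingStrategy.defaultParams) and a Spark dependency on the classpath; the object name, the helper trainWithHoldout, the 80/20 weights, and the seed are illustrative choices, not values prescribed here.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
import org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
import org.apache.spark.rdd.RDD

object ValidationSplitSketch {
  // Hypothetical helper: splits one dataset into training and validation parts,
  // then trains with the validation part passed as validationInput.
  def trainWithHoldout(data: RDD[LabeledPoint], seed: Long = 42L): GradientBoostedTreesModel = {
    // randomSplit assigns each record to one part at random, so the two parts
    // are disjoint and come from the same underlying distribution.
    val Array(training, validation) = data.randomSplit(Array(0.8, 0.2), seed)

    // Default regression settings; tune these for real workloads.
    val strategy = BoostingStrategy.defaultParams("Regression")

    new GradientBoostedTrees(strategy).runWithValidation(training, validation)
  }
}
```

Fixing the seed makes the split reproducible across runs, which helps when comparing models trained with different settings.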