Additive Groves is an ensemble of regression trees developed by Daria Sorokina,
Rich Caruana and Mirek Riedewald.

The feature evaluation technique referred to as "multiple counts" was developed by Art Munson
together with all of the above.

All code is written by Daria Sorokina unless stated otherwise. The code, along with executable binaries, is available on GitHub
under the BSD license and is free to use for any purpose. (It also makes use of external libraries available under the LGPLv2.1 license.)

TreeExtra is maintained for both Windows and Linux platforms. There is no support for OS X/macOS systems at this time.

Please e-mail me any comments, suggestions, bug reports or feature requests. I am
interested in how my algorithm is doing: if you have successfully (or unsuccessfully)
applied Additive Groves to your data, I'd be happy to hear about your experience
with it.

The feature evaluation algorithm in bagged trees has changed. A score in each node is now normalized by the entropy of the split feature in that node. This makes the scores of binary features comparable with the scores of continuous features with many distinct values.
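The idea behind this normalization can be sketched as follows. This is an illustration of the general technique only, not TreeExtra's actual code: the function names, the use of squared-error gain as the raw score, and the exact value counts are all assumptions made for the example.

```python
import math

def entropy(counts):
    # Shannon entropy (in bits) of a feature's value distribution in a node.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def normalized_score(raw_score, value_counts):
    # Divide the raw split score by the entropy of the split feature,
    # so many-valued features no longer get inflated scores.
    h = entropy(value_counts)
    return raw_score / h if h > 0 else raw_score

# A binary feature (entropy at most 1 bit) vs. a four-valued feature
# achieving the same raw score:
print(normalized_score(0.3, [50, 50]))          # 0.3 / 1.0 = 0.3
print(normalized_score(0.3, [25, 25, 25, 25]))  # 0.3 / 2.0 = 0.15
```

After normalization, a feature that splits the node into many values must earn a proportionally larger raw score to match a binary feature.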

Effect and interaction plots now consider all data, including data points with missing values. For features with a substantial number of missing values, the effect of the missing value is also plotted.

13 Aug '13. TreeExtra 2.3 is released

The tree-building algorithm is modified to build more balanced trees, yielding up to 10% improvement in predictive performance on some data sets. Note that the
best parameter values might differ from those produced by previous versions.

The Linux version now makes use of multithreading and trains different branches of a tree in parallel. Training time decreased 1.5 times
on average.

A major issue is fixed in the Windows version: it is now possible to train good models on data sets larger than 32,000 data points. (The Linux version did not
have this problem.)

One of the bt_train output files is renamed from features.txt to feature_scores.txt to reduce the chance of a conflict with the input
data file name.

21 Apr '12. TreeExtra 2.2 is released

Tree training is now faster, with no impact on performance.

Treatment of missing values is improved: both the probabilistic and the "missing value is a separate value" approaches are evaluated at every split.
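To illustrate what evaluating both treatments at a split could look like, here is a minimal sketch for a single numeric split under squared-error loss. Everything here is an assumption for illustration: the function names, the loss, and the specific way the probabilistic variant distributes missing rows are not taken from TreeExtra's implementation.

```python
def sse(ys, mean=None):
    # Sum of squared errors around the mean (or a supplied mean).
    if not ys:
        return 0.0
    if mean is None:
        mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def evaluate_split(xs, ys, threshold):
    left = [y for x, y in zip(xs, ys) if x is not None and x <= threshold]
    right = [y for x, y in zip(xs, ys) if x is not None and x > threshold]
    miss = [y for x, y in zip(xs, ys) if x is None]

    # (a) "missing value is a separate value": missing rows get their own branch.
    err_separate = sse(left) + sse(right) + sse(miss)

    # (b) probabilistic: each missing row is shared between the two branches
    # in proportion to how the known rows went.
    p = len(left) / max(len(left) + len(right), 1)
    ml = sum(left) / len(left) if left else 0.0
    mr = sum(right) / len(right) if right else 0.0
    err_prob = sse(left, ml) + sse(right, mr) + sum(
        p * (y - ml) ** 2 + (1 - p) * (y - mr) ** 2 for y in miss)

    # Keep whichever treatment gives the lower error for this split.
    return min(err_separate, err_prob)
```

The point of evaluating both at every split is that neither treatment dominates: for some features missing values carry their own signal (a separate branch wins), while for others they are effectively random omissions (the probabilistic treatment wins).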

N - the number of trees in a grove - is now increased exponentially instead of linearly. It takes on the values 1, 2, 3, 4, 6, 8, 11, 16, 23, 32, 45, 64, ... - major running
time savings on data sets with strong additive structure.
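The listed values are consistent with rounding successive powers of sqrt(2); the following sketch reproduces the sequence under that assumption. The generating rule and function name are inferred from the printed values, not taken from TreeExtra's source.

```python
def grove_sizes(n_max):
    # Candidate sizes round(2 ** (k / 2)) for k = 0, 1, 2, ...,
    # with duplicates dropped, up to n_max.
    sizes = []
    k = 0
    while True:
        n = round(2 ** (k / 2))
        if n > n_max:
            break
        if not sizes or n != sizes[-1]:
            sizes.append(n)
        k += 1
    return sizes

print(grove_sizes(64))  # [1, 2, 3, 4, 6, 8, 11, 16, 23, 32, 45, 64]
```

Growing N by a factor of roughly sqrt(2) per step means reaching a given grove size takes O(log N) training stages instead of O(N), which is where the running-time savings come from when a small grove already captures the additive structure.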

Several tweaks to the original Additive Groves algorithm result in further performance improvement. Namely, in most cases the convergence test and the vertical-vs-
horizontal step test are now made on training data instead of validation data.

Daria Sorokina. Modeling Additive Structure and Detecting Interactions with Additive Groves of Regression Trees.
CMU Machine Learning Lunch, March 2010.
Video (you need to scroll down to the March 1, 2010 talk; the sound is bad only for the first few minutes.)
Slides (.ppt)