NEWS file for the rpart package

Changes in version 4.1-14

Changed example data solder to solder.balance. The full
version of the data is available in the survival package.

Changes in version 4.1-10

Rpart would fail with a formula having ~. - x on the right hand
side. This was a simple bookkeeping error in creating an index.
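
A call of the affected form, as a minimal sketch. The kyphosis data ships with rpart; dropping Age is an arbitrary choice made only for illustration.

```r
library(rpart)

# Use every predictor in kyphosis except Age.
fit <- rpart(Kyphosis ~ . - Age, data = kyphosis)

# Variables that appear as primary splits; Age should not be among them.
used <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
print(used)
```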

Added a section to the vignette on user written functions,
which explains why and when one can avoid checking all 2^k
splits for a categorical predictor with k levels.
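
The idea can be sketched for one well known case: with a continuous response and a squared-error criterion, ordering the levels by mean response and trying only the k - 1 cuts of that ordering finds the same best split as an exhaustive search. The code below is an illustration on made-up data in base R, not code from rpart or the vignette, and the helper ss_split() is hypothetical.

```r
# Toy data: a 6-level factor and a numeric response (invented for illustration).
set.seed(1)
k <- 6
x <- factor(sample(letters[1:k], 200, replace = TRUE), levels = letters[1:k])
y <- rnorm(200, mean = as.integer(x) %% 3)

# Total within-child sum of squares when the levels in `left` go left.
ss_split <- function(left) {
  go_left <- x %in% left
  sum((y[go_left] - mean(y[go_left]))^2) +
    sum((y[!go_left] - mean(y[!go_left]))^2)
}

# Exhaustive search: every proper subset that contains the first level,
# which gives 2^(k-1) - 1 distinct splits.
lev <- levels(x)
best_all <- Inf
for (m in 0:(2^(k - 1) - 2)) {
  pick <- (m %/% 2^(0:(k - 2))) %% 2 == 1   # binary digits of m
  best_all <- min(best_all, ss_split(c(lev[1], lev[-1][pick])))
}

# Shortcut: sort the levels by mean response and try only the k - 1 cuts.
ord <- names(sort(tapply(y, x, mean)))
best_ord <- min(sapply(1:(k - 1), function(i) ss_split(ord[1:i])))

print(all.equal(best_all, best_ord))
```

Here the exhaustive loop evaluates 31 splits and the ordered search evaluates 5. The shortcut is not valid for every splitting criterion, which is the situation the vignette section addresses.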

Changes in version 4.1-0

The C and R code has been reformatted for legibility.

The old compatibility function rpconvert() has been removed.

The cross-validation functions allow for user interrupt at the
end of evaluating each split.

Variable Reliability in data set car90 is
corrected to be an ordered factor, as documented.

Surrogate splits are now considered only if they send two or
more cases with non-zero weight each way. For
numeric/ordinal variables only the restriction to non-zero weights
is new; for categorical variables the whole restriction is new.

Surrogate splits which improve only by rounding error over the
default split are no longer returned. Where weights and missing
values are present, the splits component for some of these
was not returned correctly.

Changes in version 4.0-3

A fit of class "rpart" now contains a component for
variable ‘importance’, which is reported by the
summary() method.
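
A minimal sketch of looking at the component, using the kyphosis data that ships with rpart (the choice of model is only for illustration):

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Named numeric vector of importance values stored on the fit.
imp <- fit$variable.importance
print(imp)

# summary() includes a variable importance section in its printout.
summary(fit)
```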

The handling of fits with zero and fractional weights has been
corrected: the results may be slightly different (or even
substantially different when the proportion of zero weights is
large).

Some memory leaks have been plugged.

There is a second vignette, ‘longintro.Rnw’, a version of
the original Mayo Technical Report on rpart.

Changes in version 4.0-2

Added dataset car90, a corrected version of the
S-PLUS dataset car.all (used with permission).

This version does not use paste0() and so works
with R 2.14.x.

Changes in version 4.0-1

Merged in a set of S-PLUS code changes that had accumulated at
Mayo over the course of a decade. The primary one is a change in how
indexing is done in the underlying C code, which leads to a major
speed increase for large data sets. Essentially, for the lower
leaves all our time used to be eaten up by bookkeeping, and this was
replaced by a different approach. The primary routine also uses
.Call() so as to be more memory efficient.

The other major change fixes an error for asymmetric loss
matrices, prompted by a user query. With L=loss asymmetric, the
altered priors were computed incorrectly – they were using L'
instead of L. Upshot – the tree would not necessarily choose
optimal splits for the given loss matrix. Once chosen, splits were
evaluated correctly. The printed “improvement” values were of
course wrong as well. It is interesting that for my little
test case, with L quite asymmetric, the early splits in the tree are
unchanged – a good split still looks good.
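
A small numeric sketch of why the transpose matters. It assumes the convention that L[i, j] is the loss for classifying a true class i as j, and that the altered prior for class i is proportional to prior_i times the i-th row sum of L. The numbers and the helper altered() are invented for illustration and are not taken from the rpart source.

```r
# Hypothetical two-class example.
prior <- c(0.6, 0.4)
L <- matrix(c(0, 1,
              4, 0), nrow = 2, byrow = TRUE)  # L[i, j]: true class i called j

# Altered priors under the assumed convention: prior_i * L_i, renormalized.
altered <- function(prior, L) {
  w <- prior * rowSums(L)
  w / sum(w)
}

right <- altered(prior, L)      # computed from L
wrong <- altered(prior, t(L))   # computed from L' instead of L

print(round(rbind(right, wrong), 3))
```

With a symmetric L the two rows coincide, which is consistent with the error showing up only for asymmetric loss matrices.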

Add the return.all argument to xpred.rpart().
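
A minimal sketch of the argument in use, again on the kyphosis data. The number of cross-validation groups and the seed are arbitrary, and the exact shape of the result depends on the fitting method, so the code only prints the dimensions.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Cross-validation groups are drawn at random, so fix the seed.
set.seed(42)
p_first <- xpred.rpart(fit, xval = 5)

set.seed(42)
p_all <- xpred.rpart(fit, xval = 5, return.all = TRUE)

# One row per observation in both cases; return.all = TRUE keeps the
# full prediction rather than only its first element.
print(dim(p_first))
print(dim(p_all))
```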

Added a set of formal tests, i.e., cases with known answers to
which we can compare.