I have a dependent variable to classify with a decision tree. It has three categories with frequencies 738 (19%), 426 (15%) and 1800 (66%). As you can imagine, the predicted category is always the third one, but the purpose of the tree is descriptive, so this does not actually matter.
The thing is, when plotting a tree fitted by the ctree() function (package partykit), the terminal nodes display bar plots showing the probability of occurrence of the three classes within each node. I need to modify this output: I would like to show, for each class, the proportion of its cases that fall into the terminal node, relative to the class's overall frequency.
For example, what percentage of the 738 participants in class 1 belongs to a given terminal node? Each terminal node would display these values for all three classes of the dependent variable.

Below is a plot of the tree, which by default reports the prevalence of each class within the terminal nodes.

1 Answer

You can always define your own panel function to draw what goes into each terminal panel window. If you know a little bit about grid graphics and you look at how the current terminal panel functions are defined you will see how this works.

One panel function that ought to do what you want is node_terminal() in the partykit package (the much improved re-implementation of the old party package). However, because ctree() does not store its predictions in each terminal node, the node_terminal() function cannot do this out of the box at the moment. I'll try to improve the implementation in future versions so that this can be facilitated. Below is a somewhat involved example that should do what you want, I hope.

First, we fit a classification tree using the iris data (for a simple reproducible example):
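A minimal sketch of this step, assuming the partykit package is installed. The object names `ct`, `node_id` and `tab` are illustrative choices, and the table of within-node class proportions is computed here only so that there is something to display later:

```r
library("partykit")

# Fit a conditional inference tree for the three iris species
ct <- ctree(Species ~ ., data = iris)

# Terminal node id for every observation
node_id <- predict(ct, type = "node")

# Rows: terminal nodes, columns: species; each row sums to 1
tab <- prop.table(table(node_id, iris$Species), margin = 1)
print(round(tab, 3))
```

These are the same within-node proportions that the default terminal bar plots visualize.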

Then comes the not so obvious part: We want to include these predicted probabilities in the terminal nodes of the tree itself. For this, we coerce the recursive node structure to a flat list, insert the predictions (suitably formatted), and convert the list back to the node structure:
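The list round trip described above might look roughly like the following. This is a sketch under assumptions rather than the exact code of the answer: the info element name `prediction` and the label format are illustrative, and it relies on the node ids of a freshly fitted tree being 1, 2, ... so that the list position equals the node id.

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)

# Within-node class proportions, one row per terminal node
node_id <- predict(ct, type = "node")
tab <- prop.table(table(node_id, iris$Species), margin = 1)

# Flatten the recursive node structure into a plain list of nodes
ct_node <- as.list(ct$node)

# Store a formatted label vector in the info slot of each terminal node
# ("prediction" is a name chosen for this sketch)
for (id in rownames(tab)) {
  lab <- sprintf("%s: %.1f%%", colnames(tab), 100 * tab[id, ])
  ct_node[[as.integer(id)]]$info$prediction <- lab
}

# Convert the flat list back into the recursive node structure
ct$node <- as.partynode(ct_node)

# node_terminal() passes the info of each node to FUN and prints the result
plot(ct, terminal_panel = node_terminal,
     tp_args = list(FUN = function(info) c("Predictions", info$prediction)))
```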

EDIT: The coercing back and forth between a list and a party is actually already implemented in the package...I just forgot about it ;-) If you do

st <- as.simpleparty(ct)

then the resulting party stores more detailed information about the predictions in each node. For example, $distribution then contains the absolute frequencies of each response level. This can easily be formatted as before.
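A hedged sketch of that formatting step: the function name `pred` matches the one mentioned in the comments, but its body here is an assumption, not the original code. It turns the absolute frequencies in `$distribution` into within-node percentages and hands them to node_terminal().

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)
st <- as.simpleparty(ct)

# Format the class distribution stored in a node's info as percentages
pred <- function(info) {
  tab <- info$distribution
  share <- 100 * as.numeric(tab) / sum(tab)
  c("Predictions", sprintf("%s: %.1f%%", names(tab), share))
}

plot(st, terminal_panel = node_terminal, tp_args = list(FUN = pred))
```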

Thanks a lot @Achim Zeileis for your answer! However, what I meant was the proportion of each response value in each terminal node, calculated relative to the total number of cases with that response value in the whole sample. For instance, with N=150 in the iris dataset and n=50 for each response value (setosa, versicolor and virginica), the proportion to be shown for setosa in node 2 would be 100%, because there are 50 setosa cases in node 2, i.e., all those available in the whole dataset.
– Gina Zetkin, Jun 25 '15 at 18:17

You can still do this in the simpleparty solution by replacing prop.table(tab) (within the pred function) by tab/c(50, 50, 50), i.e., showing the proportions to the overall frequencies of the three response classes. Or something similar along these lines.
– Achim Zeileis, Jun 25 '15 at 22:58
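The replacement suggested in the last comment could be sketched as follows. The names `total` and `pred_share` are illustrative, and the overall class frequencies are taken from the data with table() instead of hard-coding c(50, 50, 50), so the same idea carries over to unbalanced classes such as 738/426/1800.

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)
st <- as.simpleparty(ct)

# Overall frequency of each response class in the full sample
total <- table(iris$Species)

# Share of each class's cases that ends up in the node
pred_share <- function(info) {
  tab <- info$distribution
  share <- 100 * as.numeric(tab) / as.numeric(total[names(tab)])
  c("Share of class", sprintf("%s: %.1f%%", names(tab), share))
}

plot(st, terminal_panel = node_terminal, tp_args = list(FUN = pred_share))
```

With this definition, the percentages for one class add up to 100% across all terminal nodes, rather than the three percentages within one node adding up to 100%.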