
The setup is simple: binary classification using a simple decision tree, where each node of the tree applies a single threshold to a single feature. In general, building a ROC curve requires moving a decision threshold over different values and computing the effect of that change on the true positive rate and the false positive rate of the predictions. What is that decision threshold in the case of a simple, fixed decision tree?

1 Answer

In order to build the ROC curve and compute the AUC (area under the curve), you need a binary classifier that provides, at classification time, a distribution over the classes (or at least a score), not just the predicted label. To give you an example, suppose you have a binary classification model with classes c1 and c2. For a given instance, your classifier has to return a score for c1 and another for c2. If these scores are probability-like (preferable), then something like p(c1) and p(c2) would work. In plain English this translates to: "I (the model) classify this instance as c1 with probability p(c1), and as c2 with probability p(c2) = 1 - p(c1)".

This applies to all types of classifiers, not only decision trees. Having these scores, you can then compute the ROC curve (and its AUC) by varying a threshold on the p(c1) values, from the smallest to the greatest value.
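To make the threshold-sweeping procedure concrete, here is a minimal pure-Python sketch. The function name and the toy scores/labels are illustrative, not from any particular library: for each candidate threshold we classify every instance with score >= threshold as c1 and record the resulting (false positive rate, true positive rate) point.

```python
# Minimal sketch: building ROC points by sweeping a threshold over the
# predicted p(c1) scores. Function name and data are illustrative.

def roc_points(scores, labels):
    """Return (fpr, tpr) pairs for each candidate threshold.

    scores: predicted p(c1) for each instance
    labels: true class, 1 for c1 (positive), 0 for c2 (negative)
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Use each distinct score as a threshold, from smallest to greatest,
    # plus one above the maximum so the (0, 0) corner is included.
    thresholds = sorted(set(scores)) + [max(scores) + 1]
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_points(scores, labels))
# [(1.0, 1.0), (0.5, 1.0), (0.5, 0.5), (0.0, 0.5), (0.0, 0.0)]
```

The AUC is then just the area under the polyline through these points (for example via the trapezoidal rule).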

Now, if you have an implementation of a decision tree that returns labels and you want to change it to return scores instead, you have to compute those scores somehow. The most common way for decision trees is to use the class proportions in the leaf nodes. So, suppose you have built a decision tree, and when predicting the class for an instance you arrive at a leaf node which has (stored from the learning phase) 10 instances of class c1 and 15 instances of class c2; you can use these ratios as the scores. In our example, you would return p(c1) = 10 / (10 + 15) = 0.4 for class c1 and p(c2) = 15 / (10 + 15) = 0.6 for class c2.
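The leaf-count arithmetic above can be sketched in a few lines. The leaf representation (a plain dict of class counts) is an assumption for illustration; a real tree implementation would store these counts in its leaf-node structure during training.

```python
# Minimal sketch: turning the class counts stored at a leaf node into
# probability-like scores, as in the 10-vs-15 example above.
# The dict-of-counts leaf representation is an illustrative assumption.

def leaf_scores(leaf_counts):
    """Convert per-class training counts at a leaf into class probabilities."""
    total = sum(leaf_counts.values())
    return {label: count / total for label, count in leaf_counts.items()}

# Leaf reached at prediction time: 10 training instances of c1, 15 of c2.
print(leaf_scores({"c1": 10, "c2": 15}))  # {'c1': 0.4, 'c2': 0.6}
```

Libraries that expose probability estimates for trees (for example scikit-learn's `predict_proba` on a decision tree classifier) use essentially this leaf-proportion idea.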

For further reading on ROC curves, the best and most inspiring source of information I have found is Tom Fawcett's paper "An Introduction to ROC Analysis"; it's solid gold on this topic.