ROC Analysis and Performance Curves

For binary scoring classifiers a threshold (or cutoff) value
controls how predicted posterior probabilities are converted into class labels.
ROC curves and other performance plots serve to visualize and analyze the relationship between
one or two performance measures and the threshold.

This page is mainly devoted to receiver operating characteristic (ROC) curves that
plot the true positive rate (sensitivity) on the vertical axis against the false positive rate
(1 - specificity, fall-out) on the horizontal axis for all possible threshold values.
Creating other performance plots like lift charts or precision/recall graphs works
analogously and is shown briefly.

In many applications, such as diagnostic tests or spam detection, there is uncertainty
about the class priors or the misclassification costs at the time of prediction, for example
because it is hard to quantify the costs or because costs and class priors vary over time.
Under these circumstances the classifier is expected to work well for a whole range of
decision thresholds and the area under the ROC curve (AUC) provides a scalar performance
measure for comparing and selecting classifiers.
mlr provides the AUC for binary classification (auc) and also several
generalizations of the AUC to the
multi-class case (e.g., multiclass.au1p, multiclass.au1u based
on Ferri et al. (2009)).

With mlr version 2.8, the functions generateROCRCurvesData, plotROCRCurves, and
plotROCRCurvesGGVIS were deprecated.

Below are some examples that demonstrate the possible ways to create such plots.
Note that you can only use learners that are capable of predicting probabilities.
Have a look at the learner table in the Appendix
or run listLearners("classif", properties = c("twoclass", "prob")) to get a list of all
learners that support this.

By default, plotROCCurves plots the performance values of the first two measures passed
to generateThreshVsPerfData. The first is shown on the x-axis, the second on the y-axis.
Moreover, a diagonal line that represents the performance of a random classifier is added.
You can remove the diagonal by setting diagonal = FALSE.
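The df object passed to plotROCCurves below is such a ThreshVsPerfData object. Since the
setup code is not shown above, here is a minimal sketch, assuming mlr's built-in sonar.task
and a linear discriminant analysis learner (consistent with the Sonar example revisited
further below):

library(mlr)

# Train an LDA model that predicts probabilities on the Sonar task
lrn1 = makeLearner("classif.lda", predict.type = "prob")
mod1 = train(lrn1, sonar.task)
pred1 = predict(mod1, task = sonar.task)

# Compute the false and true positive rates for a sequence of thresholds
df = generateThreshVsPerfData(pred1, measures = list(fpr, tpr))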

plotROCCurves(df)

The corresponding area under the curve (auc) can be calculated as usual by calling
performance.

performance(pred1, auc)
#> auc
#> 0.847973

plotROCCurves always requires a pair of performance measures that are plotted against
each other.
If you want to plot individual measures versus the decision threshold you can use function
plotThreshVsPerf.
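
For example, the df object from above produces one panel per measure:

# Plot fpr and tpr individually against the decision threshold
plotThreshVsPerf(df)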

In order to compare the performance of two learners you might want to display the two
corresponding ROC curves in one plot.
For this purpose just pass a named list of Predictions to generateThreshVsPerfData.
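
A sketch, reusing pred1 from above and adding a support vector machine (ksvm from package
kernlab) as the second learner:

# Train a second probability learner and compare both ROC curves in one plot
lrn2 = makeLearner("classif.ksvm", predict.type = "prob")
mod2 = train(lrn2, sonar.task)
pred2 = predict(mod2, task = sonar.task)

df2 = generateThreshVsPerfData(list(lda = pred1, ksvm = pred2),
  measures = list(fpr, tpr))
plotROCCurves(df2)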

Example 2: Benchmark experiment

The analysis in the example above can be improved a little.
Instead of writing individual code for training and prediction with each learner, which
quickly becomes tedious, we can use function benchmark (see also
Benchmark Experiments). Moreover, the support vector machine should
ideally be tuned.

We again consider the Sonar data set and apply lda
as well as ksvm.
We first generate a tuning wrapper for ksvm.
The cost parameter is tuned on a parameter grid that is kept small for demonstration purposes.
We assume that we are interested in a good performance over the complete threshold range
and therefore tune with regard to the auc.
The error rate (mmce) for a threshold value of 0.5 is reported as well.
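
A sketch of this setup; the grid values for C and the inner holdout resampling are
illustrative choices, not prescribed by the text:

# Tune ksvm's cost parameter C on a small grid, optimizing the auc;
# mmce at the default threshold of 0.5 is reported as a second measure
ps = makeParamSet(makeDiscreteParam("C", values = 2^(-2:2)))
ctrl = makeTuneControlGrid()
tuned.ksvm = makeTuneWrapper(lrn2, resampling = makeResampleDesc("Holdout"),
  measures = list(auc, mmce), par.set = ps, control = ctrl)

# Benchmark lda and the tuned ksvm with 5-fold cross-validation
rdesc = makeResampleDesc("CV", iters = 5)
bmr = benchmark(list(lrn1, tuned.ksvm), sonar.task, resamplings = rdesc,
  measures = list(auc, mmce))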

By default, generateThreshVsPerfData calculates aggregated performances according to the
chosen resampling strategy (5-fold cross-validation) and aggregation scheme
(test.mean) for each threshold in the sequence.
This way we get threshold-averaged ROC curves.

If you want to plot the individual ROC curves for each resample iteration set aggregate = FALSE.
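
For the benchmark result both variants look as follows (aggregate = TRUE is the default
and is spelled out here only for emphasis):

# Threshold-averaged ROC curves, one curve per learner
df.bmr = generateThreshVsPerfData(bmr, measures = list(fpr, tpr), aggregate = TRUE)
plotROCCurves(df.bmr)

# One ROC curve per resample iteration and learner
df.iters = generateThreshVsPerfData(bmr, measures = list(fpr, tpr), aggregate = FALSE)
plotROCCurves(df.iters)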

An alternative to averaging is to just merge the 5 test folds and draw a single ROC curve.
Merging can be achieved by manually changing the class attribute of
the prediction objects from ResamplePrediction to Prediction.
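
A sketch of this trick, extracting the resampled predictions from the benchmark result
with getBMRPredictions:

# One ResamplePrediction per learner; relabeling it as a plain Prediction
# makes generateThreshVsPerfData treat the 5 merged test folds as one set
preds = getBMRPredictions(bmr, drop = TRUE)
preds2 = lapply(preds, function(x) {
  class(x) = "Prediction"
  x
})
df.merged = generateThreshVsPerfData(preds2, measures = list(fpr, tpr))
plotROCCurves(df.merged)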

mlr's function asROCRPrediction converts an mlr Prediction object to
a ROCR prediction object, so you can easily generate performance plots by doing the
remaining steps, computing the performance measures and plotting, yourself.
ROCR's plot method has some nice features which are not (yet)
available in plotROCCurves, for example plotting the convex hull of the ROC curves.
Some examples are shown below.
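
As a minimal sketch, the ROC curve of pred1 from above via ROCR (the package must be
installed and loaded):

library(ROCR)

# Compute tpr and fpr from the converted prediction object
r.pred = asROCRPrediction(pred1)
r.perf = ROCR::performance(r.pred, measure = "tpr", x.measure = "fpr")

# Plot the ROC curve
plot(r.perf)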

Below is the same ROC curve, but we make use of some more graphical parameters:
The ROC curve is color-coded by the threshold and selected threshold values are printed on
the curve. Additionally, the convex hull (black broken line) of the ROC curve is drawn.
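
In ROCR these options correspond to the colorize and print.cutoffs.at arguments of its
plot method and to the "rch" (ROC convex hull) performance measure; a sketch:

# Color-code the curve by threshold and print selected thresholds on it
plot(r.perf, colorize = TRUE, print.cutoffs.at = seq(0.1, 0.9, 0.1), lwd = 2)

# Add the convex hull of the ROC curve as a black broken line
r.hull = ROCR::performance(r.pred, measure = "rch")
plot(r.hull, add = TRUE, lty = 2)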

In order to create other evaluation plots like precision/recall graphs you just have to
change the performance measures when calling ROCR::performance.
(Note that you have to use the measures provided by ROCR listed here
and not mlr's performance measures.)
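
For example, a precision/recall graph uses ROCR's "prec" and "rec" measures:

# Precision/recall graph from the same converted prediction
pr.perf = ROCR::performance(r.pred, measure = "prec", x.measure = "rec")
plot(pr.perf)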

If you want to plot a performance measure versus the threshold, specify only one measure when
calling ROCR::performance.
Below the average accuracy over the 5 cross-validation iterations is plotted against the
threshold. Moreover, boxplots for certain threshold values (0.1, 0.2, ..., 0.9) are drawn.
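
A sketch, assuming the resampled predictions extracted above; converting a
ResamplePrediction yields one curve per fold, which ROCR's plot method can average and
annotate with boxplots via its avg, spread.estimate, and show.spread.at arguments:

# Accuracy versus threshold, averaged over the 5 CV folds of the first learner
r.cv = asROCRPrediction(preds[[1]])
acc.perf = ROCR::performance(r.cv, measure = "acc")
plot(acc.perf, avg = "vertical", spread.estimate = "boxplot",
  show.spread.at = seq(0.1, 0.9, 0.1))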