$\begingroup$The expression "Searched high and low" is interesting since you can find plenty of excellent definitions/uses for AUC by typing "AUC" or "AUC statistics" into google. Appropriate question of course, but that statement just caught me off guard!$\endgroup$
– Behacad Jan 9 '15 at 23:03


$\begingroup$I did Google AUC but a lot of the top results didn't explicitly state AUC = Area Under Curve. The first Wikipedia page related to it does have it, but not until halfway down. In retrospect it does seem rather obvious! Thank you all for some really detailed answers$\endgroup$
– josh Jan 12 '15 at 12:13

Computing the AUROC

Assume we have a probabilistic, binary classifier such as logistic regression.

Before presenting the ROC curve (Receiver Operating Characteristic curve), the concept of the confusion matrix must be understood. When we make a binary prediction, there can be 4 types of outcomes:

We predict 0 while the true class is actually 0: this is called a True Negative, i.e. we correctly predict that the class is negative (0). For example, an antivirus did not detect a harmless file as a virus.

We predict 0 while the true class is actually 1: this is called a False Negative, i.e. we incorrectly predict that the class is negative (0). For example, an antivirus failed to detect a virus.

We predict 1 while the true class is actually 0: this is called a False Positive, i.e. we incorrectly predict that the class is positive (1). For example, an antivirus considered a harmless file to be a virus.

We predict 1 while the true class is actually 1: this is called a True Positive, i.e. we correctly predict that the class is positive (1). For example, an antivirus rightfully detected a virus.

To get the confusion matrix, we go over all the predictions made by the model, and count how many times each of those 4 types of outcomes occur:

In this example of a confusion matrix, among the 50 data points that are classified, 45 are correctly classified and 5 are misclassified.
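Counting the four outcome types is straightforward; as a sketch in Python (the toy labels and predictions below are assumptions for illustration, not the 50-point example from the figure):

```python
# Count the four confusion-matrix cells for a list of binary predictions.
# y_true / y_pred are assumed toy data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

print(tp, fp, fn, tn)  # prints 4 1 1 4 -- the four cells of the confusion matrix
```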

Since it is often more convenient to compare two different models with a single metric rather than several, we compute two metrics from the confusion matrix, which we will later combine into one:

True positive rate (TPR), aka. sensitivity, hit rate, and recall, which is defined as $ \frac{TP}{TP+FN}$. Intuitively this metric corresponds to the proportion of positive data points that are correctly considered as positive, with respect to all positive data points. In other words, the higher the TPR, the fewer positive data points we will miss.

False positive rate (FPR), aka. fall-out, which is defined as $ \frac{FP}{FP+TN}$. Intuitively this metric corresponds to the proportion of negative data points that are mistakenly considered as positive, with respect to all negative data points. In other words, the higher the FPR, the more negative data points will be misclassified.
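These two definitions translate directly into code; a quick sketch (the counts below are assumptions for illustration, not taken from the confusion matrix above):

```python
def tpr(tp, fn):
    """True positive rate (sensitivity / recall): TP / (TP + FN)."""
    return tp / (tp + fn)

def fpr(fp, tn):
    """False positive rate (fall-out): FP / (FP + TN)."""
    return fp / (fp + tn)

# Illustrative counts (assumed).
print(tpr(8, 2))  # 0.8 -- 80% of the positives were correctly flagged
print(fpr(3, 7))  # 0.3 -- 30% of the negatives were mistakenly flagged
```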

To combine the FPR and the TPR into one single metric, we first compute the two metrics above for many different thresholds (for example $0.00, 0.01, 0.02, \dots, 1.00$) of the logistic regression, then plot them on a single graph, with the FPR values on the abscissa and the TPR values on the ordinate. The resulting curve is called the ROC curve, and the metric we consider is the AUC of this curve, which we call AUROC.

The following figure shows the AUROC graphically:

In this figure, the blue area corresponds to the Area Under the curve of the Receiver Operating Characteristic (AUROC). The dashed diagonal line represents the ROC curve of a random predictor: it has an AUROC of 0.5. The random predictor is commonly used as a baseline to see whether the model is useful.
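The whole procedure (sweep thresholds, collect the (FPR, TPR) points, integrate) can be sketched in plain Python; the toy labels and scores below are assumptions, and the area is computed with the trapezoidal rule:

```python
# Toy data (assumed): true labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

def roc_points(y_true, scores):
    """Sweep thresholds over the scores and return (FPR, TPR) points."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # highest threshold: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points

def auroc(points):
    """Trapezoidal integration of TPR over FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_points(y_true, scores)
print(auroc(pts))  # 0.75 for this toy example
```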

$\begingroup$Brilliant explanation. Thank you. One question just to clarify that I understand: am I right in saying that, on this graph, a solid blue square would have an ROC curve with AUC = 1 and would be a good prediction model? I assume this is theoretically possible.$\endgroup$
– josh Jan 12 '15 at 12:39


$\begingroup$@josh Yes, that's right. The AUROC is between 0 and 1, and AUROC = 1 means the prediction model is perfect. In fact, the further the AUROC is from 0.5, the better: if AUROC < 0.5, then you just need to invert the decisions your model is making. As a result, if AUROC = 0, that's good news, because you just need to invert your model's output to obtain a perfect model.$\endgroup$
– Franck Dernoncourt Jan 12 '15 at 17:08

I'm a bit late to the party, but here's my five cents. @FranckDernoncourt (+1) already mentioned possible interpretations of AUC ROC, and my favorite one is the first on his list (I use different wording, but it's the same):

the AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example, i.e. $P\Big(\text{score}(x^+) > \text{score}(x^-)\Big)$

Consider this example (AUC = 0.68):

Let's try to simulate it: draw random positive and negative examples and then calculate the proportion of cases in which the positives have a greater score than the negatives.
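A sketch of that simulation in Python (the Gaussian score distributions below are an assumption for illustration, since the original example data is not shown):

```python
import random

random.seed(0)

# Assumed score distributions: positives tend to score higher than negatives.
pos_scores = [random.gauss(0.6, 0.2) for _ in range(1000)]
neg_scores = [random.gauss(0.4, 0.2) for _ in range(1000)]

# Draw random positive/negative pairs and estimate P(score(x+) > score(x-)).
n_draws = 50_000
wins = sum(
    random.choice(pos_scores) > random.choice(neg_scores)
    for _ in range(n_draws)
)
print(wins / n_draws)  # Monte Carlo estimate of the AUC
```

The printed proportion approximates the AUC of a classifier that produced these scores; increasing `n_draws` tightens the estimate.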

$\begingroup$+1 (from before). Above I linked to another thread where you made a very nice contribution to a related topic. This here does a great job complementing @FranckDernoncourt's post & helping to flesh it out further.$\endgroup$
– gung♦ Jan 16 '15 at 21:12


$\begingroup$In the ROC curve produced by the R package, what does the color stand for? Can you please add some details about it? Thanks!$\endgroup$
– Prradep Mar 3 '16 at 1:00

$\begingroup$It would probably be useful to add true positives and true negatives to the explanation in the grey box above? Otherwise it may be a bit confusing.$\endgroup$
– cbellei Jan 25 '17 at 15:28

Important considerations are not included in any of these discussions. The procedures discussed above invite inappropriate thresholding and utilize improper accuracy scoring rules (proportions) that are optimized by choosing the wrong features and giving them the wrong weights.

Dichotomization of continuous predictions flies in the face of optimal decision theory. ROC curves provide no actionable insights. They have become obligatory without researchers examining the benefits. They have a very large ink:information ratio.

Optimum decisions don't consider "positives" and "negatives" but rather the estimated probability of the outcome. The utility/cost/loss function, which plays no role in ROC construction (hence the uselessness of ROCs), is used to translate the risk estimate into the optimal (e.g., lowest expected loss) decision.

The goal of a statistical model is often to make a prediction, and the analyst should often stop there because the analyst may not know the loss function. Key components of the prediction to validate unbiasedly (e.g., using the bootstrap) are the predictive discrimination (one semi-good way to measure this is the concordance probability which happens to equal the area under the ROC but can be more easily understood if you don't draw the ROC) and the calibration curve. Calibration validation is really, really necessary if you are using predictions on an absolute scale.
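The concordance probability mentioned here can indeed be computed directly from the predicted risks, without drawing any ROC: compare every positive/negative pair and count how often the positive case received the higher prediction, with ties counted as one half. A sketch (the toy predictions are assumptions):

```python
def concordance(pos_scores, neg_scores):
    """Proportion of positive/negative pairs in which the positive case
    received the higher predicted risk; ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Assumed predicted risks for two positive and two negative cases.
print(concordance([0.8, 0.35], [0.1, 0.4]))  # 0.75
```

This pairwise count equals the area under the ROC, but it reads as a plain probability statement about the predictions.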

$\begingroup$Every other answer focuses on mathematical formulas which have no practical usefulness. And the only correct answer has the least upvotes.$\endgroup$
– max May 4 '16 at 23:00


$\begingroup$I have been at the receiving end of seemingly cryptic answers on this topic from Professor Harrell - they are great in the way that they force you to think hard. What I believe he is hinting at is that you don't want to accept false negative cases in a screening test for HIV (fictional example), even if accepting a higher percentage of false negatives (concomitantly reducing false positives) could place your cutoff point at the AUC maxima. Sorry for the brutal oversimplification.$\endgroup$
– Antoni Parellada Sep 17 '16 at 15:24

AUC is an abbreviation for area under the curve. It is used in classification analysis to determine which of the models under consideration predicts the classes best.

An example of its application is ROC curves. Here, the true positive rates are plotted against the false positive rates. An example is below. The closer the AUC for a model comes to 1, the better it is, so models with higher AUCs are preferred over those with lower AUCs.

Please note that there are also other methods than ROC curves, but they are also related to the true positive and false positive rates, e.g., precision-recall, F1 score, or Lorenz curves.

$\begingroup$Can you please explain the ROC curve in the context of a simple cross-validation of the 0/1 outcome? I don't understand very well how the curve is constructed in that case.$\endgroup$
– Curious Jan 9 '15 at 13:42

The answers in this forum are great and I come back here often for reference. However, one thing was always missing. From @Frank's answer, we see the interpretation of AUC as the probability that a positive sample will have a higher score than a negative sample. At the same time, the way to calculate it is to plot the TPR and FPR as the threshold, $\tau$, is changed, and to calculate the area under that curve. But why is this area under the curve the same as this probability? @Alexy showed through simulation that they're close, but can we derive this relationship mathematically? Let's assume the following:

$A$ is the distribution of scores the model produces for data points that are actually in the positive class.

$B$ is the distribution of scores the model produces for data points that are actually in the negative class (we want this to be to the left of $A$).

$\tau$ is the cutoff threshold. If a data point gets a score greater than this, it's predicted as belonging to the positive class. Otherwise, it's predicted to be in the negative class.

Note that the TPR (recall) is given by $P(A>\tau)$ and the FPR (fall-out) is given by $P(B>\tau)$.

Now, we plot the TPR on the y-axis and FPR on the x-axis, draw the curve for various $\tau$ and calculate the area under this curve ($AUC$).

We get:

$$AUC = \int_0^1 TPR(x)dx = \int_0^1 P(A>\tau(x))dx$$
where $x$ is the FPR.
Now, one way to calculate this integral is to consider $x$ as belonging to a uniform distribution. In that case, it simply becomes the expectation of the $TPR$.

$$AUC = E_x[P(A>\tau(x))] \tag{1}$$
if we consider $x \sim U[0,1)$.

Now, $x$ here was just the FPR:

$$x = FPR = P(B>\tau(x)) = 1 - F_B(\tau(x)) \tag{2}$$

where $F_B$ is the CDF of $B$. Since we considered $x$ to be from a uniform distribution, equation (2) says that $1 - F_B(\tau(x))$, and hence $F_B(\tau(x))$ itself, is uniformly distributed.

But we know from the inverse transform law that for any random variable $Y$ with continuous CDF $F_Y$, if $F_Y(Z) \sim U$ then $Z \sim Y$. This follows since applying a random variable's own CDF to it yields a uniform: for $u \in [0,1]$,

$$P\big(F_Y(Y) \le u\big) = P\big(Y \le F_Y^{-1}(u)\big) = F_Y\big(F_Y^{-1}(u)\big) = u,$$

which is exactly the CDF of $U[0,1]$.

Using this fact in equation (2) gives us:
$$\tau(x) \sim B$$

Substituting this into equation (1) we get:

$$AUC = E_B\big[P(A>B \mid B)\big] = P(A>B)$$

In other words, the area under the curve is the probability that a random positive sample will have a higher score than a random negative sample.
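We can also check this identity numerically. Assuming, for illustration, unit-variance Gaussian score distributions $A \sim N(1,1)$ and $B \sim N(0,1)$, the trapezoidal area under the parametric curve $\big(P(B>\tau),\, P(A>\tau)\big)$ should match the closed-form $P(A>B) = \Phi\big((\mu_A - \mu_B)/\sqrt{2}\big)$:

```python
import math

def surv(mu, tau):
    """Survival function P(X > tau) for X ~ N(mu, 1)."""
    return 0.5 * math.erfc((tau - mu) / math.sqrt(2))

# Parametric ROC: sweep tau; x = FPR = P(B > tau), y = TPR = P(A > tau).
taus = [-6 + 12 * i / 2000 for i in range(2001)]
points = sorted((surv(0.0, t), surv(1.0, t)) for t in taus)  # increasing FPR

# Trapezoidal area under the (FPR, TPR) curve.
area = sum((x1 - x0) * (y0 + y1) / 2
           for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Closed form: P(A > B) = Phi(1 / sqrt(2)) here, via Phi(z) = erfc(-z/sqrt(2))/2.
exact = 0.5 * math.erfc(-0.5)

print(area, exact)  # the two values should agree closely
```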