Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function. Sensitivity measures the proportion of actual positives that are correctly identified as such. Specificity measures the proportion of actual negatives that are correctly identified as such. In other words, sensitivity quantifies the avoidance of false negatives, and specificity does the same for false positives.

A perfect test would be described as 100 percent sensitive and 100 percent specific. In reality, however, any non-deterministic classifier has a minimum achievable error known as the Bayes error rate. And as Bayes' theorem implies, when the condition you are testing for is rare (a needle in a haystack), even the best possible tests will still report false positives, often outnumbering the true ones.
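To make the needle-in-a-haystack effect concrete, here is a minimal sketch that applies Bayes' theorem to compute the probability that a positive result is a true positive. The 0.1 percent prevalence and 99 percent sensitivity/specificity figures are illustrative assumptions, not measurements of any particular test:

```python
# Illustrative numbers only: a rare condition and a highly accurate test.
prevalence = 0.001   # 0.1% of cases are real "needles"
sensitivity = 0.99   # P(test positive | actually positive)
specificity = 0.99   # P(test negative | actually negative)

# Total probability of a positive result: true alarms plus false alarms.
p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)

# Bayes' theorem: P(actually positive | test positive)
ppv = (prevalence * sensitivity) / p_positive

print(f"Probability a positive result is real: {ppv:.1%}")  # about 9%
```

Even with a test that is 99 percent sensitive and 99 percent specific, roughly nine out of ten alarms are false, purely because real positives are so rare.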

According to Wikipedia, “For any test, there is usually a trade-off between the measures – for instance, in airport security, since testing of passengers is for potential threats to safety, scanners may be set to trigger alarms on low-risk items like belt buckles and keys (low specificity) in order to increase the probability of identifying dangerous objects and minimize the risk of missing objects that do pose a threat (high sensitivity).”

Similarly, in application security testing, false positives alone don't tell the full accuracy story. False positives are just one of the four quantities that determine accuracy; the other three are 'true positives,' 'true negatives,' and 'false negatives.'

False Positives (FP): Tests with fake vulnerabilities that were incorrectly reported as vulnerable

True Positives (TP): Tests with real vulnerabilities that were correctly reported as vulnerable

False Negatives (FN): Tests with real vulnerabilities that were not correctly reported as vulnerable

True Negatives (TN): Tests with fake vulnerabilities that were correctly not reported as vulnerable

Therefore, the true positive rate (TPR) is the proportion of real vulnerabilities that were correctly reported: TPR = TP / (TP + FN). The false positive rate (FPR) is the proportion of fake vulnerabilities that were incorrectly reported as real: FPR = FP / (FP + TN).
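As a quick sketch, both rates can be computed directly from the four counts. The numbers below are made up for illustration, not results from any real scan:

```python
# Hypothetical scan results against a benchmark test suite.
tp, fn = 85, 15   # real vulnerabilities: reported vs. missed
fp, tn = 20, 80   # fake vulnerabilities: reported vs. correctly ignored

tpr = tp / (tp + fn)  # true positive rate: share of real vulns reported
fpr = fp / (fp + tn)  # false positive rate: share of fake vulns reported

print(f"TPR = {tpr:.0%}, FPR = {fpr:.0%}")  # TPR = 85%, FPR = 20%
```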

When developers and QA engineers write their unit and functional tests, they write them specifically for their applications. If they're testing a + b = c, then all of their tests are written specifically to verify that. This helps ensure the accuracy of these tests is high, i.e., no false positives and no false negatives. Developers and QA engineers do not perform unit and functional testing using "generic" tests; in other words, they "curve-fit" their tests to match their application exactly. If they didn't, some tests would fail incorrectly, producing false positives and false negatives. Similarly, if they wrote their own security tests customized specifically for their application, those would be highly accurate as well. But automated application security testing solutions such as WhiteHat Sentinel work differently: all of their security tests are fairly "generic," written to be able to scan any and all applications.

With that generality come both false positives and false negatives. False positives can easily be reduced by running fewer tests, but doing so also increases false negatives and, as a result, reduces the coverage of these tests. Therefore, it is practically impossible to eliminate both false positives and false negatives in these "generic" automated security tests. In other words, TPR can never be 100 percent, and FPR can never be 0 percent. Thus, the OWASP Benchmark Score can never be 100 percent.
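The OWASP Benchmark scores a tool by how far it sits above the random-guessing line, essentially TPR minus FPR. A short sketch with made-up counts illustrates the trade-off described above: tuning a scanner to cut false positives also costs true positives, so the score does not simply improve:

```python
def benchmark_score(tp, fn, fp, tn):
    """OWASP Benchmark-style score: TPR minus FPR, as a percentage."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return round((tpr - fpr) * 100)

# Hypothetical "aggressive" tuning: high coverage, but noisier.
aggressive = benchmark_score(tp=90, fn=10, fp=30, tn=70)    # TPR 90%, FPR 30%

# Hypothetical "conservative" tuning: fewer false alarms, more misses.
conservative = benchmark_score(tp=60, fn=40, fp=5, tn=95)   # TPR 60%, FPR 5%

print(aggressive, conservative)  # 60 55
```

Neither tuning reaches 100: suppressing false positives and suppressing false negatives pull in opposite directions.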

So we always encourage you to consider all four measures when analyzing the accuracy of automated application security solutions. You may use the OWASP Benchmark Project, a vendor-neutral and well-respected indicator of accuracy, to compare solutions.

We will be exploring additional details on our findings from running OWASP Benchmark in our next blog post. Stay tuned!
