The importance of generalizability for anomaly detection

Abstract

In security-related areas there is concern over novel “zero-day” attacks that penetrate system defenses and wreak havoc. The best methods for countering these threats are recognizing “nonself,” as in an Artificial Immune System, or recognizing “self” through clustering. In either case, the concern remains that something that appears similar to self could be missed. Given this situation, one could incorrectly assume that preferring a tighter fit to self over generalizability is important for false positive reduction in this type of learning problem. This article confirms that in anomaly detection, as in other forms of classification, a tight fit, although important, does not supersede model generality. This is shown using three systems, each with a different geometric bias in the decision space. The first two use spherical and ellipsoidal clusters with a k-means algorithm modified to work on the one-class/blind classification problem. The third wraps the self points with a multidimensional convex hull (polytope) algorithm capable of learning disjunctive concepts via a thresholding constant. All three algorithms are tested using the Voting dataset from the UCI Machine Learning Repository, the MIT Lincoln Labs intrusion detection dataset, and the lossy-compressed steganalysis domain.
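
As a rough illustration of the spherical-cluster idea mentioned in the abstract, the sketch below clusters only "self" points with plain k-means, wraps each cluster in a sphere whose radius is the distance to its farthest member, and flags any point outside every sphere as anomalous. The abstract does not specify how the authors modified k-means, so this is only a minimal sketch under stated assumptions: the function names (`kmeans`, `fit_spheres`, `is_anomaly`) and the `slack` parameter are hypothetical and not taken from the article.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        new_centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def fit_spheres(self_points, k, slack=1.0):
    """Assumed one-class model: one sphere per cluster of self points.

    `slack` (hypothetical) scales each radius: values above 1.0 generalize
    more, values below 1.0 fit self more tightly.
    """
    centroids = kmeans(self_points, k)
    radii = [0.0] * k
    for p in self_points:
        i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
        radii[i] = max(radii[i], math.dist(p, centroids[i]))
    return [(c, r * slack) for c, r in zip(centroids, radii)]


def is_anomaly(point, spheres):
    """A point is anomalous if it lies outside every self sphere."""
    return all(math.dist(point, c) > r for c, r in spheres)


self_points = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
spheres = fit_spheres(self_points, k=2)
print(is_anomaly((0.5, 0.6), spheres))  # inside a self sphere
print(is_anomaly((5.0, 5.0), spheres))  # far from both clusters
```

Raising `slack` trades a tighter fit for broader generalization, which is the tension the abstract discusses. An ellipsoidal variant would replace the Euclidean distance with a per-cluster Mahalanobis distance.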

Keywords

Clustering, Anomaly detection, Convex polytope, Ellipsoid

Gilbert “Bert” Peterson is an Assistant Professor of Computer Engineering at the Air Force Institute of Technology. Dr. Peterson received a B.S. in Architecture, and an M.S. and Ph.D. in Computer Science, from the University of Texas at Arlington. He teaches and conducts research in digital forensics and artificial intelligence.

Brent McBride is a Communications and Information Systems officer in the United States Air Force. He received a B.S. in Computer Science from Brigham Young University and an M.S. in Computer Science from the Air Force Institute of Technology. He currently serves as Senior Software Engineer at the Air Force Wargaming Institute.
