Abstract

Anomaly detection is the process of automatically finding low-probability events against a background of normal activity. By definition, there must be many more normal events than anomalous ones. The rarity of anomalies causes numerical problems for the probabilistic methods designed to detect them. This report describes an algorithm that introduces new discretisation levels to support the representation of low-probability values in the context of Bayesian network anomaly detection. It is an engineering solution to a problem with an extant discretisation tool that represents a data set’s fine structure but fails to capture extreme values or nulls between modes in its probability density. Examples using integer and continuous data demonstrate that the limitations of the extant tool can be overcome.

Executive Summary

Many algorithms exist that take data for a continuous variable, such as latitude, and represent it using a finite number of states. This process, known as discretisation, is common where the data are to be processed using, for example, a Bayesian network. A consequence of discretisation is that conventional algorithms can approximate very coarsely those regions of the state space that are unlikely to be observed. This is a problem for anomaly detection, where detections stem from low-probability data: because outlying or low-probability values are mapped to the same states as more probable values, anomalies may go undetected.
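The masking effect described above can be illustrated with a minimal sketch. It assumes a simple equal-frequency (quantile) discretiser as a stand-in for a conventional algorithm; this is not the extant tool discussed in this report, and the function names `quantile_edges` and `to_state` are hypothetical.

```python
import numpy as np

def quantile_edges(data, n_states):
    """Interior cut points giving n_states equal-frequency states (assumed discretiser)."""
    probs = np.linspace(0.0, 1.0, n_states + 1)[1:-1]
    return np.quantile(data, probs)

def to_state(value, edges):
    """Index of the state containing value; the outermost states are unbounded."""
    return int(np.searchsorted(edges, value, side="right"))

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 10_000)   # synthetic "normal activity"
edges = quantile_edges(data, 4)       # four states, three cut points

typical = to_state(1.0, edges)        # a fairly ordinary value
extreme = to_state(50.0, edges)       # a value far outside anything observed
print(typical, extreme)               # both fall in the same top state
```

Because the top state is unbounded above, the extreme value inherits the probability of an ordinary state (roughly one quarter here) and so would not be flagged.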

This report discusses an algorithm that generates a set of states that ensures low-probability data values can be represented. It does this using the states generated by an external algorithm as a first approximation, along with a summary of the data set of interest. The summary can take the form of the data set’s unique values, or a histogram for continuous variables, together with an estimate of the range of expected values. Detection of low-probability regions is catered for in two ways: through a list of expected intervals that permits a quick screening of individual variables, and through new states that need to be realised within the Bayesian network.
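The idea of screening a variable against a list of expected intervals can be sketched as follows. This is an illustration only, not the algorithm of this report: it assumes whole-number data summarised by its unique values, and it assumes that consecutive values are merged into one interval when they differ by no more than a hypothetical `max_gap` parameter. The function names are likewise hypothetical.

```python
def expected_intervals(unique_values, max_gap):
    """Merge sorted unique values into closed intervals (assumed merging rule).

    A new interval starts whenever the gap to the previous value exceeds max_gap.
    """
    vals = sorted(unique_values)
    intervals = []
    lo = hi = vals[0]
    for v in vals[1:]:
        if v - hi > max_gap:
            intervals.append((lo, hi))   # close the current interval
            lo = v                       # start a new one after the gap
        hi = v
    intervals.append((lo, hi))
    return intervals

def is_expected(value, intervals):
    """Quick screen: True if value lies inside any expected interval."""
    return any(lo <= value <= hi for lo, hi in intervals)

# Bimodal whole-number data with a null between the modes.
intervals = expected_intervals([1, 2, 3, 10, 11, 12], max_gap=1)
print(intervals)                    # [(1, 3), (10, 12)]
print(is_expected(2, intervals))    # True: inside the first mode
print(is_expected(6, intervals))    # False: in the null between modes
print(is_expected(20, intervals))   # False: beyond the observed extremes
```

A screen of this kind acts on one variable at a time and does not by itself add states to the Bayesian network; the new states mentioned above would have to be realised separately.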

The approach is demonstrated on whole-number data, a realisation of a Gaussian random variable, and real latitude data.