Stochastic Outlier Selection

Description

An outlier is one or more observations that deviate quantitatively from the majority of the data set and may be the subject of further investigation.
Stochastic Outlier Selection (SOS), developed by Jeroen Janssens[1], is an unsupervised outlier-selection algorithm that takes as input a set of
vectors. The algorithm applies affinity-based outlier selection and outputs an outlier probability for each data point.
Intuitively, a data point is considered to be an outlier when the other data points have insufficient affinity with it.
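
The intuition above can be made concrete with a small sketch. It is not the implementation documented here: the function name `outlier_probabilities` and the shared precision `beta` are assumptions made for illustration, and the full algorithm tunes the kernel width per point from the perplexity instead of fixing it. The sketch turns pairwise distances into affinities, normalises each row into binding probabilities, and takes the probability that no other point binds to a given point as its outlier probability.

```python
import math

def outlier_probabilities(points, beta=1.0):
    """Outlier probability per point, using one shared precision `beta` (a simplification)."""
    n = len(points)
    binding = []
    for i in range(n):
        # Affinity of point i with every other point j; a point has no affinity with itself.
        affinity = [
            0.0 if i == j else math.exp(-beta * math.dist(points[i], points[j]) ** 2)
            for j in range(n)
        ]
        total = sum(affinity)
        # Normalise the row so it becomes a probability distribution over neighbours.
        binding.append([a / total for a in affinity])
    # Point j is an outlier when none of the other points i selects it as a neighbour.
    return [
        math.prod(1.0 - binding[i][j] for i in range(n) if i != j)
        for j in range(n)
    ]

# Four points close together and one far away (hypothetical data).
points = [(1.0, 1.0), (1.0, 1.2), (1.2, 1.0), (1.1, 1.1), (5.0, 5.0)]
probabilities = outlier_probabilities(points)
for point, probability in zip(points, probabilities):
    print(point, round(probability, 3))
```

On this toy data the isolated point receives a probability close to 1, while the four clustered points stay well below it, because each of them is a likely neighbour of the others.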

Outlier detection has applications in a number of fields, such as log analysis, fraud detection, noise removal, novelty detection, quality control,
and sensor monitoring. For example, if a sensor becomes faulty, it is likely to output values that deviate markedly from the majority.

For more information, please consult the PhD thesis of Jeroen Janssens on
Outlier Selection and One-Class Classification, which introduces the algorithm.

Parameters

The stochastic outlier selection algorithm implementation can be controlled by the following parameters:

Parameters

Description

Perplexity

Perplexity can be interpreted as the k in k-nearest neighbor algorithms. The difference is that in SOS, being a neighbor
is not a binary property but a probabilistic one, and therefore perplexity is a real number. It must be between 1 and n-1,
where n is the number of points. A good starting point is the square root of the number of observations.
(Default value: 30)

ErrorTolerance

The accepted error tolerance when approximating the affinity. A larger tolerance
sacrifices accuracy in return for reduced computational time.
(Default value: 1e-20)

MaxIterations

The maximum number of iterations used to approximate the affinity.
(Default value: 10)
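
To illustrate how the three parameters interact, the following sketch searches for a kernel precision `beta` whose binding distribution has the requested perplexity. It is a sketch under stated assumptions, not the implementation documented here: the helper names `perplexity_of` and `fit_beta` are hypothetical, the keyword arguments merely mirror the parameter names above, and the search is a plain bisection over a Gaussian kernel.

```python
import math

def perplexity_of(sq_dists, beta):
    """Perplexity 2**H of the binding distribution induced by precision `beta`."""
    affinity = [math.exp(-beta * d) for d in sq_dists]
    total = sum(affinity)
    probs = [a / total for a in affinity]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def fit_beta(sq_dists, perplexity=30.0, error_tolerance=1e-20, max_iterations=10):
    """Bisect on beta until the perplexity matches the target or iterations run out."""
    beta, low, high = 1.0, 0.0, math.inf
    for _ in range(max_iterations):
        diff = perplexity_of(sq_dists, beta) - perplexity
        if abs(diff) <= error_tolerance:
            break
        if diff > 0:
            # Too many effective neighbours: sharpen the kernel.
            low = beta
            beta = beta * 2.0 if high == math.inf else (beta + high) / 2.0
        else:
            # Too few effective neighbours: widen the kernel.
            high = beta
            beta = (beta + low) / 2.0
    return beta

# Squared distances from one point to its 20 neighbours (hypothetical data, n = 21).
sq_dists = [0.1 * k for k in range(1, 21)]
target = math.sqrt(len(sq_dists) + 1)  # square-root starting point for the perplexity

coarse = fit_beta(sq_dists, perplexity=target, max_iterations=10)
fine = fit_beta(sq_dists, perplexity=target, max_iterations=50)
print(target, perplexity_of(sq_dists, coarse), perplexity_of(sq_dists, fine))
```

In this sketch a tolerance as small as 1e-20 is below floating-point resolution for perplexities of this size, so the loop usually ends on the iteration cap; raising the tolerance or lowering the cap trades accuracy for time.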