Title: Local projections for high-dimensional outlier detection

Abstract: In this paper, we propose a novel approach for outlier detection, called
local projections, which is based on concepts from the Local Outlier Factor (LOF)
(Breunig et al., 2000) and RobPCA (Hubert et al., 2005). By using aspects of
both methods, our algorithm is robust towards noise variables and is capable of
performing outlier detection in multi-group situations. Moreover, it does not
rely on a specific underlying data distribution.
For each observation of a dataset, we identify a local group of dense nearby
observations, which we call a core, based on a modification of the k-nearest
neighbours algorithm. By projecting the dataset onto the space spanned by those
observations, two aspects are revealed. First, we analyze the distance from
an observation to the center of the core within the projection space, which
measures how well the projection describes the observation. Second, we
consider the distance of the observation to the
projection space in order to assess the suitability of the core for describing
the outlyingness of the observation. These novel interpretations lead to a
univariate measure of outlyingness based on aggregations over all local
projections, which outperforms LOF and RobPCA as well as other popular methods
like PCOut (Filzmoser et al., 2008) and subspace-based outlier detection
(Kriegel et al., 2009) in our simulation setups. Experiments in the context of
real-world applications employing datasets of varying dimensionality demonstrate
the advantages of local projections.
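
The two distances described in the abstract can be illustrated with a minimal sketch for a single core. It rests on several assumptions that are not taken from the paper: the core is chosen by plain Euclidean k-nearest neighbours (the paper uses a modification of kNN that is not specified here), the core center is the arithmetic mean, and the function name `local_projection_distances` is hypothetical. The aggregation over all local projections into a univariate outlyingness measure is not shown.

```python
import numpy as np

def local_projection_distances(X, i, k):
    """Distances of every row of X relative to one core around row i.

    Sketch only: the core is the k points closest to X[i] (including X[i]),
    which is an assumption standing in for the paper's modified kNN.
    """
    X = np.asarray(X, dtype=float)

    # Core: indices of the k observations nearest to observation i.
    dist_to_i = np.linalg.norm(X - X[i], axis=1)
    core_idx = np.argsort(dist_to_i, kind="stable")[:k]
    core = X[core_idx]
    center = core.mean(axis=0)

    # Orthonormal basis of the space spanned by the centred core observations.
    _, s, vt = np.linalg.svd(core - center, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * max(s.max(), 1.0)))
    basis = vt[:rank].T  # shape (n_features, rank)

    # Coordinates of all observations inside the projection space.
    centred = X - center
    scores = centred @ basis

    # Distance to the core center, measured within the projection space.
    core_dist = np.linalg.norm(scores, axis=1)
    # Distance from each observation to the projection space itself.
    orth_dist = np.linalg.norm(centred - scores @ basis.T, axis=1)
    return core_dist, orth_dist
```

In this sketch, an observation with a large `orth_dist` lies far from the projection space, which corresponds to the abstract's notion that the core is less suitable for describing that observation.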