Abstract

Outlier detection is a nontrivial but important task in many application areas. Several methods for finding outliers exist in the literature, but no single method outperforms the others in all cases. Choosing a suitable algorithm and the value of its relevant parameter is therefore crucial. In addition, no method is perfect, so verification by domain experts is needed to confirm whether the detected outliers are meaningful. A proper visual representation of the detected outliers can help experts resolve anomalies. In this paper, we propose a visual analytic system that uses a training set to find a suitable algorithm and the value of its relevant parameter for a specific dataset. The chosen method is then applied to the test data to obtain an outlier ranking. Next, the data points are visualized with a parallel coordinate plot (PCP), in which the color of each line is derived from the outlier factor of the corresponding data point. PCP is a popular technique for visualizing high-dimensional data in which the coordinate axes are drawn parallel to each other and each data point is represented by a line. Using this visualization, experts can provide feedback and update the result. Experiments with different datasets demonstrate the strength of our system.
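The pipeline outlined above (select a method and parameter on a training set, rank the test data, derive line colors from the outlier factor) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two candidate scorers (distance to the k-th nearest neighbor and mean distance to the k nearest neighbors), the candidate values of k, the precision-at-n selection criterion, the blue-to-red color mapping, and the synthetic data. The helper names (`select_method`, `outlier_colors`, and so on) are hypothetical.

```python
import math
import random


def _sorted_dists(i, data):
    """Sorted distances from data[i] to every other point in data."""
    return sorted(math.dist(data[i], data[j]) for j in range(len(data)) if j != i)


def kth_nn_score(i, data, k):
    """Outlier factor (assumed): distance to the k-th nearest neighbor."""
    return _sorted_dists(i, data)[k - 1]


def mean_knn_score(i, data, k):
    """Outlier factor (assumed): mean distance to the k nearest neighbors."""
    return sum(_sorted_dists(i, data)[:k]) / k


# Hypothetical pool of candidate algorithms; the paper's pool may differ.
SCORERS = {"kth_nn": kth_nn_score, "mean_knn": mean_knn_score}


def score_all(data, scorer, k):
    return [scorer(i, data, k) for i in range(len(data))]


def precision_at_n(scores, labels):
    """Fraction of true outliers among the n top-scored points (n = #outliers)."""
    n = sum(labels)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return sum(labels[i] for i in top) / n


def select_method(train, labels, ks=(2, 5, 10)):
    """Pick the (algorithm, k) pair that scores best on the labeled training set."""
    candidates = [(name, k) for name in SCORERS for k in ks]
    return max(
        candidates,
        key=lambda c: precision_at_n(score_all(train, SCORERS[c[0]], c[1]), labels),
    )


def outlier_colors(scores):
    """Map outlier factors to RGB: blue for the lowest, red for the highest."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero if all scores are equal
    return [((s - lo) / span, 0.0, 1.0 - (s - lo) / span) for s in scores]


def make_data(rng, n_inliers, outliers):
    """Synthetic 4-D data: Gaussian inliers followed by planted outliers."""
    inliers = [tuple(rng.gauss(0.0, 1.0) for _ in range(4)) for _ in range(n_inliers)]
    return inliers + list(outliers), [0] * n_inliers + [1] * len(outliers)


rng = random.Random(0)
train, train_labels = make_data(rng, 40, [(10, 10, 10, 10), (-10, 10, -10, 10)])
test, _ = make_data(rng, 30, [(10, -10, 10, -10), (-10, -10, -10, -10)])

# 1) choose algorithm and parameter on the training set
best_name, best_k = select_method(train, train_labels)
# 2) apply the chosen method to the test data and rank by outlier factor
test_scores = score_all(test, SCORERS[best_name], best_k)
ranking = sorted(range(len(test)), key=lambda i: test_scores[i], reverse=True)
# 3) one color per data point, to be used for its line in a PCP
colors = outlier_colors(test_scores)

print("chosen method:", best_name, "k =", best_k)
print("top-ranked test indices:", ranking[:5])
```

The resulting `colors` list holds one RGB triple per test point, so a plotting library could draw each point's line across the parallel axes in that color. The expert feedback step is not modeled here.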
