Abstract

Data mining is the process of discovering interesting, useful, and previously unknown knowledge from very large databases. Outlier (or exception) mining focuses on finding patterns that apply to only a small percentage of data objects. Outliers are observations that deviate so much from the remaining data objects as to arouse suspicion that they were generated by a different mechanism. Density-based algorithms are the most effective at finding all forms of outliers: they determine outliers from the concentration of data objects at a location and declare objects with few neighbours to be outliers. However, existing density-based algorithms have the following drawbacks: (1) they compute the local reachability distance and density for every object before the few outliers are found; and (2) they compute the local outlier factor (LOF) for every object in the dataset before declaring those with very high LOF values to be outliers. These computations are very expensive, since outliers form only a small fraction of the entire data. This thesis proposes the Local Sparsity Coefficient (LSC) and Enhanced Local Sparsity Coefficient (ELSC) algorithms, which are based on the distances of an object and of its k-nearest neighbours and avoid computing the reachability distance and density of every object, reducing the number of computations and comparisons relative to the LOF technique. In ELSC, data objects that cannot possibly be outliers are pruned (removed) based on their neighbourhood distances; the remaining objects constitute the candidate set on which outliers are determined, resulting in improved performance over both LSC and LOF.

Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2002 .A39. Source: Masters Abstracts International, Volume: 41-04, page: 1099. Adviser: C. I. Ezeife. Thesis (M.Sc.)--University of Windsor (Canada), 2002.
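The k-nearest-neighbour distance idea behind this family of methods can be sketched as follows. This is a minimal illustrative proxy in Python, not the thesis's actual LSC or ELSC formulas (which the abstract does not give): each object is scored by its mean distance to its k nearest neighbours, so objects in sparse regions score high, and the `threshold`-times-median cutoff is an assumed flagging rule.

```python
import math

def knn_distances(points, k):
    """For each point, return the distances to its k nearest neighbours."""
    dists = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(d[:k])
    return dists

def sparsity_scores(points, k=3):
    """Score each point by its mean k-NN distance: sparse regions score high.
    Illustrative proxy for a density-based outlier score, not the exact LSC."""
    return [sum(d) / len(d) for d in knn_distances(points, k)]

def outliers(points, k=3, threshold=2.0):
    """Flag points whose sparsity score exceeds `threshold` times the median
    score (assumed cutoff for this sketch)."""
    scores = sparsity_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [i for i, s in enumerate(scores) if s > threshold * median]

# A tight cluster plus one distant point: only the distant point is flagged.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(outliers(pts, k=3))  # → [5]
```

Unlike LOF, this sketch never computes reachability distances or local densities; it works directly from raw neighbour distances, which is the kind of saving the abstract describes.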