Research Abstracts Online
January 2009 - March 2010

University of Minnesota Twin Cities
Institute of Technology
Department of Computer Science and Engineering

PI: Vipin Kumar, Fellow

High-performance Data Mining

The primary objective of this research is to develop novel, high-performance data-mining algorithms and tools for mining the large-scale datasets that arise in a variety of applications. Examples include gigabyte datasets collected by earth-observing satellites, which must be processed to better understand global-scale changes in biosphere processes and patterns; data generated by scientific simulations, which can provide insight into the underlying physical processes; data obtained by monitoring network traffic to detect illegal network activities; and large collections of text and hypertext that must be analyzed to extract relevant information. The key technical challenges in mining these datasets include their high volume, dimensionality, and heterogeneity; the spatio-temporal aspects of the data; possibly skewed class distributions; the distributed nature of the data; and the complexity of converting raw collected data into high-level features. High-performance data mining is essential for analyzing ever-growing volumes of data and for providing analysts with automated tools that support some of the steps needed for hypothesis generation and evaluation.