Keywords

Data mining

fMRI images

network traffic

tensors

Project Summary

Given a large collection of fMRI images over time, how can one find patterns and correlations? Similarly, given a never-ending stream of network traffic information, how can one monitor for anomalies, intrusions, and potential failures? The main idea behind this proposal is to treat both problems using the theory of tensors. Despite the seemingly wide differences between the two settings, both boil down to finding patterns in multidimensional arrays, sparse or dense. Tensors are exactly the generalization of matrices to more than two dimensions, and they correspond roughly to the "DataCubes" of data mining. Matrix analysis and decompositions are part of the standard data mining toolbox, with SVD/PCA/LSI being the typical methods for dimensionality reduction, pattern discovery, and "hidden variable" discovery. Extending these tools to higher dimensionalities is valuable, and tensors provide the means for this generalization. However, tensor tools have not yet been put to use in large-volume data mining; bringing them there is the main contribution of this proposal. The investigators propose (a) to design tensor decomposition algorithms that scale to large datasets, with special attention to sparse datasets and to never-ending streams of data, and (b) to apply them to two driving applications: fMRI data analysis and network data analysis.
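To make the matrix-to-tensor generalization concrete, here is a minimal sketch of one standard tensor decomposition, the truncated higher-order SVD (HOSVD, a Tucker-style decomposition). It assumes a small, dense 3-mode tensor held in NumPy (for example, a hypothetical voxel × time × subject array). Each mode is "unfolded" into a matrix, an ordinary SVD picks a basis for that mode, and the tensor is projected onto those bases to get a small core. This is only an illustration of the idea. It is not the scalable, sparse, or streaming algorithm the proposal sets out to design, and the function names `unfold`, `mode_product`, `hosvd`, and `reconstruct` are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest,
    giving a matrix of shape (tensor.shape[mode], product of other dims)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (shape: new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=([1], [0]))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: for each mode keep the top-r left singular vectors of
    that mode's unfolding, then project the tensor onto them to get the core."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Map the core back to full size with each mode's factor matrix."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 30-voxel x 20-timestep x 5-subject tensor built to have
    # multilinear rank (2, 2, 2), i.e. two "hidden variables" per mode.
    core_true = rng.standard_normal((2, 2, 2))
    factors_true = [np.linalg.qr(rng.standard_normal((n, 2)))[0] for n in (30, 20, 5)]
    X = reconstruct(core_true, factors_true)
    core, factors = hosvd(X, (2, 2, 2))
    rel_err = np.linalg.norm(X - reconstruct(core, factors)) / np.linalg.norm(X)
    print(core.shape, rel_err < 1e-10)
```

With rank 2 kept per mode, the 3,000-entry tensor is summarized by an 8-entry core plus three thin factor matrices. Each factor's columns play the role that principal components play for a matrix, one set per mode (voxels, time, subjects).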
The investigators propose to analyze large volumes of fMRI data by performing the following sub-tasks: clustering voxels with similar behavior over time, for a given subject and/or task or across subjects and/or tasks; classifying patterns of brain activity; and detecting lag correlations and spatio-temporal patterns among fMRI time sequences. The investigators also propose to perform the following inter-related tasks on multiple gigabytes of network flow data: anomaly detection, pattern discovery, and compression.
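The summary does not say how lag correlations will be detected. As a minimal brute-force sketch, assume two equal-length, already-preprocessed time sequences (for instance, the average signal of two hypothetical voxel clusters). The code scans a window of candidate lags and reports the lag with the highest Pearson correlation between `x[t]` and `y[t + lag]`. The names `lag_correlation` and `max_lag` are illustrative. At the data volumes described here, a faster method such as FFT-based cross-correlation would likely be needed.

```python
import numpy as np

def lag_correlation(x, y, max_lag):
    """Return (best_lag, corr): the lag in [-max_lag, max_lag] that maximizes
    the Pearson correlation between x[t] and y[t + lag].
    Assumes x and y have the same length (hypothetical preprocessing step)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]        # pairs x[t] with y[t + lag]
        else:
            a, b = x[-lag:], y[:len(y) + lag]       # pairs x[t - lag] with y[t]
        # Skip overlaps too short or too flat for a correlation to be defined.
        if len(a) < 2 or a.std() == 0 or b.std() == 0:
            continue
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

if __name__ == "__main__":
    t = np.arange(200)
    x = np.sin(2 * np.pi * t / 50)
    y = np.roll(x, 3)  # y echoes x three samples later: y[t] = x[t - 3]
    print(lag_correlation(x, y, max_lag=10))
```

Here the best lag is 3 with a correlation of essentially 1. The quadratic cost (one correlation per candidate lag) is the reason this is only a reference point and not a design for gigabyte-scale data.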
Both applications are important for medicine, health management, and computer and national security. Analysis of fMRI data can help in understanding how the brain functions, which parts of the brain collaborate with which other parts, and whether there are variations across subjects and across task-related activities. In the network traffic monitoring setting, fast detection of anomalies is important for spotting malware, port-scanning attempts, and plain non-malicious failures.
The educational goals include incorporating the research findings into advanced graduate courses at CMU (15-826) and at Temple (9664, 9665), and proposing tutorials at leading conferences for database, data mining, and bioinformatics audiences.