Big Data, Big Security, Bigger Challenges and Opportunities

Constructing and studying systems that can extract useful knowledge from massively growing data has become extremely challenging. Conventional knowledge extraction tools, which perform computations on matrices and graphs, typically do not scale to extremely large data sizes. Major difficulties arise in particular when the data correlations are dense, as in large-scale Internet security and malware analysis: the underlying matrix or graph cannot fit on a single machine, nor can it be effectively partitioned, parallelized, and communicated across multiple processing units, because of the system’s bandwidth limitations.
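
To give a rough sense of scale, the sketch below estimates the storage needed for a fully dense correlation matrix. The sample count, the 8-byte entry size, and the 64 GB of memory per machine are illustrative assumptions, not figures from this work, and the function names are hypothetical.

```python
def dense_matrix_bytes(n_rows: int, n_cols: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store a fully dense matrix with no compression."""
    return n_rows * n_cols * bytes_per_entry


def machines_needed(total_bytes: int, ram_bytes_per_machine: int) -> int:
    """Minimum number of machines needed just to hold the matrix in memory
    (ceiling division; ignores any working space or communication buffers)."""
    return -(-total_bytes // ram_bytes_per_machine)


if __name__ == "__main__":
    n = 1_000_000                # assumed number of samples (hypothetical)
    ram = 64 * 10**9             # assumed 64 GB of memory per machine
    total = dense_matrix_bytes(n, n)
    print(f"dense {n} x {n} matrix: {total / 1e12} TB")
    print(f"machines needed to hold it: {machines_needed(total, ram)}")
```

Under these assumptions the matrix alone occupies 8 TB, so it must be spread over many machines, and every pass over it then moves data across the interconnect. That is where the bandwidth limitation bites.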

Our research focuses on developing a novel data- and platform-aware framework for large-scale analysis and knowledge extraction on structured dense matrices. Since the major bottleneck in partitioning and parallelization is the system’s bandwidth, we create scalable transformations that automatically adapt to the specific structure of the data and to the capabilities and constraints of the available computing machinery, which ranges from a single device to multi-machine configurations with energy, delay, bandwidth, and memory constraints. Our present results demonstrate an improvement of more than two orders of magnitude over the best known systems for computing on dense data sets with billions of nonzeros. We also discuss the ongoing grand challenges and research opportunities.