Innovative mining, processing, and application of big graphs

Abstract

With continued advances in science and technology, big graph (or network) data, such as the World Wide Web, social networks, academic collaboration networks, transportation networks, telecommunication networks, biological networks, and electrical networks, have grown at an astonishing rate in terms of volume, variety, and velocity. Analyzing such big graph data has huge potential to reveal hidden insights and promote innovation in business, science, and engineering domains. However, a number of challenging bottlenecks stand in the way of developing advanced graph analytics tools in the Big Data era. This dissertation research focuses on bridging graph mining and graph processing techniques to alleviate such bottlenecks in terms of both effectiveness and efficiency. The dissertation has made original contributions to exploring, understanding, and learning from big graph data in graph mining, processing, and application. First, we have developed a suite of novel graph mining algorithms to analyze real-world heterogeneous information networks. Our algorithmic approaches enable new ways to dive into the correlation structure of big graphs and to derive new insights about how heterogeneous entities interact with one another and influence the effectiveness and efficiency of graph clustering, graph classification, and graph ranking. Second, we have developed a scalable graph parallel processing framework by exploring parallel processing optimizations at both the access tier and the computation tier. We have designed a suite of hierarchically composable graph parallel abstractions that enable large-scale graphs to be processed efficiently for iterative graph computation applications. Our approach enables hardware-resource-aware graph partitioning, so that parallel graph processing workloads can be well balanced in the presence of highly irregular graph structures and a mismatch between graph access and computation workloads. Third, and not least, we have developed innovative domain-specific graph analytics frameworks to understand the hidden patterns in enterprise storage systems and to derive interesting correlations among various enterprise web services. These novel graph algorithms and frameworks provide broader and deeper insights for a better understanding of tradeoffs in enterprise system design and implementation.