Graph Mining

Our mission is to build the most scalable library for graph algorithms and
analysis and apply it to a multitude of Google products. We formalize data
mining and machine learning challenges as graph problems and perform
fundamental research in those fields leading to
publications in top venues. Our algorithms and systems are used in
a wide array of Google products such as Search, YouTube, AdWords, Play, Maps, and
Social.

Some of Our Projects

Large-Scale Balanced Partitioning

Balanced Partitioning splits a large graph into roughly equal parts while
minimizing cut size. The problem of “fairly” dividing a graph occurs in a
number of contexts, such as assigning work in a distributed processing
environment. Our techniques provided a 40% drop in multi-shard queries in the
Google Maps Driving Directions, saving a significant amount of CPU usage.

Learn more

Large-Scale Clustering

Our team specializes in clustering graphs at Google scale. We have efficient
implementations of many different algorithms including hierarchical
clustering, overlapping clustering, local clustering, and spectral
clustering.

Learn more

Large-Scale Connected Components

Connected Components is a fundamental subroutine in many graph algorithms. We
have state-of-the-art implementations in a variety of paradigms including
MapReduce, a distributed hash table, Pregel, and ASYMP. Our methods are
10-30x faster than the best previously studied algorithms, and easily scale
to graphs with trillions of edges.

Learn more

Large-Scale Link Modeling

Our similarity ranking and centrality metrics serve as good features for
understanding the characteristics of large graphs. They allow the development
of link models useful for both link prediction and anomalous link discovery.
Our tool Grale learns a similarity function that models the link
relationships present in data.

Learn more

Large-Scale Similarity Ranking

Our research in pairwise similarity ranking has produced a number of
innovative methods, which we have published at top conferences such as WWW,
ICML, and VLDB. We maintain a library of similarity algorithms including
distributed Personalized PageRank, Egonet similarity, Adamic Adar, and
others.

Public-private Graph Computation

Our award-winning research on novel models of graph computation addresses
important issues of privacy in graph mining. Specifically, we present
techniques to efficiently solve graph problems, including computing
clustering, centrality scores and shortest path distances for each node,
based on its personal view of the private data in the graph while preserving
the privacy of each user.

Learn more

Streaming and Dynamic Graph Algorithms

We perform innovative research analyzing massive dynamic graphs. We have
developed efficient algorithms for computing densest subgraph and triangle
counting which operate even when subject to high velocity streaming updates.

Learn more

ASYMP: Async Message Passing Graph Mining

ASYMP is graph mining framework based on asynchronous message passing. We
have highly scalable code for Connected Components and shortest-path to a
subset of nodes in this framework.

Large-Scale Centrality Ranking

Google’s most famous algorithm, PageRank, is a method for computing
importance scores for vertices of a directed graph. In addition to PageRank,
we have scalable implementations of several other centrality scores, such as
harmonic centrality.