October 28, 2020

Interactive Webgraph Statistics Notebook Released

We are pleased to announce the release of an interactive Jupyter notebook that visualizes webgraph statistics and provides a way to interact with the webgraph.
Alex Xue
Alex is a Computer Science graduate from the University of Waterloo, Canada, and an emeritus member of the Common Crawl Foundation.

We are pleased to announce the release of an interactive Jupyter notebook that provides:

  • Visualization of webgraph statistics
  • An interface for interacting with the webgraph

The webgraph statistics are computed with the WebGraph framework, which can gather many interesting data points about a web graph, such as the frequency distribution of indegrees and outdegrees, or the size distribution of the connected components. We then use pandas and matplotlib to visualize the data that WebGraph produces. This effort was largely inspired by the Topology of the 2012 WDC Hyperlink Graph document. Further details on installing and using the WebGraph tools, and on the data visualization, can be found in the cc-notebooks repository.
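As a rough illustration of what a degree frequency distribution is and how it might be plotted, here is a minimal sketch. It does not reproduce the notebook's code or WebGraph's output format: the `indegrees` list is made-up sample data, and the function names `degree_distribution` and `plot_distribution` are hypothetical.

```python
from collections import Counter


def degree_distribution(degrees):
    """Return sorted (degree, frequency) pairs: how many nodes have each degree."""
    return sorted(Counter(degrees).items())


def plot_distribution(pairs, label="indegree"):
    """Plot a degree distribution on log-log axes (needs pandas and matplotlib)."""
    # Imported here so the counting step above works without these packages.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame(pairs, columns=["degree", "frequency"])
    # Degree 0 cannot be shown on a logarithmic axis, so it is dropped.
    df = df[df["degree"] > 0]

    fig, ax = plt.subplots()
    ax.loglog(df["degree"], df["frequency"], marker=".", linestyle="none")
    ax.set_xlabel(label)
    ax.set_ylabel("number of nodes")
    return ax


# Made-up sample data: one indegree value per node of a tiny graph.
indegrees = [0, 1, 1, 2, 1, 3, 2, 1, 0, 5]
pairs = degree_distribution(indegrees)
```

Calling `plot_distribution(pairs)` would then draw the scatter plot. Log-log axes are a common choice for degree distributions because the values tend to span several orders of magnitude on real web graphs.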

The interface for interacting with the webgraph is built on pyWebGraph, a front end that lets Jython work with WebGraph. Before using this interface, you must rebuild the string maps, which map each node ID (a numerical value) to its domain name and vice versa. Once the maps are in place, you can load the graph into pyWebGraph and traverse it interactively.
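To make the role of the string maps concrete, here is a small pure-Python sketch of a two-way mapping between node IDs and domain names, used to look up a node's outlinks by name. It does not use pyWebGraph or WebGraph; the domain names, the `successors` adjacency table, and the `outlinks` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical node names: the position in the list is the node ID.
names = ["example.com", "example.org", "example.net"]

# Reverse map: domain name -> node ID.
name_to_id = {name: node_id for node_id, name in enumerate(names)}

# Hypothetical adjacency lists: node ID -> IDs of the nodes it links to.
successors = {0: [1, 2], 1: [2], 2: []}


def outlinks(domain):
    """Return the domain names that the given domain links to."""
    node_id = name_to_id[domain]                     # name -> ID
    return [names[t] for t in successors[node_id]]   # IDs -> names


linked = outlinks("example.com")
```

The graph itself only stores numeric IDs, which is why both directions of the mapping are needed: one to turn a domain name into a starting node, and one to turn the traversal results back into readable names.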

Further details on installing and using pyWebGraph, and on how to rebuild the string maps, can be found in the interactive webgraph README of the cc-notebooks repository.

The Jupyter notebook is available on GitHub in the same repository. More details about how to navigate the repository can be found in the notebook itself, as well as in the README.

We hope that users will be able to use these notebooks to gain more insight into the web graph in a numerical and practical sense.

We are grateful to the WebGraph project for providing extremely useful tools for processing the web graph, and to Massimo Santini for developing pyWebGraph.
