We are pleased to announce the release of an interactive Jupyter notebook that is used to provide:
- Visualization of web graph statistics
- An interface for interacting with the webgraph
The visualization of the web graph statistics is done by leveraging the WebGraph framework, which provides means of gathering many interesting data points of a web graph, such as the frequency distribution of indegrees/outdegrees in the graph, or size distributions of the connected components. We then are able to use pandas and matplotlib to provide a visualization for the data provided by WebGraph. This effort was largely inspired by the Topology of the 2012 WDC Hyperlink Graph document. Further details of WebGraph tool installation/usage, and the data visualization may be found in the cc-notebooks repository.
The interface for interacting with the webgraph is done by using pyWebGraph, a front end that interfaces Jython with WebGraph. First, before using this interface we must re-build the string maps, in order to create a mapping between the node ID (a numerical value), to domain name (and vice versa). Once this is established we are able to simply load up the graph into pyWebGraph, and you will be able to traverse the graph interactively.
We hope that users will be able to use these notebooks to gain more insight into the web graph in a numerical and practical sense.