Imagine that it’s 2 a.m. during finals week, and you’re scrambling to
finish a research paper on topica obscura. Your adrenaline
jumps when you find a relevant Wikipedia article, with links
to more Wikipedia articles! You start clicking away, jumping from page
to page in search of facts. An hour later, you realize you’re still
clicking, but these are pages you’ve already visited. No matter which
link you click, you can’t seem to discover any new pages!

If this has happened to you, you’ve unknowingly
(and unfortunately) stumbled upon a knot in Wikipedia. Knots are a
unique property of directed graphs. To understand them, it is necessary
to have a basic understanding of directed graphs.
The graphs in the rest of this book are undirected. In an
undirected graph, an edge represents a symmetric relationship between
vertices. This abstraction is appropriate for some applications,
such as acquaintance networks and transportation networks; but other
applications involve asymmetric relationships, for example, links
between pages in the World Wide Web.

Wikipedia links are also asymmetric. Page A might link to page B, but
page B doesn’t have to include any links to page A. To fully describe
the relationship between pages A and B, we need two bits of
information: whether A links to B, and whether B links to A . This is
the essence of a directed graph. In a directed graph, a directed
edge, e = (A,B), is an edge from A to B.

Knots are a unique (and sometimes cruel)
property of directed graphs. A knot is a
collection of vertices and edges with the property that every vertex
in the knot has outgoing edges, and all outgoing edges from vertices
in the knot terminate at other vertices in the knot. Thus it is
impossible to leave the knot while following any path along its
directed edges.

In the scenario above, we imagine finding a knot in Wikipedia;
that is, a set of articles with links to each other, but no
links to other articles.

We wondered whether such a thing was even possible. Given that there
are 3,811,000+ articles on Wikipedia and they all link to different
pages, it seemed unlikely that there would be a knot, but we decided
to investigate.

We started by extending Graph.py, from Section 2.2, to
support directed graphs. The result is DirectedGraph.py, which
you can download from thinkcomplex.com/DirectedGraph.py.

When we were designing DirectedGraph, we wanted to preserve the
order of growth of Vertex and Arc lookups as they were
in Graph. However, we needed to keep track of edges into a
vertex and edges from a vertex. It made sense to keep track of these
in separate dictionaries. Having two
dictionaries takes twice as much memory, but this is a
reasonable price to pay for maintaining the speed of lookups.

To detect knots in a directed graph, we developed an algorithm based
on a breadth-first-search. _bfsknots searches for all the
vertices that can be reached from a given starting vertex, and returns
these vertices as a set.

def _bfsknots(self, s):
# initialize the queue with the start vertex
queue = [s]
visited = set()
on_first_vertex = True
while queue:
# get the next vertex
v = queue.pop(0)
# skip it if it's already marked
if v in visited: continue
# if we're on the first vertex, we're not actually visting
if v != s or not on_first_vertex: visited.add(v)
on_first_vertex = False
for x in self.out_vertices(v):
#if its out vertices have been cached, update visited
if x in self._knot_cache.keys():
visited.update(self._knot_cache[x])
visited.add(x)
#otherwise add it to the queue
elif x not in self._knot_cache.keys():
queue.append(x)
return visited

We run _bfsknots for every vertex in
the graph, and build a dictionary that maps from a vertex to the
set of vertices it can reach. That is, for each vertex, V, we
know the set of reachable vertices, SV.

If there is a knot in the graph, then for every vertex, V, in the
knot the reachable set SV is exactly the set of vertices in the knot.

The function has_knot iterates through each vertex in a graph,
and returns True if this condition holds. If
it checks the whole graph and does not find a knot, it returns False.

def has_knot(self):
"""
Returns true if directed graph has a knot.
"""
self._knot_cache = {}
#build the cache of which vertices are accessible from which
for v in self:
self._knot_cache[v] = self._bfsknots(v)
#searches for knot
for v in self:
if self._knot_at_v(v):
return True
return False

Finally, _knot_at_v checks whether all vertices reachable
from V have the reachable set SV.

Exercise 1

Download thinkcomplex.com/DirectedGraph.py.
Read through the file to make yourself familiar with
the terminology. Note that edges in DirectedGraph are represented by
Arcs.

Determine the order of growth of has_knots experimentally. You
can follow the example in Section 3.5. Below are
some hints to help you out.

Use DirectedGraph.add_random_edges to generate graphs of
different sizes.

For each graph, time how long it takes to check if it has a knot.

On a log-log scale, plot run time versus
the number of vertices and run time versus the number of edges.

Exercise 2
Find all the knots in a directed graph.

Write a function, all_knots, that returns a list of all
knots in a graph, where each knot is a set of vertices.

Write a function named entry_points that returns a list
of all the vertices that serve as entry points into knots.

To find knots in Wikipedia, we selected 558 disjoint subsets of
Wikipedia articles, organized by index. For example, the index of
articles about neurobiology was used to define one subset, and the
index of articles about Zimbabwe was used to define another
subset. The 558 subsets contain about 321,000 articles, or
10% of Wikipedia.

Of these subsets we examined, 38% contained at least one knot. So if
you are restricted to articles listed on a single index page, there is
a substantial chance you will eventually find a knot.

For the complete Wikipedia, the probability is lower because
articles can link to articles in other indices. Based on
these results, we cannot say, yet, whether the complete Wikipedia
has a knot.

When you started your hypothetical research on topica obscura,
you might have used journal articles instead of Wikipedia articles.
Instead of links, you would have used citations to find other papers.

In that case, you would never find a knot. Why not?
Because science articles are
published sequentially, and a paper cannot cite a paper that will be
created in the future. So the directed graph that represents papers
and their citations cannot have a knot.