There are many systems in the natural world and in society that are amenable to mathematical and computational modeling. However, not everything is easily codified as a system of particles with coordinates and momenta. Some systems and problems such as social networks, ecologies, and genetic regulatory schemes are intrinsically divorced from spacetime descriptions, and instead are more naturally expressed as graphs that reflect their topological properties. At their simplest, graphs are simply collections of nodes – representing some class of objects like people, corporate boards, proteins, or destinations on the globe – and edges, which serve to represent connections like friendships, bridges, or molecular binding interactions.

Contents

What is a graph?

image

Consider the highway system of the eastern coast of the United States. A road inspector is given the task of writing reports about the current condition of each highway. What would be the most economical way for him to traverse all the cities? The problem can be modeled as a graph.

In fact, since graphs are dots and lines , they look like road maps. The dots are called vertices or nodes and the lines are called edges. They may have a value assigned to them (weighted) or they may just be a mere indicator of the existence of a path (unweighted). More formally, a graph can be defined as follows:

A graph GGG consists of a set of VVV of vertices (or nodes) and a set EEE of edges (or arcs) such that each edge e∈Ee \in Ee∈E is associated with a pair of vertices ∈V\in V∈V. A graph GGG with vertices VVV and edges EEE is written as G=(V,E)G=(V,E)G=(V,E).

Because graphs are so pervasive, it is useful to define different types of graphs. The following are the most common types of graphs:

Undirected graph: An undirected graph is one in which its edges have no orientations, i.e. no direction is associated with any edge. The edges (x,y)(x,y)(x,y) and (y,x)(y,x)(y,x) are equivalent.

Directed graph: A directed graph or digraph GGG consists of a set VVV of vertices (or nodes) and a set of edges (or arcs) such that each edge e∈Ee \in Ee∈E is associated with an ordered pair of vertices. If there is an edge (x,y)(x,y)(x,y), it is completely distinct from the edge (y,x)(y,x)(y,x).

Undirected graphs are typically represented by a line with no arrows, which implies a bidirectional relationship between node A and node B. Directed graphs use an arrow to show the relationship from A to B.

Directed acyclic graph: A directed acyclic graph (commonly abbreviated as DAG) is a directed graph with no directed cycles. A cycle is any path {A1,…,An}\{A_1, \ldots, A_n\}{A1​,…,An​} such that the edges A1→A2A_1\rightarrow A_2A1​→A2​, A2→A3A_2\rightarrow A_3A2​→A3​, …\ldots…, and An→A1A_n\rightarrow A_1An​→A1​ all exist, thus forming a loop. A DAG is a graph without a single cycle.

List all the edges and vertices of the undirected graph GGG in the figure above.

The graph GGG consists of the set of vertices VVV = {Massachusetts, Maine, Connecticut, New York, Maryland, New Jersey}.

Note that since the graph is undirected, the order of the tuples in denoting the edges is unimportant. □_\square□​

ABCD

Government surveillance agencies have a tendency to accumulate strange new powers during times of panic. The US National Security Agency (NSA) now has the ability to monitor the communications of suspected individuals as well as the communications of people within some number of hops of any suspect. In the communication network above, which person is connected to the greatest number of people through 1 hop or less?

Clarification: Options (A, B, C, and D) correspond to the people in order from left to right respectively.

Details and Assumptions:

Each dot represents a person.

Each line represents communication between the people on either end.

If X communicates with Y, and Y communicates with Z, we say that X and Z have a 1-hop connection, and that X has a 0-hop connection with Y.

Representation of Graphs

Above we represented a graph by drawing it. To represent it in a computer, however, we need more formal ways of representing graphs. Here we discuss the two most common ways of representing a graph: the adjacency matrix and the adjacency list.

The adjacency matrix

image

Represent the graph above using an adjacency matrix.

To obtain the adjacency matrix of the graph, we first label the rows and columns with the corresponding ordered vertices. If there exists an edge between two vertices iii and jjj, then their corresponding cell in the matrix will be assigned 111. If there does not exist an edge, then the cell will be assigned the value 000. The adjacency matrix for the graph above is thus

An adjacency list representation of a graph is a way of associating each vertex (or node) in the graph with its respective list of neighboring vertices. A common way to do this is to create a Hash table. This table will contain each vertice as a key and the list of adjacent vertices of that vertices as a value.

For our example above, the adjacency list representation will look as follows:

adlist

We can see that the adjacency list is much less expensive on memory as the adjacency matrix is very sparse.

Most graph algorithms involve visiting each vertex in VVV, starting from a root node v0v_0v0​. There are several ways of achieving this. The two most common traversal algorithms are breadth-first search and depth-first search.

Breadth-first Search

In a breadth-first search, we start with the start node, followed by its adjacent nodes, then all nodes that can be reached by a path from the start node containing two edges, three edges, and so on. Formally the BFS algorithm visits all vertices in a graph GGG, that are kkk edges away from the source vertex sss before visiting any vertex k+1k+1k+1 edges away. This is done until no more vertices are reachable from sss. The image below demonstrates exactly how this traversal proceeds:

image

For a graph G=(V,E)G = (V,E)G=(V,E) and a source vertex vvv, breadth-first search traverses the edges of GGG to find all reachable vertices from vvv. It also computes the shortest distance to any reachable vertex. Any path between two points in a breadth-first search tree corresponds to the shortest path from the root vvv to any other node sss.

We may think of three types of vertices in BFS as tree verties, those that have been taken of the data structure. fringe vertices, those adjacent to tree vertices but not yet visited, and undiscovered vertices, those that we have not encountered yet. If each visited vertex is connected to the edge that caused it to be added to the data structure, then these edges form a tree.

To search a connected component of a graph systematically, we begin with one vertex on the fringe, all others unseen, and perform the following step until all vertices have been visited: "move one vertex xxx from the fringe to the tree, and put any unseen vertices adjacent to xxx on the fringe." Graph traversal methods differ in how it is decided which vertex should be moved from the fringe to the tree. For breadth-first search we want to choose the vertex from the fringe that was least recently encountered; this corresponds to using a queue to hold vertices on the fringe.

What is the state of the queue at each iteration of BFS, if it is called from node 'a'?

PP

The table below shows the contents of the queue as the procedure. BFS visits vertices in the graph above. BFS will visit the same vertices as DFS. In this example all of them.

Depth-first Search

Depth-ﬁrst search explores edges out of the most recently discovered vertex sss that still has unexplored edges leaving it. Once all of ’s edges have been explored, the search “backtracks” to explore edges leaving the vertex from which was discovered. This process continues until we have discovered all the vertices that are reachable from the original source vertex. If any undiscovered vertices remain, then depth-ﬁrst search selects one of them as a new source, and it repeats the search from that source. The algorithm repeats this entire process until it has discovered every vertex:

Visit a vertex sss.

Mark sss as visited.

Recursively visit each unvisited vertex attached to sss.

dfs

A recursive implementation of DFS:

1
2
3
4
5

procedure DFS(G,v):
label v as discovered
for all edges from v to w in G.adjacentEdges(v) do
if vertex w is not labeled as discovered then
recursively call DFS(G,w)

A non-recursive implementation of DFS, it delays whether a vertex has been discovered until the vertex has been popped from the stack.

1
2
3
4
5
6
7
8
9

procedure DFS-iterative(G,v):
let S be a stack
S.push(v)
while S is not empty
v = S.pop()
if v is not labeled as discovered:
label v as discovered
for all edges from v to w in G.adjacentEdges(v) do
S.push(w)

Contrasting Traversals

Similar to tree traversal, the code for breadth-first search is slightly different from depth-first search. The most commonly mentioned difference is that BFS uses a queue to store alternative choices instead of a stack. This small change however leads to a classical graph traversal algorithm. Depth-first search goes down a single path until the path leads to the goal or we reach a limit. When a path is completely explored we back track. BFS however explores all paths from the starting location at the same time.

As we increase the size of our graph, the contrast between depth-first and breadth-first search is quite evident. Depth-first search explores the graph by looking for new vertices far away from the start point, taking closer vertices only when dead ends are encountered; breadth-first search completely covers the area close to the starting point, moving farther away only when everything close has been looked at. Again, the order in which the nodes are visited depends largely upon the effects of this ordering on the order in which vertices appear on the adjacency lists.

Additional Problems

John lives in the Trees of Ten Houses, and it is a most ideal and idyllic place for him and the other dwellers up in the canopy. They have invested a tremendous amount of time in engineering these houses, and to ensure no house felt isolated from the others, they built a fresh, finely crafted bridge between each and every house!

Unfortunately, the Trees of Ten Houses were not immune to thunderstorms, nor were the bridges well engineered. The night was treacherous, howling with wind and freezing with rain, so the odds for the bridges were not good--each bridge seemed just as likely to survive as to be shattered!

Fortunately, as there were so very many bridges in the Trees of Ten Houses, when John did wake the following morning, he found he was able to make his way to each and every house using only the existing bridges, though round-about routes may have been necessary. As they began rebuilding, John became curious... what were the chances that they'd all be so lucky?

More formally, if PPP is the probability that, after the storm, John is able to traverse to each and every house, what is ⌊1010P⌋?\big\lfloor 10^{10} P \big\rfloor?⌊1010P⌋?

Details and Assumptions:

The Trees of Ten Houses do, in fact, contain precisely 10 houses.

Before the storm, there exists a single bridge between each and every unique pair of houses.

The storm destroys each bridge with independent probability 12\frac{1}{2}21​.

John is allowed to traverse through others' houses to try to reach all of them, but he must only use the surviving bridges to get there. No vine swinging allowed.

Tagged under #ComputerScience as this problem is quite tedious to do without it, though not impossible.