What does adjacent mean?

A node is adjacent to a given node if the two are connected by an edge. For example, node ‘0’ is connected by an edge to node ‘1’ and to node ‘2’. So we can say that node ‘1’ and node ‘2’ are adjacent to node ‘0’. Because this is an undirected graph, this relationship is symmetric – node ‘0’ is also adjacent to node ‘1’ and to node ‘2’.

We can apply this concept to our graph of roads. Here, we arbitrarily pick a starting node (the green one), and progressively highlight the next adjacent nodes in the direction away from the starting node. Each node is coloured according to the smallest number of edges it takes to reach it from our starting node:

Red = 1 edge

Blue = 2 edges

Orange = 3 edges

Purple = 4 edges

Pink = 5 edges
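
The edge counts in this legend are what you get by expanding outwards from the starting node one ring at a time. Here is a minimal sketch of that idea in base R. It assumes each node's neighbours are stored in a named list (the adjacency list explained further down). The graph itself is hypothetical: only the 0-1 and 0-2 edges come from this post, and `ring_distances` is a made-up helper name.

```r
# Hypothetical undirected graph, stored as a named list of neighbours.
adjacency_list <- list(
  '0' = c('1', '2'),
  '1' = c('0', '3', '4'),
  '2' = c('0'),
  '3' = c('1'),
  '4' = c('1')
)

# Smallest number of edges from start_node to every other node.
ring_distances <- function(adjacency_list, start_node) {
  distances <- rep(NA_integer_, length(adjacency_list))
  names(distances) <- names(adjacency_list)
  distances[start_node] <- 0L

  frontier <- start_node   # nodes in the current ring
  depth <- 0L
  while (length(frontier) > 0) {
    depth <- depth + 1L
    next_frontier <- character()
    for (node in frontier) {
      for (neighbour in adjacency_list[[node]]) {
        # only label nodes we have not reached yet
        if (is.na(distances[neighbour])) {
          distances[neighbour] <- depth
          next_frontier <- c(next_frontier, neighbour)
        }
      }
    }
    frontier <- next_frontier
  }
  distances
}

distances <- ring_distances(adjacency_list, '0')
print(distances)
## 0 1 2 3 4
## 0 1 1 2 2
```

In the colour scheme above, nodes ‘1’ and ‘2’ of this toy graph would be red and nodes ‘3’ and ‘4’ would be blue.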

‘Define adjacent’ – check.

What is an adjacency list representation of a graph?

An adjacency list is a data structure that can be used to answer this question:

Given all the nodes in our graph, which nodes are they adjacent to?

We will represent it in R as a list of vectors. First, we create a list of unique node IDs. Then, for each node ID in our list, we create a vector of the nodes adjacent to it. The adjacency list is shown in the table on the right of the image below:
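
As a rough sketch of those two steps in base R, here is one way to build such a list from a table of edges. The `edges` data frame is an assumption for illustration: only the 0-1 and 0-2 edges come from this post, the others are made up.

```r
# Hypothetical undirected edges, one row per edge.
edges <- data.frame(
  from = c('0', '0', '1', '1'),
  to   = c('1', '2', '3', '4'),
  stringsAsFactors = FALSE
)

# Step 1: the unique node IDs.
node_ids <- sort(unique(c(edges$from, edges$to)))

# Step 2: for each node ID, the vector of nodes at the other end of its edges.
# The graph is undirected, so we look at both columns.
adjacency_list <- lapply(node_ids, function(node_id) {
  sort(unique(c(edges$to[edges$from == node_id],
                edges$from[edges$to == node_id])))
})
names(adjacency_list) <- node_ids

print(adjacency_list[['0']])
## [1] "1" "2"
print(adjacency_list[['1']])
## [1] "0" "3" "4"
```

Naming the list elements by node ID means we can look up the neighbours of any node with `adjacency_list[[node_id]]`.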

‘Define adjacency lists’ – check.

Queues!

This one is slightly more complex. The analogy that is often used is that of a bus stop. Here is that analogy, illustrated with an icon from the Noun Project and some old, old memes.

This, ladies and gentlemen, is a bus stop:

In order to get trolling first thing in the morning, Trollface shows up first. He joins the queue (i.e. is enqueued):

Good girl wants to go for an early walk, so Doge shows up second. She joins the queue from the back (i.e. is enqueued from the back):

Nicolas Cage meme shows up late to the party, so he joins the queue in last place (i.e. is enqueued from the back):

In what order are they going to get onto the bus? Assuming that everyone respects the unwritten rules of the queue, Trollface will leave the queue first (i.e. is dequeued first). Then Doge. Then Nic:

How is this relevant to graphs?

My claim is that, if we replace these old memes with graph nodes, the queue data structure and our adjacency list can be used to perform a breadth-first search of our graph. This breadth-first search will show us whether a path exists from our source node to our destination node. We will finally get to this later on in this post (I promise!).

But first, let’s work on this ridiculous queue example to see how we can use queues in R.

rstackdeque

We will be using the lightning fast rstackdeque package instead of creating our queue data structure from scratch. The elements of the queue are individual environments, containing some data and a reference to the next environment (see here for more juicy details).
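
Here is a small sketch of the bus stop queue using rstackdeque. It assumes the package is installed, and it only uses `rpqueue()`, `insert_back()`, `peek_front()`, `without_front()` and `empty()`. The `boarding_order` vector is just a name made up for this example.

```r
library(rstackdeque)

# An empty queue: nobody is at the bus stop yet.
q <- rpqueue()

# Everyone joins from the back. insert_back() returns a new queue,
# so we have to reassign q each time.
q <- insert_back(q, 'Trollface')
q <- insert_back(q, 'Doge')
q <- insert_back(q, 'Nicolas Cage')

# Board the bus: look at the front of the queue, then drop it.
boarding_order <- character()
while (!empty(q)) {
  boarding_order <- c(boarding_order, peek_front(q))
  q <- without_front(q)
}

print(boarding_order)
## [1] "Trollface"    "Doge"         "Nicolas Cage"
```

Note that dequeuing takes two calls: `peek_front()` reads the front element and `without_front()` returns the queue without it.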

Breadth-first search…finally!

So what is this breadth-first search?

When performing breadth-first search, we start with a source node. We visit every child of the source node. Then for every child of the source node, we visit every one of their children. We continue this process of visiting nodes at the same level of the graph until some condition is met (for example, we have reached our destination node). That’s why it looks like nodes are being visited in concentric circles about the green node (i.e. source node), moving broader before moving deeper into the graph in the initial gif:

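To perform a breadth-first search, we need three things:

our adjacency list

a queue of nodes waiting to be processed

a vector keeping track of which nodes we have already visited and which node we came from when we visited each of them (bear with me here)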

What is this vector of visited nodes?

The vector in the last point is used to avoid going in cycles as we visit nodes in our graph. For example, we start at node ‘0’. Node ‘0’ is adjacent to node ‘1’. Later on in our breadth-first search, there will come a time when we need to process the nodes adjacent to node ‘1’. When this happens, we want to avoid processing node ‘0’ again as we have already visited it. Otherwise, we will enter an infinite loop whereby we visit node ‘0’, which is adjacent to node ‘1’, which is adjacent to node ‘0’ and so on.

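A little detail relevant to creating this vector of visited nodes – what is the predecessor of the source node?

The source node is where the search starts, so we did not reach it from any other node. We will simply record it as its own predecessor.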

In our graph, our source node is ‘0’. Our destination node is ‘4’. We can clearly see that there is a way to get from ‘0’ to ‘4’. But it won’t be this obvious in a larger graph.

Let’s create them:

source_node <- '0'
destination_node <- '4'

Here is our vector of visited nodes. We will be keeping track of the nodes we came from when we visit each one of them.
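
A minimal sketch of setting this vector up, following the same steps as the `bfs` function further down. The `adjacency_list` here is a hypothetical stand-in (only the 0-1 and 0-2 edges come from this post), so the node names may differ from the real graph.

```r
# Hypothetical adjacency list, for illustration only.
adjacency_list <- list(
  '0' = c('1', '2'),
  '1' = c('0', '3', '4'),
  '2' = c('0'),
  '3' = c('1'),
  '4' = c('1')
)
source_node <- '0'

# One slot per node, named by node ID. NA means 'not visited yet'.
visited_nodes <- rep(NA_character_, length(adjacency_list))
names(visited_nodes) <- names(adjacency_list)

# The source node is recorded as its own predecessor.
visited_nodes[source_node] <- source_node

print(visited_nodes)
##   0   1   2   3   4
## "0"  NA  NA  NA  NA
```

Each time we visit a node, we will overwrite its NA with the ID of the node we came from.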

Here is the breadth-first search algorithm in pseudocode:

# create flag to indicate whether we have found our destination node
found = FALSE
create new queue
enqueue source node
mark source node as visited from itself
# stop as soon as we find the destination or run out of nodes to process
while found == FALSE and the queue has elements left to process:
    dequeue node at front of queue
    if dequeued node == destination node:
        found = TRUE
    else:
        for every child of the dequeued node:
            if it has not been visited yet:
                enqueue child node at back of queue
                mark child node as visited from dequeued node
            end if
        end for
    end if
end while

The neat thing about this algorithm is that, if we find our destination node, we can recover the path the algorithm took to reach it from the source node. Let’s assume that we found our destination node ‘4’ and that our ‘visited_nodes’ vector looks like this:

To recover the path, we can do something like this. We start with the destination node, and work backwards:

current_node = destination_node
path = current_node
while visited_nodes[current_node] != source_node:
    current_node = visited_nodes[current_node] # find predecessor of current_node
    path = append(path, current_node) # append it to our path
end while
# once we have reached our source_node, we simply append it to our path
path = append(path, source_node)
# we have found our path in reverse order (from destination to source). we
# reverse it to find a path from source to destination
path = reverse(path)
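
To make the backtracking concrete, here is the same logic as a small R sketch. The `visited_nodes` vector below is hypothetical: it is what a search might have recorded on a made-up graph, not the actual vector from this post.

```r
source_node <- '0'
destination_node <- '4'

# Hypothetical predecessors: names are nodes, values are the nodes
# we came from when we first visited them.
visited_nodes <- c('0' = '0', '1' = '0', '2' = '0', '3' = '1', '4' = '1')

# Start at the destination and follow predecessors back to the source.
current_node <- destination_node
path <- current_node
while (visited_nodes[[current_node]] != source_node) {
  current_node <- visited_nodes[[current_node]]
  path <- append(path, current_node)
}
path <- append(path, source_node)

# The path was built from destination to source, so flip it.
path <- rev(path)

print(path)
## [1] "0" "1" "4"
```

Using `[[` rather than `[` to index `visited_nodes` drops the name, so the comparison and the appended values are plain strings.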

The breadth-first search algorithm in R

Sidenote: learning algorithms

Algorithms can be difficult to digest. I would recommend writing a function containing the algorithm and then using the handy debug() function to step through it. All you have to do is this:

debug(<function_name>)

Then call the function as you would normally:

<function_name>(<function_args>)

And then, if you’re using RStudio, you can step through it line-by-line using the F10 key. You can print the values of the local variables of your function using the R console to see how they evolve.

Once you’re done with debugging, you need to tell R to stop debugging the function. This is done like this:

undebug(<function_name>)

Enough! Here is the algorithm

bfs <- function(adjacency_list, source_node, destination_node) {
  require(rstackdeque)

  # some initial checks
  if (!source_node %in% names(adjacency_list)) {
    print('source node not in this graph...')
    return()
  }
  if (!destination_node %in% names(adjacency_list)) {
    print('destination node not in this graph...')
    return()
  }

  # initialise our 'found destination node' flag
  found <- FALSE

  # set up our visited nodes vector
  visited_nodes <- rep(NA_character_, length(adjacency_list))
  names(visited_nodes) <- names(adjacency_list)

  # initialise source node predecessor as itself
  visited_nodes[source_node] <- source_node

  # create our empty queue and enqueue source node
  q <- rpqueue()
  q <- insert_back(q, source_node)

  # keep going until we find the destination or the queue runs dry
  # (this must be 'and', otherwise we would peek at an empty queue
  # whenever no path exists)
  while (!found && !empty(q)) {
    # dequeue the front element
    dequeued_node <- peek_front(q)
    q <- without_front(q)

    # have we found our destination node?
    if (dequeued_node == destination_node) {
      found <- TRUE
    } else {
      # otherwise, we have nodes to process. process each child
      # of the dequeued node...
      for (child_node in adjacency_list[[dequeued_node]]) {
        # ...only if we have not visited it yet
        if (is.na(visited_nodes[child_node])) {
          # enqueue child node
          q <- insert_back(q, child_node)
          # mark the child node as visited from the dequeued node
          visited_nodes[child_node] <- dequeued_node
        }
      }
    }
  }

  # if we still have not found our path, it does not exist
  if (!found) {
    print('path not found')
    return()
  }

  # otherwise, recover the path from destination to source
  path <- character()
  current_node <- destination_node
  path <- append(path, current_node)
  while (visited_nodes[[current_node]] != source_node) {
    current_node <- visited_nodes[[current_node]]
    path <- append(path, current_node)
  }
  path <- append(path, source_node)

  # and then reverse it!
  path <- rev(path)
  return(path)
}

Let’s test it out:

bfs(adjacency_list,source_node,destination_node)

## [1] "0" "1" "4"

Good…good..

As suspected, there is a path from our source node to our destination node. The path that the algorithm found in this case was 0 -> 1 -> 4.

My crappy animation

Here is my terrible attempt at animating the algorithm. Hopefully it helps someone.

Next time…we go back to tha streetz

This post was pretty dense. I’ve decided to stop here. I will cover how breadth-first search can be applied to OpenStreetMap data in the next post. We will also be covering an algorithm by this man: