Graph - Linked Implementation

A graph is yet another data structure that you can use to
store information. Unlike trees, which have a strict
hierarchical structure, graphs are more flexible.

Because graphs are more flexible, they can (unlike trees) have loops,
and parts of a graph may be disconnected.

Here is a simple graph that stores letters:

Graph Vocabulary

The letters are held in what are called the vertices of the
graph. Vertices can be connected to other vertices. A connection
between 2 vertices is called an edge.

This example graph is a directed graph. This just means that
each edge in the graph is unidirectional, i.e., it goes from
one vertex to another. For example, there is an edge from
D to B, but there is no edge representing the reverse
relationship (from B to D).

Also, not all the vertices in this example graph are connected.
There are connections among A, B, D and E, but there is
no way to get to vertex C from any of those vertices.
Thus, ignoring the direction of the edges, A, B, D and E form their
own component. A second component is made up of C and F.

Uses

Why would we want to store information in a graph? Well,
some types of information are naturally represented in a graph.

For example, the above graph could be viewed as a map of which cities
are connected by train routes. Viewing it this way, each
vertex represents a particular city and each edge
represents a train route from one city to another. We
can imagine that the edges are unidirectional since trains are only
allowed to go in one direction on the tracks (a new government rule so
trains don't crash into each other any more).

Since the graph has more than one component, there are some cities
that are not connected by train routes at all. For example, there is no
way to get from city F to city D. However, just
because two cities are in the same component doesn't mean we can get
from one to the other. For example, we cannot get to city
D starting from city B.

Finally, the edges in our graph represent a very simple
relationship, i.e., one city is connected to another. However, just as
we can store information at vertices (e.g., the city name), we can also
store information at each edge. For example, we might want to store
the distance between cities at edges OR the time
the trip takes OR the cost of the ticket OR all of those
pieces of information.

To keep our example simple, however, we'll just store information
at the vertices, and not at edges.

Graph operations:

As mentioned, graphs are pretty generic data structures in that they
can be used to represent lots of things. Thus, exactly what operations
we'll want for a graph will depend on what we want to do with it.

Suppose we want the following operations:

AddVertex:

Adds a new vertex to the graph.

For example, suppose there is a new city we want to add to our map of
train routes. AddVertex(graph, G)
would give:

AddEdge:

Adds a new directed edge to the graph.

For example, adding the city is not enough; we also need to say
how the rail lines connect it to other cities. Thus, we might do
an AddEdge(graph, C, G), which adds an edge from C to G,
giving:

IsReachable:

Reports whether we can get there from here.

For example, we might want to know whether we can get to city E
from city A:

IsReachable(graph, E, A)
would report a true value. (Note that the destination, E, comes
before the source, A.)

Similarly, we might want to know whether we can get to city D
from city E:

IsReachable(graph, D, E)
would report a false value.

There are many other operations we might want, but these will suffice
for our example.

Graph representation in C:

Since we are providing a new data structure with several operations,
we'll want to organize it into a module.

As usual, we'll use ADTs/CDTs to hide the implementation details of our
data structure. Here is how the 2 files that make up the module, i.e.,
the interface (graph.h) and the implementation file
(graph.c), will look:

Note: We'll get the types from graph.h in
graph.c since we always include the header for a module
in the implementation part of the module.

Again, the interface for the graph will need
to have an abstract type for the graph (for people to define graph
variables) and the type of an element (for function prototypes)...

The implementation is hidden from the user and will hold the
types we need to implement the internals of the graph. What other
types are needed will depend on our particular implementation choices.

Now, filling in the types in the interface is easy.

We need the type of an element. To keep it simple, we can
just store a letter for each city. (You can imagine storing
full city names and lots of other things about each city.)

So, the type-of-an-element is just:

typedef char graphElementT;

The abstract-type-of-graph is always the same, a pointer to
the CDT:

typedef struct graphCDT *graphADT;

The other types we need, those for the internal representation of
the graph, will depend on what we actually want to do with the
graph...
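Putting those two types together with the operations listed earlier, graph.h might look roughly like the following sketch. The include guard, the parameter names, and the choice of int return values (nonzero for success) for GraphAddVertex() and GraphAddEdge() are assumptions; the source-then-destination order for GraphAddEdge() follows AddEdge(graph, C, G), and the destination-then-source order for GraphIsReachable() follows IsReachable(graph, E, A).

```c
/* graph.h: a sketch of the graph module's interface (layout assumed). */
#ifndef GRAPH_H
#define GRAPH_H

/* Type of an element: one letter per city. */
typedef char graphElementT;

/* Abstract type of a graph: a pointer to the hidden CDT. */
typedef struct graphCDT *graphADT;

/* Setup/cleanup. */
graphADT GraphCreate(void);
void GraphDestroy(graphADT graph);

/* General operations; int results (nonzero = success) are assumed. */
int GraphAddVertex(graphADT graph, graphElementT element);
int GraphAddEdge(graphADT graph, graphElementT source, graphElementT dest);

/* Nonzero if dest can be reached from source. */
int GraphIsReachable(graphADT graph, graphElementT dest,
                     graphElementT source);

#endif /* GRAPH_H */
```

Users only ever see graphADT, so they can declare graph variables without knowing what struct graphCDT contains.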

Implementation of a graph:

Instead of making a haphazard decision about what implementation to use
for the graph, we should give it some thought.

The types of graphs we need (directed, with possibly more than
one disconnected component) and what we want to do with them
(the operations AddVertex, AddEdge and
IsReachable) will greatly affect how we should implement
the graph. In other words, if we make poor choices, it may be
difficult or impossible to represent some graphs and it may be
difficult or impossible to implement some operations.

The 2 things that our graph implementation will have to represent
are vertices and edges.

Array Implementation

The first choice we might consider is an array implementation.
It's easy to store each vertex in an array:

Likewise, the edges can be stored using something called an
adjacency matrix. Essentially, it represents which vertices
are adjacent, or rather, which pairs of vertices have an edge. For
example, the adjacency matrix for our original
graph would be:

Each position represents whether one vertex is connected to another
(value 1 = true) or not (value 0 =
false). Note that it encodes the direction of the edges. For
example, since there is a 1 in row A, column
B, there is an edge from A to B. Since there is
no edge in the reverse direction (from B to A), row
B, column A has a 0.

Of course, this adjacency matrix could be represented by a
2-dimensional array.
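As a rough illustration, here is a sketch of that array representation in C. It assumes a fixed capacity, MAX_VERTICES, and uses hypothetical names (arrayGraphT, ArrayGraphAddVertex() and so on): the element stored at index i of vertices labels row i and column i of the matrix, and adj[i][j] is 1 exactly when there is an edge from vertex i to vertex j.

```c
#define MAX_VERTICES 26   /* assumed fixed capacity */

typedef struct {
  char vertices[MAX_VERTICES];          /* element stored at each vertex */
  int adj[MAX_VERTICES][MAX_VERTICES];  /* adj[i][j] == 1: edge i -> j */
  int count;                            /* number of vertices in use */
} arrayGraphT;

/* Returns the index of the vertex holding element, or -1 if absent. */
static int ArrayGraphIndexOf(const arrayGraphT *g, char element)
{
  for (int i = 0; i < g->count; i++)
    if (g->vertices[i] == element)
      return i;
  return -1;
}

/* Adds a vertex; returns 0 if the fixed-size arrays are already full. */
static int ArrayGraphAddVertex(arrayGraphT *g, char element)
{
  if (g->count == MAX_VERTICES)
    return 0;
  g->vertices[g->count] = element;
  for (int i = 0; i <= g->count; i++) {  /* clear new row and column */
    g->adj[g->count][i] = 0;
    g->adj[i][g->count] = 0;
  }
  g->count++;
  return 1;
}

/* Records a directed edge from -> to; both must already be vertices. */
static int ArrayGraphAddEdge(arrayGraphT *g, char from, char to)
{
  int i = ArrayGraphIndexOf(g, from);
  int j = ArrayGraphIndexOf(g, to);

  if (i < 0 || j < 0)
    return 0;
  g->adj[i][j] = 1;   /* row = source, column = destination */
  return 1;
}

/* Nonzero if there is an edge from -> to. */
static int ArrayGraphHasEdge(const arrayGraphT *g, char from, char to)
{
  int i = ArrayGraphIndexOf(g, from);
  int j = ArrayGraphIndexOf(g, to);

  return i >= 0 && j >= 0 && g->adj[i][j];
}
```

Once count reaches MAX_VERTICES, ArrayGraphAddVertex() can only fail, which is the limitation discussed next.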

The drawback of this approach is that we want to be able to add vertices.
Adding vertices would require either making the 2 arrays (the vertex array
and the adjacency matrix) some large maximum size OR reallocating new arrays and
copying the contents from the old to the new.
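To make the cost of the "reallocate and copy" option concrete, here is a sketch of what adding one vertex would involve, assuming (for simplicity) that the n x n matrix is stored row by row in a single malloc'd block. The function name GrowMatrix() is hypothetical. Every one of the n * n old entries has to be copied, so each new vertex costs O(n^2) work.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: old holds an n x n adjacency matrix row by row
 * (entry (i, j) at old[i * n + j]). Returns a new (n + 1) x (n + 1)
 * matrix with the old entries copied and the new row and column set
 * to 0, and frees old. On allocation failure, returns NULL and leaves
 * old untouched.
 */
static int *GrowMatrix(int *old, int n)
{
  int m = n + 1;
  int *grown = calloc((size_t)m * m, sizeof *grown);  /* all zeros */

  if (grown == NULL)
    return NULL;
  for (int i = 0; i < n; i++)        /* copy every old entry: O(n^2) */
    for (int j = 0; j < n; j++)
      grown[i * m + j] = old[i * n + j];
  free(old);
  return grown;
}
```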

Aside: As another drawback, graphs with few edges would have a
lot of zeroes in the adjacency matrix, thus wasting space.

As we've seen before, data structures that need to grow like this are
sometimes better implemented with linked representations.

Linked Implementation

Another way to implement a graph is to use a linked-list-like
representation.

First, we need to store the element (the information at each vertex),
so it's easy to put that in some kind of node...

a vertex node's data
-------
| A |
-------

We also need to represent the vertices that A is connected to
(i.e., the edges from A). Since we don't know how many
edges that will be, we can use a linked list to store a list of edges,
as in:

For vertices, we need to store the element at that vertex. In
contrast, for an edge, we just need to store a reference to the vertex
it is connected to (we don't need all the information about that
vertex). Thus, vertex nodes and edge nodes are
different things.

For example, in the above edge that goes to B, since we don't
want to replicate all the information stored about B in the
edge node, the reference to B should really be a
pointer to the vertex node with B, as in:
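In C, the two kinds of node might be declared as in the sketch below. The field names element, edges, connectsTo and next are the ones RecIsReachable() uses later in this section; the struct tags and the LinkEdge() helper are assumptions for illustration. Note that an edge node holds only a pointer to the vertex node it leads to, so B's information is stored in one place.

```c
#include <stdlib.h>

typedef char graphElementT;

struct edgeTag;

/* A vertex node: the element plus the list of edges leaving it. */
typedef struct vertexTag {
  graphElementT element;
  struct edgeTag *edges;
} vertexT;

/* An edge node: the vertex the edge goes to, plus the next edge. */
typedef struct edgeTag {
  vertexT *connectsTo;
  struct edgeTag *next;
} edgeT;

/*
 * Hypothetical helper: adds an edge fromP -> toP at the front of
 * fromP's edge list. Returns 0 if memory runs out.
 */
static int LinkEdge(vertexT *fromP, vertexT *toP)
{
  edgeT *edgeP = malloc(sizeof *edgeP);

  if (edgeP == NULL)
    return 0;
  edgeP->connectsTo = toP;     /* point at B's node; don't copy B */
  edgeP->next = fromP->edges;
  fromP->edges = edgeP;
  return 1;
}
```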

Are we done? No, unfortunately! Since the graph can have more
than one component (i.e., disconnected part) AND because we
are not guaranteed to be able to get to all other vertices from an
arbitrary vertex in a component (e.g., from B we can't get to
A), we need some other way to be able to access all vertices.

One solution is to join them into a linked list. Here's our
current version of the representation:
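A sketch of the types for this version: each vertex node gains a link to the next vertex, and the CDT just points at the first vertex in that list. The field names next (in a vertex) and vertices (in graphCDT), the struct tags, and the FindVertex() helper are assumptions. FindVertex() illustrates why the list is needed: walking it reaches every vertex, including ones like C and F that no edge from A leads to.

```c
#include <stddef.h>

typedef char graphElementT;
typedef struct graphCDT *graphADT;

struct edgeTag;

typedef struct vertexTag {
  graphElementT element;
  struct edgeTag *edges;     /* edges leaving this vertex */
  struct vertexTag *next;    /* next vertex in the graph's vertex list */
} vertexT;

typedef struct edgeTag {
  vertexT *connectsTo;       /* vertex this edge goes to */
  struct edgeTag *next;      /* next edge leaving the same vertex */
} edgeT;

/* The CDT: just the head of the list of all vertices. */
struct graphCDT {
  vertexT *vertices;
};

/* Hypothetical helper: linear search of the vertex list. */
static vertexT *FindVertex(graphADT graph, graphElementT element)
{
  vertexT *vertP;

  for (vertP = graph->vertices; vertP != NULL; vertP = vertP->next)
    if (vertP->element == element)
      return vertP;
  return NULL;
}
```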

Effects of Operations on Implementation

We have already seen how the operations we want to perform affect our
implementation. For example, because we want to be able to add
vertices easily, we rejected the array implementation.

Another operation we want is to determine if one vertex is reachable
from another. That will entail going from vertex to vertex (along
edges), seeing if we eventually reach the desired vertex.

However, since our graphs can have loops, we need to make sure we don't
enter an infinite cycle. Also, since there may be more than one way to
get to a single vertex, we want to make sure we don't waste effort by
exploring a vertex more than once. To solve both problems, we just
need to make sure that we don't go to a vertex we've been to before.
The easiest way to keep track of this is to add a field to each vertex
that keeps track of whether it has been visited or not, as in:
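With the flag added, a vertex node might look like the sketch below (visited is the name RecIsReachable() uses later; the struct tags and field order are assumptions). The hypothetical CountReachable() function shows the flag doing both of its jobs on a graph with a loop: it never revisits a vertex, so it cannot cycle forever, and it counts each reachable vertex exactly once.

```c
#include <stddef.h>

typedef char graphElementT;

struct edgeTag;

typedef struct vertexTag {
  graphElementT element;
  int visited;               /* nonzero once a traversal has been here */
  struct edgeTag *edges;
  struct vertexTag *next;
} vertexT;

typedef struct edgeTag {
  vertexT *connectsTo;
  struct edgeTag *next;
} edgeT;

/*
 * Hypothetical helper: counts the vertices reachable from startP,
 * including startP itself. Assumes all visited flags start at 0.
 */
static int CountReachable(vertexT *startP)
{
  edgeT *edgeP;
  int count = 1;             /* this vertex */

  if (startP->visited)       /* been here already: don't loop */
    return 0;
  startP->visited = 1;
  for (edgeP = startP->edges; edgeP != NULL; edgeP = edgeP->next)
    count += CountReachable(edgeP->connectsTo);
  return count;
}
```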

Now that we've finished the types, let's implement one of the graph
operations. The graph functions we'll need are:

For general graph operations:

GraphAddVertex()

GraphAddEdge()

GraphIsReachable()

Because we are programming in C (setup/cleanup):

GraphCreate()

GraphDestroy()

We'll just implement GraphIsReachable().
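The section only implements GraphIsReachable(), but for completeness here is one possible sketch of the other four functions. It assumes vertex and edge nodes with the fields RecIsReachable() relies on (element, visited, edges, connectsTo, next), plus a next link between vertex nodes and a vertices field in graphCDT; FindVertex() is a hypothetical helper, and the int results (nonzero for success) are an assumed convention. New vertices and edges are simply pushed onto the front of their lists, and the sketch does not check for duplicate vertices or edges.

```c
#include <stdlib.h>

typedef char graphElementT;
typedef struct graphCDT *graphADT;

struct edgeTag;

typedef struct vertexTag {
  graphElementT element;
  int visited;
  struct edgeTag *edges;     /* edges leaving this vertex */
  struct vertexTag *next;    /* next vertex in the graph's list */
} vertexT;

typedef struct edgeTag {
  vertexT *connectsTo;       /* vertex this edge goes to */
  struct edgeTag *next;      /* next edge leaving the same vertex */
} edgeT;

struct graphCDT {
  vertexT *vertices;         /* head of the list of all vertices */
};

/* Returns an empty graph, or NULL if memory runs out. */
graphADT GraphCreate(void)
{
  graphADT graph = malloc(sizeof *graph);
  if (graph != NULL)
    graph->vertices = NULL;
  return graph;
}

/* Frees every edge node, every vertex node, then the CDT itself. */
void GraphDestroy(graphADT graph)
{
  vertexT *vertP;
  if (graph == NULL)
    return;
  vertP = graph->vertices;
  while (vertP != NULL) {
    vertexT *nextVertP = vertP->next;
    edgeT *edgeP = vertP->edges;
    while (edgeP != NULL) {
      edgeT *nextEdgeP = edgeP->next;
      free(edgeP);
      edgeP = nextEdgeP;
    }
    free(vertP);
    vertP = nextVertP;
  }
  free(graph);
}

/* Hypothetical helper: linear search of the vertex list. */
static vertexT *FindVertex(graphADT graph, graphElementT element)
{
  vertexT *vertP;
  for (vertP = graph->vertices; vertP != NULL; vertP = vertP->next)
    if (vertP->element == element)
      return vertP;
  return NULL;
}

/* Adds a vertex at the front of the list; returns 0 on failure. */
int GraphAddVertex(graphADT graph, graphElementT element)
{
  vertexT *vertP = malloc(sizeof *vertP);
  if (vertP == NULL)
    return 0;
  vertP->element = element;
  vertP->visited = 0;
  vertP->edges = NULL;
  vertP->next = graph->vertices;
  graph->vertices = vertP;
  return 1;
}

/* Adds the edge source -> dest; returns 0 if either vertex is missing. */
int GraphAddEdge(graphADT graph, graphElementT source, graphElementT dest)
{
  vertexT *fromP = FindVertex(graph, source);
  vertexT *toP = FindVertex(graph, dest);
  edgeT *edgeP;

  if (fromP == NULL || toP == NULL)
    return 0;
  edgeP = malloc(sizeof *edgeP);
  if (edgeP == NULL)
    return 0;
  edgeP->connectsTo = toP;   /* point at the vertex node; don't copy it */
  edgeP->next = fromP->edges;
  fromP->edges = edgeP;
  return 1;
}
```

With this sketch, the earlier AddEdge example becomes GraphAddVertex(graph, 'G') followed by GraphAddEdge(graph, 'C', 'G'), assuming C is already a vertex.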

IsReachable algorithm:

Let's suppose we want to determine if one vertex is reachable from
another...

Here are the steps for such an algorithm:

1. Set all the visited flags to false.
2. Go to the start vertex.
3. If the vertex you are at has been visited,
   don't pursue this route again. Go back
   to where you came from, returning false.
4. If it is the destination we are looking for,
   you've found it, so return true.
5. Set its visited flag to true.
6. Go through each of the edges.
   Go to the vertex an edge connects to
   (new starting location) and go to step 3.
   If that returns true, return true as well.
7. When you have exhausted all edges without finding it,
   it is not reachable from here, so go back to
   where you came from, returning false.

Note that when a step returns false, it doesn't necessarily mean the
destination is not reachable from the original source, only that it
can't be reached from the current vertex through vertices that haven't
been visited yet.

Aside: Is this traversal of the graph depth-first or
breadth-first? Well, suppose B had edges... Would
we do the vertices B is connected to (its children) before doing
D (B's sibling under A)?

Answer: Yes! We'd do the children first, which makes
it depth-first!

Note that Steps 1 and 2 are done once, but we may repeat Steps 3 through
7 over and over. Also, we need to go back to where we came
from if we've already visited a vertex or we exhaust its edges before
getting to the destination.

Since this part of the algorithm is repetitive and involves backtracking,
what technique can we use to write it?

Answer: Recursion!

IsReachable function:

The part of IsReachable we will perform recursively will be
done by a helper function, RecIsReachable(). This
function will be initially called with the start vertex and
will recurse on other vertices. The only type that
will allow us to refer to both the start and any other vertex is
vertexT *.

In other words, the ADT/CDT is not a candidate: it refers to the graph
as a whole, whose vertex list starts at just one vertex, and that vertex
isn't necessarily the one we want to start from. Thus,
RecIsReachable() will have to take a vertexT *.

Now, before we discuss the implementation of the helper function, let's
consider the function GraphIsReachable(), which users of a
graph will call. This function will need to take a graph, the
destination, and the source. Since the user does not know how the
graph is represented, it refers to the source and destination by the
elements located at the source and destination vertices,
giving a prototype of:

The int return value will contain a true or
false value depending on whether the destination is reachable
from the source.

Marking all the vertices as not visited requires traversing
all the vertices. Also, finding the vertex with the source
value may require searching all the vertices. So, it's easy
for GraphIsReachable() to do those things at the same time
(and before it calls the recursive helper function to do the searching).

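Notice that this even works when the desired source isn't in the graph,
including the case when the graph is empty: no source vertex is found,
so there is nothing to search from and the destination is reported as
unreachable.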
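Since the code for GraphIsReachable() isn't shown here, the following is a sketch that follows the description above: a single pass over the vertex list clears every visited flag and, along the way, remembers the first vertex holding the source element; the recursive helper is only called if such a vertex exists. The type layout (struct tags, the next and vertices fields) and the parameter names are assumptions, and the helper is a compact copy of the RecIsReachable() that follows.

```c
#include <stddef.h>

typedef char graphElementT;
typedef struct graphCDT *graphADT;

struct edgeTag;

typedef struct vertexTag {
  graphElementT element;
  int visited;
  struct edgeTag *edges;
  struct vertexTag *next;    /* next vertex in the graph's list */
} vertexT;

typedef struct edgeTag {
  vertexT *connectsTo;
  struct edgeTag *next;
} edgeT;

struct graphCDT {
  vertexT *vertices;         /* head of the list of all vertices */
};

/* Same logic as the RecIsReachable() shown in the text. */
static int RecIsReachable(graphElementT dest, vertexT *startP)
{
  edgeT *edgeP;

  if (startP->visited)
    return 0;
  if (startP->element == dest)
    return 1;
  startP->visited = 1;
  for (edgeP = startP->edges; edgeP != NULL; edgeP = edgeP->next)
    if (RecIsReachable(dest, edgeP->connectsTo))
      return 1;
  return 0;
}

/*
 * Returns 1 if dest can be reached from source, 0 otherwise
 * (including when source is not in the graph or the graph is empty).
 */
int GraphIsReachable(graphADT graph, graphElementT dest,
                     graphElementT source)
{
  vertexT *vertP, *startP = NULL;

  /* One pass: clear every flag and find the source vertex. */
  for (vertP = graph->vertices; vertP != NULL; vertP = vertP->next) {
    vertP->visited = 0;
    if (startP == NULL && vertP->element == source)
      startP = vertP;
  }
  if (startP == NULL)        /* no such source: nothing to search from */
    return 0;
  return RecIsReachable(dest, startP);
}
```

With this sketch, GraphIsReachable(graph, 'E', 'A') corresponds to the IsReachable(graph, E, A) example from the operations section.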

Now, the work of the repetitive steps can be done recursively,
e.g.:

static int RecIsReachable(graphElementT dest,
                          vertexT *startP)
{
  edgeT *edgeP;

  /* Have we been here already? */
  if (startP->visited)
    return 0;

  /*
   * Is this the destination? If so,
   * we've reached it!
   */
  if (startP->element == dest)
    return 1;

  /* Don't come here again. */
  startP->visited = 1;

  /*
   * See if we can get there from each
   * of the vertices we connect to.
   * If we can get there from at least
   * one of them, it is reachable.
   */
  for (edgeP = startP->edges;
       edgeP != NULL;
       edgeP = edgeP->next) {
    if (RecIsReachable(dest, edgeP->connectsTo))
      return 1;
  }

  /*
   * Couldn't get there from any of our
   * neighbors, so it is unreachable from
   * here.
   */
  return 0;
}

Note that we could have passed the destination in via a vertex pointer
too, if like the start vertex, we had looked for it when looping
through each vertex in GraphIsReachable().
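A sketch of that variant, under similarly assumed types (the struct tags and the next/vertices fields are assumptions; RecIsReachableP() is a hypothetical name): the loop in GraphIsReachable() also looks up the vertex holding dest, and the helper compares vertex pointers instead of elements. A side benefit is that if dest isn't in the graph, the search can be skipped entirely.

```c
#include <stddef.h>

typedef char graphElementT;
typedef struct graphCDT *graphADT;

struct edgeTag;

typedef struct vertexTag {
  graphElementT element;
  int visited;
  struct edgeTag *edges;
  struct vertexTag *next;
} vertexT;

typedef struct edgeTag {
  vertexT *connectsTo;
  struct edgeTag *next;
} edgeT;

struct graphCDT {
  vertexT *vertices;
};

/* Like RecIsReachable(), but the destination is a vertex pointer. */
static int RecIsReachableP(vertexT *destP, vertexT *startP)
{
  edgeT *edgeP;

  if (startP->visited)
    return 0;
  if (startP == destP)       /* the very same vertex node */
    return 1;
  startP->visited = 1;
  for (edgeP = startP->edges; edgeP != NULL; edgeP = edgeP->next)
    if (RecIsReachableP(destP, edgeP->connectsTo))
      return 1;
  return 0;
}

int GraphIsReachable(graphADT graph, graphElementT dest,
                     graphElementT source)
{
  vertexT *vertP, *startP = NULL, *destP = NULL;

  /* One pass: clear the flags and look up both endpoints. */
  for (vertP = graph->vertices; vertP != NULL; vertP = vertP->next) {
    vertP->visited = 0;
    if (startP == NULL && vertP->element == source)
      startP = vertP;
    if (destP == NULL && vertP->element == dest)
      destP = vertP;
  }
  if (startP == NULL || destP == NULL)  /* missing endpoint: unreachable */
    return 0;
  return RecIsReachableP(destP, startP);
}
```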