4. Node And Arc Diagrams

In designing dynamically allocated data structures and the algorithms
that operate upon them, it is not a good idea to think directly in
terms of code in a particular programming language. It is much better
to use a pictorial representation, which I call node-and-arc diagrams.

In node-and-arc diagrams, there are three types of things:

circular nodes, which represent dynamically allocated memory;

arcs (or arrows), which represent pointers;

boxes, which represent statically allocated memory and always
have a symbolic name beside them.

For example, if you have a program with the declarations:

float X;   /* X is a real */
float *P;  /* P is a pointer to a real */

The compiler allocates space for these variables statically. In my
diagrams, this would produce two boxes, one for X and one for P, each
containing a question mark.

The question marks indicate that these memory locations contain
undefined values. Of course, we can assign values to these variables
in the usual way:

X = 3.9; P = NULL;

We use the function GET_MEMORY to allocate memory dynamically; it
allocates a memory cell and returns the cell's address, which we can
store in P:

P = GET_MEMORY();

In the pictures we could show the result like this:

But this does not highlight the intimate relation between P and the
newly allocated memory. So instead we draw an arrow from P to the new
memory, and don't bother with the exact address at all.

The distinction between P and (*P) is now very clear. P, as always,
refers to the statically allocated box; (*P) refers to the memory the
arrow points to. I think of * as meaning "follow the arrow". The
'name' of the new memory is (*P); to store a value in the new memory,
we assign a value to this name.

(*P) = 4.2;
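The notes treat GET_MEMORY as a given primitive. The sketch below is a minimal C rendering that assumes GET_MEMORY allocates a single float cell with malloc, and that RETURN_MEMORY (introduced a little later in these notes) hands it back with free; both stand-ins and the helper store_through_pointer are hypothetical. It prints the address held in P's box, which is the number the arrow replaces in the diagrams, and then stores into and reads from the cell through its name (*P).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for GET_MEMORY: assumed to allocate one float cell. */
static float *GET_MEMORY(void)
{
    float *cell = malloc(sizeof *cell);
    if (cell == NULL) {
        fprintf(stderr, "GET_MEMORY: out of memory\n");
        exit(EXIT_FAILURE);
    }
    return cell;
}

/* Hypothetical stand-in for RETURN_MEMORY: assumed to give the cell back via free. */
static void RETURN_MEMORY(float *cell)
{
    free(cell);
}

/* Allocate a cell, store a value in it through (*P), and read it back. */
float store_through_pointer(float value)
{
    float *P = GET_MEMORY();                    /* P's box now holds the cell's address */
    printf("P holds address %p\n", (void *)P);  /* the number the arrow stands for */
    (*P) = value;                               /* follow the arrow and store into the cell */
    float result = (*P);                        /* follow the arrow again to read the cell */
    RETURN_MEMORY(P);                           /* give the cell back before P goes out of scope */
    return result;
}
```

For example, store_through_pointer(4.2f) mirrors the assignment (*P) = 4.2 above and returns 4.2f; the printed address will usually differ from run to run.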

If at this stage we do P = NULL, what is the result? Well, this means
exactly what it says: store the value NULL in the memory location P.

Can anyone see a problem with this? The problem is that the
dynamically allocated memory is now 'lost' (this is usually called
garbage, or a memory leak). We have no way to access it; we cannot
even return it to the global pool. The reason is that we have no name
for it, no way to refer to it. (*P) was the only name we had for it,
and now that P has been changed, that name is gone. In the diagram, a
'lost' memory cell is very obvious: it has nothing pointing to it. Of
course, P = NULL is not the only statement that would have caused this
problem; any assignment to P would have done it.
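A lost cell is invisible to the program itself, but a counting version of GET_MEMORY can make it visible. In this sketch, GET_MEMORY (assumed to wrap malloc), the counter live_cells, and the helper cells_lost_by_overwriting_P are all hypothetical. The helper replays the sequence from the text and deliberately leaks one cell to show what happens.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: cells handed out by GET_MEMORY and not yet returned. */
static int live_cells = 0;

/* Hypothetical GET_MEMORY stand-in that also counts live cells. */
static float *GET_MEMORY(void)
{
    float *cell = malloc(sizeof *cell);
    if (cell == NULL) {
        fprintf(stderr, "GET_MEMORY: out of memory\n");
        exit(EXIT_FAILURE);
    }
    live_cells++;
    return cell;
}

/* Replays the text's sequence and reports how many cells it leaves behind.
   It deliberately leaks one cell: that is the point of the demonstration. */
int cells_lost_by_overwriting_P(void)
{
    int before = live_cells;
    float *P = GET_MEMORY();
    (*P) = 4.2f;
    P = NULL;                    /* the only name for the cell, (*P), is now gone */
    (void)P;                     /* silence "assigned but never used" warnings */
    return live_cells - before;  /* still allocated, but nothing points to it */
}
```

The count stays at 1 after the call returns: the cell still exists, but no box holds its address, so nothing can ever return it.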

The way we get rid of unwanted memory is with the procedure
RETURN_MEMORY. If we were in the earlier state, with P pointing to the
cell containing 4.2:

calling RETURN_MEMORY with P as a parameter would produce just what we
want:

Note that the value of P is now undefined. P probably still contains
the address of the cell it used to point to, e.g. 1908, but it would
be an error to dereference this value: P is now a dangling pointer.
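The safe order is to call RETURN_MEMORY while P still names the cell, and only then change P; in the reverse order the cell would be lost, as described above. This sketch assumes GET_MEMORY and RETURN_MEMORY wrap malloc and free and keep a hypothetical count of live cells; the helper cells_left_after_return is also hypothetical. Setting P to NULL afterwards is a common defensive habit rather than something the notes require.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: cells handed out and not yet returned. */
static int live_cells = 0;

/* Hypothetical GET_MEMORY stand-in: allocates one float cell and counts it. */
static float *GET_MEMORY(void)
{
    float *cell = malloc(sizeof *cell);
    if (cell == NULL) {
        fprintf(stderr, "GET_MEMORY: out of memory\n");
        exit(EXIT_FAILURE);
    }
    live_cells++;
    return cell;
}

/* Hypothetical RETURN_MEMORY stand-in: frees the cell and uncounts it. */
static void RETURN_MEMORY(float *cell)
{
    free(cell);
    live_cells--;
}

/* Allocate, use, and return a cell in the safe order; report cells still live. */
int cells_left_after_return(void)
{
    int before = live_cells;
    float *P = GET_MEMORY();
    (*P) = 4.2f;
    RETURN_MEMORY(P);  /* the cell goes back; P's old value must not be followed */
    P = NULL;          /* optional habit: overwrite the stale (dangling) address */
    (void)P;           /* silence "assigned but never used" warnings */
    return live_cells - before;
}
```

In C, once a cell has been freed, the old address in P is indeterminate, so even comparing or printing it is not reliable; overwriting it with NULL removes that temptation.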

Suppose now that we are in the state:

There is no harm in creating a second pointer to the
dynamically allocated memory. If we had declared
float *Q; we'd have:

And we can perfectly legitimately say Q = P. What does this do? Is any
new memory allocated? No! This copies the value that is
stored in P into Q.
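The claim that Q = P allocates nothing can be checked with a counting allocator. In this sketch GET_MEMORY, RETURN_MEMORY (assumed to wrap malloc and free), the counter live_cells, and the helper copy_shares_cell are all hypothetical. The helper performs Q = P and checks that no new cell appeared, that both boxes hold the same address, and that (*Q) names the cell holding 4.2.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: cells handed out and not yet returned. */
static int live_cells = 0;

/* Hypothetical GET_MEMORY stand-in: allocates one float cell and counts it. */
static float *GET_MEMORY(void)
{
    float *cell = malloc(sizeof *cell);
    if (cell == NULL) {
        fprintf(stderr, "GET_MEMORY: out of memory\n");
        exit(EXIT_FAILURE);
    }
    live_cells++;
    return cell;
}

/* Hypothetical RETURN_MEMORY stand-in: frees the cell and uncounts it. */
static void RETURN_MEMORY(float *cell)
{
    free(cell);
    live_cells--;
}

/* Perform Q = P and check that it only copies an address. */
bool copy_shares_cell(void)
{
    float *P = GET_MEMORY();
    (*P) = 4.2f;
    int before = live_cells;
    float *Q = P;                           /* copy the address stored in P into Q */
    bool no_new_cell = (live_cells == before);
    bool same_address = (Q == P);           /* both boxes hold the same address */
    bool same_cell = ((*Q) == 4.2f);        /* (*Q) names the cell holding 4.2 */
    RETURN_MEMORY(P);                       /* one cell, so it is returned once, not again via Q */
    return no_new_cell && same_address && same_cell;
}
```

Because P and Q share one cell, it must be returned exactly once; calling RETURN_MEMORY on Q as well would free the same cell twice.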

Remember, the value stored in P is an address; a copy of that
address is placed in Q. In the diagrams we use arrows to show
addresses, so the result of Q = P is:

What we have now are two names for the same memory location: (*P) and
(*Q) refer to exactly the same memory cell. Therefore, when we change
that cell using one of the names, e.g. (*P) = 2.5; we get:

You can see that the value 'in' (*Q) has also changed. We can test for
pointer equality in the usual way: P==Q asks whether P and Q are
identical, that is, whether the same address is stored in both boxes.
In the diagrams, this question means: are P and Q pointing at exactly
the same place? The answer in the picture above is yes. What about in
this picture:

P==Q is false. P and Q do not point at the same memory cell. The cells
they point to happen to contain the same value, but that is not
what P==Q tests. Because they point at different places, if we change
one of these places, the other one is unaffected.
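The two situations can be contrasted in code. This sketch assumes GET_MEMORY and RETURN_MEMORY wrap malloc and free; the helpers change_shared_cell and change_separate_cell are hypothetical, and the starting value 2.5 for the two separate cells is an arbitrary choice, since only the fact that they hold equal values matters.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical GET_MEMORY stand-in: assumed to allocate one float cell. */
static float *GET_MEMORY(void)
{
    float *cell = malloc(sizeof *cell);
    if (cell == NULL) {
        fprintf(stderr, "GET_MEMORY: out of memory\n");
        exit(EXIT_FAILURE);
    }
    return cell;
}

/* Hypothetical RETURN_MEMORY stand-in: assumed to free the cell. */
static void RETURN_MEMORY(float *cell)
{
    free(cell);
}

/* One cell, two names: a change made through (*P) is seen through (*Q). */
float change_shared_cell(void)
{
    float *P = GET_MEMORY();
    float *Q = P;          /* Q and P hold the same address */
    (*P) = 2.5f;
    float seen = (*Q);     /* reads the very cell P just changed */
    RETURN_MEMORY(P);      /* one cell, one return */
    return seen;
}

/* Two cells with equal values: changing (*Q) leaves (*P) alone. */
float change_separate_cell(void)
{
    float *P = GET_MEMORY();
    float *Q = GET_MEMORY();
    (*P) = 2.5f;
    (*Q) = 2.5f;           /* same value, different cell, so P == Q is false */
    (*Q) = 3.7f;           /* changes only the cell Q points to */
    float seen = (*P);     /* still the old value */
    RETURN_MEMORY(P);      /* two cells, two returns */
    RETURN_MEMORY(Q);
    return seen;
}
```

Both helpers return 2.5: in the first because (*Q) is the cell that was changed, in the second because the change went to a different cell.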

For example, if we say (*Q) = 3.7; we get:

How can we test whether the memory cells they point to contain the
same value? Well, the name of the cell P points to is (*P) and the
name of the cell Q points to is (*Q), so we can simply test (*P)==(*Q).
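The two tests can be written as a pair of small functions. The names same_cell and same_value are hypothetical; they work for any pointers to float, whether the cells came from GET_MEMORY or not. The first compares the addresses in the boxes; the second follows both arrows and compares what is stored there.

```c
#include <stdbool.h>

/* P == Q: do the two boxes hold the same address (the same arrow target)? */
bool same_cell(const float *P, const float *Q)
{
    return P == Q;
}

/* (*P) == (*Q): follow both arrows; do the two cells hold equal values? */
bool same_value(const float *P, const float *Q)
{
    return (*P) == (*Q);
}
```

For two distinct cells that both hold 2.5, same_cell is false while same_value is true; for a pointer compared with itself, both are true.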

It is very important always to remember the difference between P,
the box, and (*P), the node that P points to. For example, in the
preceding picture, what is the difference between P = Q and
(*P) = (*Q)?

Answer: P = Q produces:

Note that we now have some 'lost' memory: nothing points to the cell
that P used to point to. By contrast, (*P) = (*Q) produces:
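The difference between the two assignments can also be checked in C. This sketch assumes GET_MEMORY and RETURN_MEMORY wrap malloc and free, uses the values 2.5 and 3.7 as an assumed starting state, and introduces two hypothetical helpers. In the P = Q version the old cell is returned first, so the sketch avoids the 'lost' memory described above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical GET_MEMORY stand-in: assumed to allocate one float cell. */
static float *GET_MEMORY(void)
{
    float *cell = malloc(sizeof *cell);
    if (cell == NULL) {
        fprintf(stderr, "GET_MEMORY: out of memory\n");
        exit(EXIT_FAILURE);
    }
    return cell;
}

/* Hypothetical RETURN_MEMORY stand-in: assumed to free the cell. */
static void RETURN_MEMORY(float *cell)
{
    free(cell);
}

/* (*P) = (*Q): copy the value; both cells survive and P, Q still differ. */
bool copy_value_keeps_two_cells(void)
{
    float *P = GET_MEMORY();
    float *Q = GET_MEMORY();
    (*P) = 2.5f;
    (*Q) = 3.7f;
    (*P) = (*Q);                            /* P's cell now also holds 3.7 */
    bool ok = (P != Q) && ((*P) == 3.7f);
    RETURN_MEMORY(P);                       /* two cells, two returns */
    RETURN_MEMORY(Q);
    return ok;
}

/* P = Q: copy the arrow; P's old cell is returned first so it is not lost. */
bool copy_pointer_shares_one_cell(void)
{
    float *P = GET_MEMORY();
    float *Q = GET_MEMORY();
    (*P) = 2.5f;
    (*Q) = 3.7f;
    RETURN_MEMORY(P);                       /* without this, P = Q would lose P's cell */
    P = Q;                                  /* both names now refer to Q's cell */
    bool ok = (P == Q) && ((*P) == 3.7f);
    RETURN_MEMORY(Q);                       /* one cell left, so one return */
    return ok;
}
```

Both helpers return true: after (*P) = (*Q) there are still two cells with equal values, while after P = Q there is one cell with two names.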