Monday, June 29, 2009

In his essay "Go To Statement Considered Harmful", Edsger W. Dijkstra
demonstrated how the use of goto made programming harder. Now,
goto is considered harmful, and has been replaced by more
reasonable constructs. This essay is an attempt to demonstrate that
assignment is harmful in the same way. But first, just to be clear:

A variable is a place in which a value is stored. Not to be
confused with the value itself.

An assignment is the replacement of the value currently stored
in a variable by another. Not to be confused with
initialisation, which is the placement of a value in a new
variable.

Assignment is not mandatory

Assignment has two purposes. The first is storing a value for later
use. Initializing a new variable can do that, and is less disruptive.
The second is the construction of loops. Recursive function calls can
do that, but many programmers find plain loops more readable.
Fortunately, even loops are rarely mandatory. The factorial is a case
in point (here in pseudo-code):

fac n = product [1..n]

This is just a definition. The natural way to implement it in an
imperative language is to use a loop (here in C):

int fac(int n)
{
    int acc = 1;
    while (n > 0)
        acc *= n--;
    return acc;
}

It hardly looks like the definition of a factorial. The product and
the sequence of integers are there, but interleaved, somewhat hidden.
There is a better way (here in Haskell):

fac n = product [1..n]

This is actual, runnable code. [1..n] denotes a list of integers,
ranging from 1 to n. product is a function (not a primitive), which
takes a list as argument and returns the product of its elements.

This was just an example. In real code, there are all sorts of loops.
However, they follow a few well-known patterns, just like goto did.
These patterns have been captured in recent programming languages like
Haskell, just like the patterns of goto had been captured in
imperative languages.

Now, in a reasonable programming language, loops are hardly needed,
and assignment is not needed at all.
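
Even in C, the factorial can be written without assignment by letting
recursion do the looping. The following is only a sketch: the names
product_range and fac_no_assign are hypothetical, and the result
overflows for large n just as the loop version does.

```c
/* Product of the integers lo..hi (inclusive); 1 when the range is empty.
 * Hypothetical helper standing in for "product [lo..hi]". */
unsigned long product_range(const unsigned lo, const unsigned hi)
{
    return lo > hi ? 1UL : lo * product_range(lo + 1, hi);
}

/* Mirrors the definition "fac n = product [1..n]".
 * Every variable is initialised once and never assigned. */
unsigned long fac_no_assign(const unsigned n)
{
    return product_range(1, n);
}
```

The parameters are declared const, so the compiler would reject any
attempt to assign to them inside the functions.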

Assignment makes the term "variable" confusing

In OCaml, the "ref" keyword and the "!" operator make
a clear distinction between a variable and its value. In C, such
disambiguation is made from context.
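
To illustrate what "from context" means, here is a small C sketch. The
function name copy_value_demo and the variable names are hypothetical.
The same name denotes a place on the left of "=" and a value on the
right; pointers are used only to make the two roles explicit, loosely
like "!" does in OCaml.

```c
/* Returns the value held by x after y's value has been copied into it. */
int copy_value_demo(void)
{
    int x = 1;   /* initialisation: a new place named x holds 1 */
    int y = 42;  /* another place, holding 42 */

    x = y;       /* on the left, x is the place; on the right, y is its value */

    int *place_of_x = &x;          /* the place, made explicit */
    int value_in_x = *place_of_x;  /* the value stored there, made explicit */

    return value_in_x;
}
```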

All popular imperative programming languages are like C in this
respect. This leads to many abuses of language, like "x is equal to
1, until it is changed to be equal to y". Taking this sentence
literally means making three mistakes:

x is a variable. It can't be equal to a value (1). A glass
is not the drink it contains.

x and y are not equal, and will never be. They represent two
distinct places. They can hold the same value, though. That two
different glasses contain the same drink doesn't make them one
and the same.

x doesn't change. Ever. The value it holds is merely replaced
by another. A glass doesn't change when you replace its water by
wine.

The gap between abuse of language and actual misconception is small. If
we have any misconception about variables, even a temporary one, how can
we hope to write correct programs?

Assignment makes program analysis harder

Compiler writers have understood this for quite some time. Nowadays, a
typical compiler for an imperative language transforms the source code
into static single assignment (SSA) form, an intermediate
representation in which every variable is assigned exactly once, so
assignment as defined above is basically banned. This makes
optimization simpler.

This also applies to manual analysis. Imagine the everyday situation of
trying to understand the code of a colleague: x is initialised to 42, a
big blob of code follows, and then x is printed.

Maybe it prints 42. Maybe not, because x may have changed (whoops,
sorry, may not contain 42 any more). To be sure, we have to look
at that big blob of code. Forgetting that may introduce a bug.
Without assignment, the dependency chain is obvious, and can't be
ignored.
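
The situation can be sketched in C as follows. This is an illustration
under assumptions, not anyone's actual code: big_blob_of_code is a
hypothetical stand-in for the colleague's code, and the functions
return x instead of printing it so the result is easy to check.

```c
/* Hypothetical stand-in for the big blob of code. */
void big_blob_of_code(int *x)
{
    *x = 1;  /* buried somewhere: the value stored in x is replaced */
}

int with_assignment(void)
{
    int x = 42;
    big_blob_of_code(&x);
    return x;  /* 42? Only reading the blob tells us: it is now 1 */
}

int without_assignment(void)
{
    const int x = 42;  /* initialised once, never assigned */
    /* any amount of code could go here; a direct "x = ..." would not compile */
    return x;          /* still 42 */
}
```

In the second function, the last line can be understood without
reading whatever sits between the initialisation and the return.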

Assignment makes variable naming harder

When you know a variable will always have the same value, you name it
after that value. If this value can change, you have to consider
all possible values. A name rarely scales that well. That makes
code harder to understand.

For instance, in my C implementation of the factorial, I
named the accumulator of the loop "acc". As a name for an accumulator,
this is accurate. However, the last value of acc was the factorial
of n. A good name to reflect that would have been "fac_n". Neither
name is satisfactory because each misses something important.

Assignment makes refactoring harder

There is a very important, often overlooked, rule about programming:
the more you allow, the more you prevent. For each thing you allow in
a program, you have to drop a set of assumptions about it. As a
result, some manipulations become unsafe or impossible.

In high school, a definition like "let a = x + 1" meant any occurrence
of "a" or "(x + 1)" can be replaced by the other without changing the
meaning of what is written. They are equivalent, and therefore
substitutable. Imperative programs are more complicated:

int x = 42;
x = 1;
printf("%d", x); // try and replace x by 42!

Without assignment, "x" and "42" would be equivalent and
substitutable. Because they are not, refactoring is harder.

Assignment hurts performance

Optimization during compilation can be seen as a form of refactoring.
Harder refactoring means harder optimizations. It complicates the
compiler and makes it generate less efficient code. SSA form mitigates
this problem, but doesn't eliminate it.

Another thing you lose when you allow assignment is sharing. It
becomes important when you manipulate relatively complicated data
structures, such as associative maps. There are three obvious ways to
manipulate a data structure:

1. Modify it directly (assignment allows that).

2. Create a new structure by copying the whole thing.

3. Create a new structure by referencing the old one (the unchanged
parts are shared).

Each way has a specific problem. Way 1 is effectively an assignment,
with all the disadvantages mentioned above. Way 2 wastes time and
memory. Way 3 is unsafe if you ever use way 1. If assignment is
allowed, way 2 is often your only safe choice. If it is not, way 3 is
safe, convenient, and efficient.
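
Here is a minimal sketch of the sharing approach in C, using a singly
linked list rather than an associative map to keep it short. The names
node, cons and sum are hypothetical, and the nodes are never freed,
which is exactly the kind of bookkeeping a garbage collector would
take over.

```c
#include <stdlib.h>

/* A list node. Once built, it is only ever read through const pointers. */
struct node {
    int value;
    const struct node *next;
};

/* Build a new list whose tail IS the old list: nothing is copied. */
const struct node *cons(int value, const struct node *next)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->value = value;  /* filling fresh storage: initialisation */
        n->next = next;    /* the old list is referenced, not copied */
    }
    return n;
}

/* Sum of all the values in the list; 0 for the empty list (NULL). */
int sum(const struct node *list)
{
    return list == NULL ? 0 : list->value + sum(list->next);
}
```

Two lists built with cons(1, tail) and cons(10, tail) share every node
of tail. That is safe only because nothing modifies tail afterwards.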

These problems are pervasive

Using assignment sparingly is one thing. Compromises must be made,
for instance to achieve the best possible performance on a critical
section of the code, or in high performance applications like device
drivers. Using assignment everywhere is another thing, and it often
results in a big ball of mud.

A piece of advice

Now that I have made my point about how assignment is harmful, I would
like you to take action so that it is used less. So please:

Learn a purely functional language (I suggest Haskell). It will
show you how you can do without assignment, and what the
advantages of not having it are. Beware, though: it may be addictive.

Push for better languages. At the very least, demand garbage
collection. Manual memory management is a big consumer of
assignments.

Avoid making functions which modify their arguments, or object
methods which modify the object. They're not easily composable
and often lead to verbose programs. In short, don't make
assignment-happy APIs, although this is often difficult without
automatic memory management.
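
As a small illustration of the last point, here are two hypothetical C
functions doing the same job. The struct point type and both function
names are assumptions made up for this sketch.

```c
struct point {
    int x;
    int y;
};

/* Assignment-happy style: modifies its argument in place. */
void point_move_in_place(struct point *p, int dx, int dy)
{
    p->x += dx;
    p->y += dy;
}

/* Alternative: leaves its argument alone and returns a new value. */
struct point point_moved(struct point p, int dx, int dy)
{
    return (struct point){ p.x + dx, p.y + dy };
}
```

The second version composes directly, as in
point_moved(point_moved(origin, 1, 0), 0, 2), and origin still holds
its initial value afterwards. The first version needs a named, mutable
variable and one statement per step.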

4 comments:

Interesting! This reminds me of a talk I just saw at YAPC::NA last month by Yuval Kogman called "What Haskell did to my brain". He talked about using immutable objects in Perl. Instead of changing an object's attribute, you clone a new object with that attribute set to what you want.