sweating the small stuff

In an earlier post I described how I got started with Ruby: not by
studying the language, but by reading, then adapting, some existing
code. Of course, I was lucky in that the code I started from was good.
(At least I’m pretty sure it was: it came from a trusted source, it had
unit tests and it looked clean. I think I can recognise good code even
without knowing the language it’s written in.)
This approach of learning to program by reading code is far from
radical, but it is perhaps better suited to some languages than to others.

Learning to Program by Reading

Learning to program is like learning to write good natural
language. The best way to do it is to read some stuff written by masters of the form,
write some things yourself, read a lot more,
write a little more, read a lot more, write some more … and repeat
until your writing begins to develop the kind of strength and
economy you see in your models.

Talk to other programmers; read other programs. This is more
important than any book or training course.

We must also remember that learning never stops, which means we
should always be reading good code.

Finding Good Code

Where, then, do we find good code to read? Maybe you’re lucky enough
to work with some excellent programmers. I guess many of us spend more
time reading code written by colleagues than code written by anyone
else, since that’s what we’re paid to do. Aside from that, you’re
probably looking at code you found somewhere on the internet.

Of course, the code will have to be open source (meaning, in this
case, that you have access to source code, not compiled binaries) and,
if you wish to adapt it, suitably licensed.

Dynamic Languages

One thing I like about the dynamic languages (Python, Ruby,
Perl, etc.) is their open nature. It may be possible to scramble a
Python program so that it can’t be read, but I don’t know how to
do it, and it’s certainly not part of the language’s tradition.

Another thing I like is the tradition of, and indeed support for, unit
testing in these languages. Reflection makes unit testing much easier,
as does the ability to execute code dynamically.
Unit tests also make code easier to read: if you want to know how to
use a library, look at its unit tests. Python’s doctest presses this
point home by blurring the boundaries between code, tests and documentation.

So if, for example, you want to learn to program in Python,
the Python standard library is a great starting point. You’ll find it in your
Python installation. It’s the code you actually run when you use
Python, it’s of excellent quality and, of course, it comes with
comprehensive unit tests.
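If you don’t know where your installation keeps the standard library, Python will tell you. The sketch below uses only the standard `sysconfig`, `inspect` and `pathlib` modules. `json` is an arbitrary choice of a module implemented in Python, and the location of the `test` package is an assumption: some Linux distributions ship the test suite as a separate package.

```python
import inspect
import json
import sysconfig
from pathlib import Path

# Directory holding the pure-Python part of the standard library.
stdlib_dir = Path(sysconfig.get_paths()["stdlib"])
print("Standard library:", stdlib_dir)

# Source file for one particular module; json is a package,
# so this is its __init__.py.
source_file = inspect.getsourcefile(json)
print("json source:", source_file)

# Print the first few lines of that module's source.
for line in inspect.getsource(json).splitlines()[:5]:
    print(line)

# The unit tests usually live alongside, in the "test" package,
# but may be missing if your distribution packages them separately.
test_dir = stdlib_dir / "test"
print("Test suite present:", test_dir.is_dir())
```

From there, reading a module and then its tests (for `json`, look under `test/test_json` on a typical CPython install) is a good way to see both how the code works and how it is meant to be used.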

Finally, dynamic languages are terse, so there’s less code to read.
Have a look, for example, at Peter Norvig’s Sudoku
solver — or even my
own!

Not So Dynamic Languages

To be fair, Java also has a fine tradition of
openness. It’s far from my favourite language, but you don’t have to
look too hard to find superb Java source code published by the likes of
Sun and Apache.

You can also find good C code without trouble. C has been around long
enough that:

the language is stable, and

we know how to use it

C is often used as a portability layer for open source projects. Good
places to look for good, readable C code include the GNU project, the
Linux kernel and the CPython implementation.

Readable C++

Good C++, or at least C++ which is both good and readable, is rather
harder to find. Part of the reason for this is that there’s no
single way to write good C++. A C++ program which looked OK ten years
ago probably looks dated now (“That’s not exception safe!”,
“Why ever didn’t they use the STL?”,
“Surely we need a bit of template metaprogramming here?”).
If the code hasn’t been actively maintained,
it probably doesn’t even compile: even though the standard is mature,
different implementations interpret it in different ways, and their
interpretations are subject to change.

You can probably examine much of your standard library implementation,
since much of it is templated code delivered in header files, but the
platform-specific ifs and buts may make it hard to read. This
stuff is heavily optimised and, when optimisation and readability are
in opposition, as they often are, your standard library implementation
is likely to prefer the former.

Boost is packed
with superb, peer-reviewed, tested, open-source C++ code, but I
wouldn’t describe it as an easy read; it’s certainly not for
beginners.

And Finally

I’m going to return to this subject. For now, I’ll close with a
favourite quotation, taken from the preface to the Wizard
Book, Structure and Interpretation of Computer Programs.

Programs must be written for people to read, and only incidentally
for machines to execute.