Handling errors

Errors are a fact of life in the real world, so everything you write
must include some means of handling them. Different languages have
different mechanisms for handling errors, but the same concepts apply
to them all.

Basic concepts

Coverage

You must make sure that you are able to respond to any error anywhere
in the program. It can be very tempting to take a shortcut and leave the
error handling out of a short piece of code where "nothing can possibly
go wrong". Resist this temptation and make sure that you've made provision
for errors no matter where they may crop up and how unlikely they are.

Recursion

The error handler cannot handle errors inside itself. The first action
of an error handler routine should always be to turn off or
redirect the error handling. If this is overlooked then any error in the
error handler code will call the error handler and run into the same error
again, and again, and again until something in the internal structure of
your program overflows.

If you have to turn error handling off completely inside the handler
then any error in the error handler leads to a messy crash. This is not
good but at least it will crash immediately without doing any damage elsewhere.
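
As a rough illustration of the guard described above, here is a minimal
Python sketch. The names handle_error, report, and _in_handler are made up
for this example, and redirecting a second error to stderr is only one
possible choice of fallback.

```python
import sys

_in_handler = False  # hypothetical flag: True while the handler is running


def handle_error(exc, report=print):
    """Report exc. Return True if handled, False if the handler itself failed."""
    global _in_handler
    if _in_handler:
        # An error happened inside the handler: do not recurse, just
        # redirect the details to the simplest output available.
        sys.stderr.write("error handler failed: %r\n" % (exc,))
        return False
    _in_handler = True  # first action: switch off normal error handling
    try:
        report("Something went wrong: %s" % exc)
        return True
    except Exception as inner:
        handle_error(inner)  # hits the guard above instead of looping
        return False
    finally:
        _in_handler = False  # restore normal handling on the way out
```

If report itself raises, the second call finds the flag set and writes to
stderr rather than calling report again, so the loop never starts.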

Robustness

At the risk of stating the obvious, the error handler has got to work
first time, every time. Keep it as simple as possible. This means that
the handler must be completely self-contained. If it uses your bespoke
user interface object to display a message for the user then the handler
will lock up if the error happens to be in that very object. Use the
simplest means of output available - typically a Windows message box.

Levels of response

Not all errors are equally serious. Your error handler should have at
least four levels of response:

Silent

Log the error to disk but do not interrupt processing or say anything
to the user. You can use this level of response for diagnostic records
that will give you advance warning of future problems. Use it as a trace
mechanism to profile the typical usage of the system. If you are worried
about the speed of a particular operation then create a log entry at the
start and end of the process. This will tell you how long it really takes
for a typical user to process real data on an everyday workstation. It
will also tell you how often the situation occurs. This is all very
useful information when you are trying to decide which problem needs to
be fixed first.
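
The start-and-end log entries might look something like this in Python,
using the standard logging module. This is only a sketch: the function
names traced and make_silent_log and the logger name "silent_trace" are
assumptions made for the example.

```python
import logging
import time
from contextlib import contextmanager


def make_silent_log(path):
    """Return a logger that appends to the file at path and nowhere else.

    Call this once at start-up; each call adds another file handler.
    """
    log = logging.getLogger("silent_trace")  # hypothetical logger name
    log.setLevel(logging.INFO)
    log.propagate = False  # keep entries away from any console handler
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    log.addHandler(handler)
    return log


@contextmanager
def traced(log, operation):
    """Write a log entry at the start and at the end of an operation."""
    start = time.perf_counter()
    log.info("start %s", operation)
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("end %s elapsed=%.3fs", operation, elapsed)
```

Wrapping the slow operation in "with traced(log, 'month-end report'):"
leaves two timestamped lines in the file and shows nothing to the user.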

Try again

Most of the errors in this category relate to the interface with the
outside world. They might be caused by a problem with the hardware or
by a mistake from the user. The cause might be a printer that's empty
of paper, a drive with no disk, or a user asking for the impossible.
The error handler should give the following information to the user:

What has happened

What are the implications

How they can recover

It should also give them the opportunity of trying again or of
abandoning the operation.
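
A retry loop of this kind could be sketched in Python as follows. The
function run_with_retry and its parameters are invented for the example,
and treating OSError as the "try again" category is an assumption; your
own application will have its own list.

```python
def run_with_retry(operation, describe, ask=input, show=print):
    """Run operation(); on OSError tell the user and offer another attempt.

    describe(exc) must return three strings:
    (what has happened, what the implications are, how to recover).
    Returns the operation's result, or None if the user abandons.
    """
    while True:
        try:
            return operation()
        except OSError as exc:
            what, implications, recovery = describe(exc)
            show("What has happened: " + what)
            show("What it means: " + implications)
            show("How to recover: " + recovery)
            answer = ask("Try again? [y/n] ").strip().lower()
            if answer != "y":
                return None  # the user chose to abandon the operation
```

Passing ask and show as parameters keeps the loop independent of any
particular user interface.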

Avoid

These too are typically caused by the failure of something outside the
application but this time it's something over which the user has no
control. An example might be a problem with a communications link which
means that data cannot be imported from another site. The error handler
should tell the user what has happened and confirm that the rest of the
application is still running as normal.

Abandon

Finally there are the serious errors which mean that the application
cannot continue to run safely. There are two sub-divisions here:

Problems with the installation, such as the failure of a network drive.

Problems with the application itself, such as a Divide By Zero error.

In either situation, the error handler should tell the user what has
happened and then shut the application down as safely as possible.
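
The four levels of response described above can be written down as a small
Python sketch. The mapping from exception types to levels shown here is an
illustrative assumption, not a recommendation, and the names Response and
classify are made up for the example.

```python
import enum


class Response(enum.Enum):
    SILENT = 1     # log only, say nothing to the user
    TRY_AGAIN = 2  # explain, then offer retry or abandon
    AVOID = 3      # explain, carry on with the rest of the application
    ABANDON = 4    # explain, then shut down as safely as possible


def classify(exc):
    """Pick a response level for exc (an example mapping only)."""
    if isinstance(exc, Warning):
        return Response.SILENT
    if isinstance(exc, (FileNotFoundError, PermissionError)):
        return Response.TRY_AGAIN
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return Response.AVOID
    # Anything unrecognised, including ZeroDivisionError, is treated as
    # a fault in the application itself.
    return Response.ABANDON
```

Defaulting to ABANDON means that an error nobody anticipated gets the
most cautious response.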

Messages to the user

The user rarely cares what has gone wrong and only needs two items of
information:

What damage has been done?

Can I do anything to repair it?

The user should never see a raw error message from the programming
language. Something like 'Invalid File Offset' will be utterly meaningless
to most users. If the user has to report the error to an internal Help Desk
then display a message with an error number - something like:

Error 42
Please phone the Help Desk on extension 1234.

The number might be the raw error code from the language or you may
want to draw up your own list of codes.
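
One way of turning a raw error into a message like the one above is
sketched here in Python. The table of codes is entirely hypothetical, as
are the names ERROR_CODES and user_message; the extension number is the
one from the example message.

```python
# Hypothetical list of codes; draw up your own or use the raw codes.
ERROR_CODES = {
    ZeroDivisionError: 11,
    FileNotFoundError: 42,
}
HELP_DESK_EXTENSION = "1234"


def user_message(exc, default_code=99):
    """Build the text shown to the user; the raw error text is left out."""
    code = default_code
    for exc_type, number in ERROR_CODES.items():
        if isinstance(exc, exc_type):
            code = number
            break
    return "Error %d\nPlease phone the Help Desk on extension %s." % (
        code,
        HELP_DESK_EXTENSION,
    )
```

The raw text of the exception belongs in the error log, not on the screen.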

Error log

An error log is invaluable if you are supporting an application from off
site. The log should record time, date, user, and workstation so that you
know who had the problem and it should record the name of the program file
and the line number so that you know where the problem occurred. Rather
than having to listen to exaggerated tales of problems you can read the
log and see exactly what has been happening.
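
As an illustration, the facts listed above could be gathered in Python with
the standard library alone. The function name error_record and the field
names are assumptions made for this sketch.

```python
import datetime
import getpass
import platform
import traceback


def error_record(exc):
    """Collect who, when, and where for one log entry about exc."""
    try:
        user = getpass.getuser()
    except Exception:
        user = "unknown"  # the handler must not fail over a missing name
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1] if frames else None  # frame where the error was raised
    return {
        "when": datetime.datetime.now().isoformat(timespec="seconds"),
        "user": user,
        "workstation": platform.node(),
        "file": last.filename if last else "",
        "line": last.lineno if last else 0,
        "error": "%s: %s" % (type(exc).__name__, exc),
    }
```

Call it from inside an except block so that the exception still carries
its traceback and the file and line can be filled in.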

Consider having two types of error log: a simple log that merely records
the facts listed above and a verbose log that records far more detail such
as the amounts of free disk and memory space, the version and sub-version
of the operating system, and the chain of programs and operator actions that
led to the error. The verbose log will make the application run more
slowly and will generate a large log file but sometimes that will be the
only way of investigating a problem.
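
Some of the extra detail for a verbose log is easy to collect. The Python
sketch below gathers free disk space and operating system version; the
function name verbose_details is made up, and free memory is left out
because the standard library has no portable call for it.

```python
import platform
import shutil


def verbose_details(path="."):
    """Return extra environment details for a verbose log entry."""
    usage = shutil.disk_usage(path)  # total, used, and free bytes for path
    return {
        "free_disk_bytes": usage.free,
        "os": platform.system(),           # for example "Windows" or "Linux"
        "os_release": platform.release(),  # version
        "os_version": platform.version(),  # sub-version or build details
    }
```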

The error log can be recorded as a simple text file but if you store it in
a table then you will be able to analyse it and extract information such
as the most common types of error and the frequency of these errors.
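
To show the sort of analysis a table makes possible, here is a minimal
Python sketch using the standard sqlite3 module. The table name, column
names, and function names are assumptions made for the example.

```python
import sqlite3


def open_error_log(path=":memory:"):
    """Open the log database and create the table if it is missing."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS error_log ("
        " logged_at TEXT, user_name TEXT, workstation TEXT,"
        " program TEXT, line_no INTEGER, error_code INTEGER)"
    )
    return db


def log_error(db, logged_at, user_name, workstation, program, line_no, error_code):
    """Append one entry to the log."""
    with db:  # commits if the insert succeeds, rolls back if it fails
        db.execute(
            "INSERT INTO error_log VALUES (?, ?, ?, ?, ?, ?)",
            (logged_at, user_name, workstation, program, line_no, error_code),
        )


def most_common_errors(db, limit=5):
    """Return (error_code, occurrences) pairs, most frequent first."""
    return db.execute(
        "SELECT error_code, COUNT(*) AS occurrences FROM error_log"
        " GROUP BY error_code ORDER BY occurrences DESC, error_code"
        " LIMIT ?",
        (limit,),
    ).fetchall()
```

The same table answers other questions, such as which workstation or which
program file produces the most errors, by changing the GROUP BY column.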