I received quite a large number of replies to my request for information
about compiler-like static analysis tools. This article is a summary of:

1) my work/ideas and
2) responses I received.

Hope it's useful.

David Spuler

MY OWN WORK
===========

Here is a summary of my own bug lists and references on the use of static
analyzers for detecting program errors. I implemented a full C checker
(better than Lint) as my fourth-year Honours project, so there is an
Honours thesis as well. Naturally, the bug lists and references have a C bias.

I too am interested in static analysis tools. I have produced a C
compiler (intended to be commercial) that does extensive lint-like
checking.

Over a decade ago Fosdick and Osterweil (sp?) at Colorado produced a
program to detect "data-flow anomalies" in FORTRAN programs. I thought
that work was very interesting. At the time, I prototyped a similar
system (theirs was so slow that they recommended it be used only once in
the life of a program; mine was fast enough that it would not have
noticeably slowed a compiler that had the checks added).
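
As a rough illustration of what a "data-flow anomaly" check looks for, here
is a minimal sketch. It is not Fosdick and Osterweil's algorithm, which works
over whole flow graphs; it only scans the actions on one variable along a
single path. The function name `count_anomalies` and the action codes 'd'
(define), 'r' (reference) and 'u' (undefine) are assumptions made for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* State of one variable along a single straight-line path. */
enum var_state { UNDEFINED, DEFINED_UNUSED, DEFINED_USED };

/*
 * Scan a string of actions on one variable:
 *   'd' = define (assign), 'r' = reference (read),
 *   'u' = undefine (e.g. the variable goes out of scope).
 * Returns the number of anomalies found on this path.
 */
int count_anomalies(const char *actions)
{
    enum var_state state = UNDEFINED;   /* variables start out undefined */
    int anomalies = 0;

    assert(actions != NULL);
    for (; *actions != '\0'; actions++) {
        switch (*actions) {
        case 'd':
            if (state == DEFINED_UNUSED)
                anomalies++;            /* d-d: value overwritten, never read */
            state = DEFINED_UNUSED;
            break;
        case 'r':
            if (state == UNDEFINED)
                anomalies++;            /* u-r: read of an undefined variable */
            else
                state = DEFINED_USED;
            break;
        case 'u':
            if (state == DEFINED_UNUSED)
                anomalies++;            /* d-u: value discarded, never read */
            state = UNDEFINED;
            break;
        default:
            break;                      /* ignore unknown action codes */
        }
    }
    return anomalies;
}
```

For instance, `count_anomalies("ddr")` reports one anomaly (the first
definition is overwritten before it is read), while `count_anomalies("dr")`
reports none. A single linear scan like this is cheap, which is in the spirit
of a check fast enough to live inside a compiler.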

Do you know of the PFORT Verifier? I think that it is available for free
from Bell Labs. I think that it is sort of a Lint for FORTRAN,
emphasizing checking for FORTRAN 77 standard conformance.

Again, a decade ago, there was some work on eliminating runtime range
checks from Pascal programs. Clearly, this could be turned around into
compile-time warnings, perhaps without being annoying. I think Welsh of
Queen's University of Belfast wrote one of the better papers (dim
recollection).
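
One way to picture "turning around" range-check elimination is the sketch
below. It assumes the compiler has already computed a range of possible
values for a subscript (how it does so is not shown), and the names
`struct range`, `make_range` and `classify_subscript` are hypothetical. The
same comparison that proves a check redundant can also prove that it must
fail.

```c
#include <assert.h>

/* What a compiler could do with one subscript check. */
enum check_result {
    CHECK_REDUNDANT,    /* index provably in bounds: drop the runtime check */
    CHECK_ALWAYS_FAILS, /* index provably out of bounds: warn at compile time */
    CHECK_NEEDED        /* cannot tell: keep the runtime check */
};

/* Closed interval [lo, hi] of values an expression may take. */
struct range { long lo, hi; };

struct range make_range(long lo, long hi)
{
    struct range r;

    assert(lo <= hi);
    r.lo = lo;
    r.hi = hi;
    return r;
}

/* Compare the possible index values against the declared array bounds. */
enum check_result classify_subscript(struct range index, struct range bounds)
{
    if (index.lo >= bounds.lo && index.hi <= bounds.hi)
        return CHECK_REDUNDANT;
    if (index.hi < bounds.lo || index.lo > bounds.hi)
        return CHECK_ALWAYS_FAILS;
    return CHECK_NEEDED;
}
```

Warning only in the CHECK_ALWAYS_FAILS case is one way to keep such
diagnostics from being annoying: every warning then points at a subscript
that cannot succeed.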

Raymie Stata, a fellow graduate in my research group, passed me your
request from the net.

I am completing a thesis on a bug detection scheme I have invented called
Aspect. I am attaching some blurb from a paper I have just written. I'm
including the bibliography too, which may give you some useful references.

If you're interested, I'd be happy to tell you more. There is one paper
published that describes the state of Aspect about 1 year ago; it's
`Aspect: an Economical Bug Detector', International Conf. On Software
Engineering, May 1991.

I'd also be very interested in any references or ideas that you have.

Regards,

--Daniel Jackson

About the Aspect Bug Detector:

Aspect is an annotation language for detecting bugs in imperative
programs. The programmer annotates a procedure with simple assertions
that relate abstract components (called `aspects') of the pre- and
post-states. A checker has been implemented that can determine
efficiently whether the code satisfies an assertion. If it does not,
there is a bug in the code (or the assertion is wrong) and an error
message is displayed. Although not all bugs can be detected, no
spurious bugs are reported.

...

The purpose of a compiler is not just to make it easier to write good
programs but also to make it harder to write bad ones. Catching errors
during compilation saves testing. It also spares the greater cost of
discovering the error later when it is harder to fix.

Programming errors can be divided into two classes. {\em Anomalies} are
flaws that are apparent even to someone who has no idea what the
program is supposed to do: uninitialized variables, dead code,
infinite loops, etc. {\em Bugs}, on the other hand, are faults only with
respect to some intent. An anomaly detector can at best determine
that a program does something right; a bug detector is needed to tell
whether it does the right thing.
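
To make the distinction concrete, here is a small illustrative pair in C
(both functions are invented for this example and do not come from the
Aspect paper). The first contains an anomaly that is visible without knowing
the intent; the second is well-formed code that is wrong only with respect
to what its name promises.

```c
#include <assert.h>

/* Anomaly: the last statement is suspicious whatever the intent was. */
int is_negative(int x)
{
    if (x < 0)
        return 1;
    return 0;
    return -1;                  /* dead code: can never execute */
}

/* Bug: nothing here looks odd, but the intent was the mean, not the sum. */
int mean(const int *values, int count)
{
    int sum = 0;
    int i;

    assert(count > 0);
    for (i = 0; i < count; i++)
        sum += values[i];
    return sum;                 /* should be sum / count */
}
```

An anomaly detector can flag the unreachable statement in `is_negative`, but
it has no grounds for complaint about `mean`; only some statement of intent
reveals that the mean of 3, 3 and 3 should not come out as 9.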

Aspect detects bugs with a novel kind of dataflow annotation.
Annotating the code is extra work for the programmer, but
it is mitigated by two factors. First, some sort of redundancy is
inevitable if bugs, rather than just anomalies, are to be caught.
Moreover, Aspect assertions may be useful documentation: they are
generally much shorter and more abstract than the code they accompany.
Second, no minimal annotation is demanded; the programmer can choose
to annotate more or less according to the complexity of the code or
the importance of checking it.

A procedure's annotation relates abstract components of objects called
`aspects'. The division of an object into aspects is a kind of data
abstraction; the aspects are not fixed by the object's representation
but are chosen by the programmer.

Each assertion of the annotation states that an aspect of the
post-state is obtained from some aspects of the pre-state. The
checker examines the code to see if such dependencies are plausible.
If there is no path in the code that could give the required
dependencies, there must be an error: the result aspect was computed
without adequate information. An error message is generated saying
which abstract dependency is missing.
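
The flavour of this check can be sketched in C under strong simplifying
assumptions: straight-line code only, aspects numbered 0 to 3 and held in a
bitmask, and each statement reduced to "target is computed from these
sources". None of this is Aspect's actual notation or algorithm; the names
`struct assign` and `missing_dependencies` are hypothetical.

```c
#include <assert.h>

enum { NVARS = 4 };                 /* hypothetical aspects, numbered 0..3 */

/* One assignment: target := some function of the variables in sources. */
struct assign {
    int target;                     /* index of the variable written */
    unsigned sources;               /* bitmask of the variables read */
};

/*
 * Returns the bitmask of required pre-state aspects that `result` cannot
 * depend on after the straight-line code runs; 0 means the claimed
 * dependencies are plausible.
 */
unsigned missing_dependencies(const struct assign *code, int n,
                              int result, unsigned required)
{
    unsigned deps[NVARS];
    int i, v;

    assert(result >= 0 && result < NVARS);

    /* Before any code runs, each variable depends only on its own pre-state. */
    for (v = 0; v < NVARS; v++)
        deps[v] = 1u << v;

    for (i = 0; i < n; i++) {
        unsigned d = 0;

        assert(code[i].target >= 0 && code[i].target < NVARS);
        /* The new value depends on whatever its sources depended on. */
        for (v = 0; v < NVARS; v++)
            if (code[i].sources & (1u << v))
                d |= deps[v];
        deps[code[i].target] = d;
    }
    return required & ~deps[result];
}
```

Suppose aspect 2 is claimed to be obtained from aspects 0 and 1 (required
mask 0x3), but the code only computes it from aspect 0. The function then
returns 0x2, naming aspect 1 as the missing dependency, which corresponds to
the kind of error message described above.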

...

Many compilers, of course, perform some kind of anomaly analysis, and
a variety of clever techniques have been invented (see, e.g.
\cite{carre}). Anomaly detection has the great advantage that it
comes free to the programmer. Aspect might enhance existing methods
with a more precise analysis that would catch more anomalies (using
annotations of the built-in procedures alone). But there will always
be a fundamental limitation: most errors are bugs and not anomalies.

The Cesar/Cecil system \cite{cesar}, like Aspect, uses annotations to
detect bugs. Its assertions are path expressions that constrain the
order of operations. Errors like failing to open a file before
reading it can be detected in this way. Flavor analysis \cite{flavor}
is a related technique whose assertions make claims about how an
object is used: that an integer is a sum in one place in the code and
a mean in another, for instance. Both techniques, however, report
spurious bugs in some cases: an error may be signalled for a path that
cannot occur. Aspect, on the other hand, is sound: if an error is
reported, there is a bug (or the assertion is wrong).
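
As an illustration of an ordering constraint such as "open before read",
the sketch below checks one sequence of file operations against the rule
"open, then any number of reads, then close". It is not Cesar/Cecil's
notation; the function name `first_violation` and the operation codes 'o',
'r' and 'c' are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Check one sequence of file operations against the ordering rule
 * "open, then any number of reads, then close" (repeatable).
 *   'o' = open, 'r' = read, 'c' = close
 * Returns the index of the first operation that breaks the rule, or -1
 * if none does. A file left open at the end is not reported.
 */
int first_violation(const char *ops)
{
    int is_open = 0;
    int i;

    assert(ops != NULL);
    for (i = 0; ops[i] != '\0'; i++) {
        switch (ops[i]) {
        case 'o':
            if (is_open)
                return i;           /* opened twice */
            is_open = 1;
            break;
        case 'r':
            if (!is_open)
                return i;           /* read before open, or after close */
            break;
        case 'c':
            if (!is_open)
                return i;           /* close without a matching open */
            is_open = 0;
            break;
        default:
            return i;               /* unknown operation code */
        }
    }
    return -1;
}
```

A static checker would have to apply such a rule to every path through the
flow graph rather than to one known sequence, and some of those paths can
never occur at run time; that is where the spurious reports mentioned above
come from.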

Type checking may also be viewed as bug detection when there is name
equality or data abstraction. Aspect is more powerful for two
reasons. First, since procedures with the same type signature usually
have different annotations, Aspect can often tell that the wrong
procedure has been called even when there is no type mismatch.
Second, type systems are usually immune to changes of state and so, in
particular, cannot catch errors of omission. Even models that
classify side-effects, such as FX \cite{FX}, do not constrain the
order of operations as Aspect does.

The version of Aspect described here advances previous work
\cite{icse} by incorporating alias analysis. It can handle multi-level
pointers, whose precise analysis is known to be intractable
\cite{Landi}. The alias scheme adopted is most similar to
\cite{larus}, but it is less precise and cannot handle cyclic and
recursive structures.

I am a member of a quality control team in Citicorp Overseas Software Ltd.

I do a lot of desk work, to test code validity.

Till now I have used manual methods for static analysis of code, using
tables of states of variables, and basically sweating it out. I wish to
know if you have more information on known bugs or pitfalls in various
constructs of a language.

I will dig out some information on static analysers, and mail it to you.

I'd be very interested in your list of references. Unfortunately I can't
find my own list of references to give you in return. Perhaps I didn't
type it in yet. I _can_ send you an unpublished paper (complete except
for bibliography) on detecting dataflow anomalies in procedural languages.
There is also an experimental program that implements the ideas of the
paper. The paper can be sent in postscript or dvi and the program is in
Turing, a Pascal-like language.

I have read your note about error detection in compilers. I have a great
interest in this particular field, as my final project has to do with the
implementation of exception handling in "Small C", a compiler designed for
the IBM 8088 whose sole purpose is educational, something equivalent
to Minix. I would greatly appreciate it if you could help me find
sources that dwell on this subject; anything related to
errors and how one might deal with them would be relevant.
Many Thanks In Advance
--Amiran
ae2@cunixa.cc.columbia.edu
===================================================

>From paco@cs.rice.edu Sun Nov 24 04:42:03 1991

The Convex Application Compiler (TM?) apparently does a pretty good job.
Any interprocedural analyzer has to catch a lot of errors just to avoid
crashing. They also do pointer tracking, array section analysis, and
generally just all-out analysis and optimization. Bob Metzger
<metzger@convex.com> et al. have a paper on the pointer tracking in the
proceedings of Supercomputing '91, which was held last week.

It is alleged that Convex has sold machines, or at least copies of their
compiler, just for use in static error checking.

You might be interested in looking at abstract interpretation. It is a
technique related to data-flow analysis (you can express data-flow analyses
as abstract interpretation problems) and is quite popular among the
functional programming crowd.

There are a number of papers (even books!) on the subject; if you are
interested, I can provide references.
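
For readers who have not met abstract interpretation, the usual textbook
starting point is a "sign" analysis: run the program on abstract values
(negative, zero, positive, unknown) instead of concrete integers. The sketch
below is a minimal version in C; the names `sign_of`, `sign_add` and
`sign_mul` are chosen for this example, and values are treated as
mathematical integers (overflow is ignored).

```c
#include <assert.h>

/* Abstract values: all we track about an integer is its sign. */
enum sign { NEG = 0, ZERO = 1, POS = 2, UNKNOWN = 3 };

/* Abstraction: map a concrete integer to its sign. */
enum sign sign_of(long n)
{
    return n < 0 ? NEG : (n == 0 ? ZERO : POS);
}

/* Abstract addition: information is lost when the signs disagree. */
enum sign sign_add(enum sign a, enum sign b)
{
    static const enum sign table[4][4] = {
        /*             NEG      ZERO     POS      UNKNOWN */
        /* NEG  */ { NEG,     NEG,     UNKNOWN, UNKNOWN },
        /* ZERO */ { NEG,     ZERO,    POS,     UNKNOWN },
        /* POS  */ { UNKNOWN, POS,     POS,     UNKNOWN },
        /* UNK  */ { UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN }
    };

    assert((unsigned)a < 4 && (unsigned)b < 4);     /* guard the table index */
    return table[a][b];
}

/* Abstract multiplication: zero absorbs even an unknown operand. */
enum sign sign_mul(enum sign a, enum sign b)
{
    static const enum sign table[4][4] = {
        /*             NEG      ZERO  POS      UNKNOWN */
        /* NEG  */ { POS,     ZERO, NEG,     UNKNOWN },
        /* ZERO */ { ZERO,    ZERO, ZERO,    ZERO    },
        /* POS  */ { NEG,     ZERO, POS,     UNKNOWN },
        /* UNK  */ { UNKNOWN, ZERO, UNKNOWN, UNKNOWN }
    };

    assert((unsigned)a < 4 && (unsigned)b < 4);     /* guard the table index */
    return table[a][b];
}
```

Evaluating `sign_add(sign_mul(NEG, NEG), POS)` gives POS, which shows
without running the program that x*x + 1 is positive for every negative x.
The UNKNOWN entries are the price of the approximation: the analysis may say
"don't know", but what it does assert holds for all concrete values.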

I'm interested in hearing your list of bugs. I have given some thought to
the detection and propagation of error information in a program using
dependence information. Propagation of errors uses the idea that any
computation on a path which leads only to an error condition is dead
unless a side-effect intervenes, and any code after the error is
unreachable. Thus one can actually integrate program errors into control
flow information as an unstructured jump to the end of the program. In an
optimizing compiler, one might be tempted to say that any statement which
can be scheduled after the last side effect before the error is dead.
Thus, in this fragment: