These notes are intended to supplement or correct material in the texts.
They assume familiarity with the readings and are intentionally brief.
Stansifer was published in 1995, so presumably written in 1994, and hence is a
bit out of date. However, it takes a broad view of the role of programming
languages and their study, focussing on principles behind design, and
including historical and cultural information, as well as some underlying
mathematics, all of which I feel every well educated computer scientist should
understand to some extent.

First, a general remark about the class: because we need to talk about
programming languages in general, rather than just one particular language, we
will need to develop some rather sophisticated notation that can describe the
syntax and the semantics of programming (and other) languages.

1.1 Much recent research suggests that natural language is not
formal and cannot be formalized, due to its dependence on enormous amounts of
background information and social context, each of which is highly variable,
as well as difficult or impossible to formalize.

Findings from 1998 at the first pyramid show that Egyptian writing is at
least as old as Sumerian, and moreover, used a phonetic alphabet. Hence
Stansifer's remarks on the evolution of the alphabet are out of date. The
shards found there were records of tributes to the pharaoh.

Recent archaeological research also shows that forms of
representation and calculation arose about 5,000 years ago, in
ancient Egypt and Sumeria, which needed ways to represent and calculate with
numbers, largely for what we would now call accounting data. We would now
also speak of data types and algorithms, rather than
representation and calculation.

1.2 Work on the sociology of science shows that all mathematics has
its origin in practical work; moreover, even today, mathematics research is
largely driven by practical considerations; the notion of "pure mathematics"
is somewhat of a myth, though a powerful one.

Perhaps Gottfried Leibniz was the first to dream of a language for computation,
though his dream went well beyond the programming languages that we have today
since he envisioned using them for all forms of human reasoning, including
legal reasoning. Leibniz also invented binary numbers, inspired by patterns
found in the ancient Chinese book of divination, the I Ching.
Leibniz is also famous for having invented the calculus, about the same time
as, and independently from, Isaac Newton.

The novel The Difference Engine by William Gibson and Bruce Sterling
contains an amusing fictional account of Babbage and Ada Augusta Lovelace,
featuring giant steam driven computers (and much more).

1.4 Both Turing and von Neumann used assertions and invariants
(which are the two main ideas) to prove correctness of flow chart programs,
which they each introduced separately in the early 1940s. Turing also
designed the first stored program electronic computer, although it was later
redesigned and built by others in the UK. This information was secret until
rather recently, because Turing's work was an important part of allied World
War II efforts to break German and Japanese secret codes for messages. (There
are several good books, and even a good play on Alan Turing's very interesting
but rather sad life; the author of one of those books, Andrew Hodges,
maintains a website devoted to the
life of Turing, which includes a page on the play.)

1.5 The most important thing about FORTRAN was its excellent
optimizing compiler; without this, assembly language programmers would never
have changed their habits. ALGOL was designed by a (largely European)
committee, and was the first language to have an international standard; it
also introduced many important new features. PL/I failed mostly because it
was too large. C was designed for systems programming, and cleverly combined
high and low level features, based on ideas from BCPL, which was based on
Christopher Strachey's very innovative CPL language. Simula, Smalltalk, and
of course C++ all grew out of the imperative tradition, and the object
oriented paradigm can be considered a variant of the imperative paradigm, as
is rather clear from the quotations from Alan Kay given in Stansifer. Simula
was developed by Kristen Nygaard and Ole-Johan Dahl at the Norwegian Computing
Center, and was originally designed for applications to simulation.

It is interesting to observe that the three major programming paradigms
grew out of three major approaches to computable functions. Imperative (and
object oriented) languages grew out of Turing machines, which also inspired
the von Neumann architecture on which they usually run. The functional
programming paradigm grew out of the lambda calculus approach to computable
functions due to Alonzo Church, which was soon proved equivalent to the Turing
machine approach. LISP was the first functional language, and it directly
includes lambda abstraction and higher order functions. LISP was also the
first interactive language, as well as the first to have garbage collection
and to support symbolic computation; the latter was important for its intended
applications to artificial intelligence, as was the fact that its programs
were also written in its S-expression data structure. More recent functional
programming languages are ML and Haskell; the latter takes its name from
Haskell Curry, who introduced a variant of the lambda calculus which he called
the combinatory calculus.

The so-called logic programming paradigm is related to a different notion
of computability, having to do with the manipulation of algebraic terms,
arising from work of Jacques Herbrand. (I say "so-called" because I think it
is a misleading name, since its syntax is based on Horn clauses rather than
full first order logic, and in any case, there are many many other logics than
first order and Horn clause logic.) ML and Prolog both originated at
Edinburgh from research on theorem proving and logic, in groups led by Robin
Milner and Robert Kowalski, respectively, though these languages were much
more fully developed elsewhere.

Advocates of so called structured programming railed against the
infamous GOTO statement, claiming that it produced spaghetti code.
Many today feel that unrestricted inheritance in object oriented programming
is just as bad, especially in large systems that use dynamic binding and
multiple inheritance.

The kind of modules found in Modula-3, Ada, C++ and ML have their origin in
theoretical work by Goguen on abstract data types and general systems, and in
the language designs for Clear (with Burstall) and OBJ (or so he says).

The most important ideas for personal computing came from Doug Engelbart at
SRI (then Stanford Research Institute): the mouse, windows, menus, and
networking (recall that interaction came from LISP). Alan Kay at Xerox PARC
popularized these ideas in Smalltalk, also adding icons; others at PARC
developed the ethernet and the laser printer. Apple added very little to this
mixture, and Microsoft has added nothing of intellectual significance
(companies spend a lot on advertising their alleged creativity, often at the
expense of the scientists who actually did the work).

The "Griswold" mentioned in connection with SNOBOL is the father of our own
Prof. Bill Griswold.

An overview of the history of programming languages reveals a progressive
increase in the level of abstraction: machine language; assembly
language; (so called) "high level" languages like FORTRAN; and then ever more
powerful features to support abstraction, including blocks, procedures, types,
recursion, classes with inheritance, modules, specification, ... In general,
this correlates with improvements in the underlying hardware. Perhaps machine
language makes sense if your hardware is (relatively) small systems of gears
and shafts, and your programs only compute repetitive tables, as was the case
for Babbage and Lovelace; assembly language perhaps makes sense if your
hardware consists of (relatively) few vacuum tubes and your programs are for
(relatively) small numerical tasks; but powerful mainframes and PCs running
large programs require much more abstraction.

Perhaps the most glaring omission in Stansifer's book is Java, which did
not come out until after this book was written. Because it is a language
intended to be used over the internet, it has a very different character from
the other languages discussed above; in particular, Java applets have the form
of byte code, downloaded from a server to a client machine and run there on
a Java abstract machine; security is therefore an extremely important issue.
Here, the hardware is the internet, not just a CPU with associated memory and
peripherals.

To summarize a bit, we can see that there is a really close relationship
between mathematics and programming. In particular, mathematicians were first
inspired to build computers by the need to solve mathematical problems, and
the architectures that they chose grew out of the mathematics that they knew.
Moreover, the major programming paradigms correspond to different ways of
defining the notion of computable function, and the historical trend of rising
levels of abstraction also follows trends found in mathematics.

2.1 Stansifer's ideas on natural language seem to
have come mainly from formalists like Chomsky, rather than from linguists who
study what real people actually write and say. For example, it is easy to
write a short play about painters redoing a trading room in a bank, where
desks are named "one desk", "two desk", "FX desk", etc., and where one of the
painters has the line, "Painted two desk" in response to his boss's asking
what he did. A number of disgruntled empirical linguists have written little
poems that end with the line "colorless green ideas sleep furiously", meaning
something like "Chomsky's uninteresting untried theories do nothing much after
a lot of effort". (It is an interesting exercise to try this yourself.)
Similarly, it is easy to imagine a Star Trek episode in which some creatures
called "time flies" have affection for a certain arrow. We may conclude that
almost anything can be made meaningful, given the right context.

The important point here is that in natural language, context determines
whether or not something makes sense; formal syntax and semantics are very far
from adequate, and indeed, the distinction among syntax, semantics and
pragmatics does not hold up under close examination. On the other hand, the
formal linguists' way of looking at syntax and semantics works rather well for
programming languages, because we can define things to work that way, and
because traditionally, programming language designers want to achieve as much
independence from context as we can (though this might change in the future).

2.1.1 These principles are really important; please think about
them, and the examples that are given. Also, notice that the situation for
natural language is very different.

2.2 If we denote the empty set by "{}" and the empty
string by the empty character, then there will not be any way to tell the
difference between the empty set and the set containing the empty string. So
this is a bad idea. Instead, we can use the Greek letter epsilon for the
empty string, and the Danish letter "O-with-slash" for the empty set, as is
usual in mathematics. Sometimes I like to write "[]" for the
empty string, while Stansifer sometimes writes "" for it, and
some other people use the Greek letter lambda! I am afraid that you will have
to get used to there being a lot of notational variation in mathematics, just
as there is a lot of notational variation for programming languages; in fact,
I am afraid that notational variation will be an ongoing part of the life of
every computer science professional for the foreseeable future, so that
developing a tolerance for it should be part of your professional training.
But to help out a bit, I will write the epsilon for the empty string and for
set membership in different ways; the set epsilon will be much bigger, while
the string epsilon will be (insofar as I can do it) slanted a bit.
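
The difference between the empty set and the set containing only the empty string can be checked directly. The following is a minimal sketch in Python, in which languages are modelled as finite sets of strings; the names EPSILON, EMPTY_SET, JUST_EPSILON and concat are chosen here for illustration and do not come from Stansifer.

```python
# Languages are modelled as (finite) frozensets of Python strings.
EPSILON = ""                          # the empty string
EMPTY_SET = frozenset()               # the empty set: no strings at all
JUST_EPSILON = frozenset({EPSILON})   # the set whose only member is the empty string


def concat(lang1, lang2):
    """Concatenate two languages: every string of lang1 followed by every string of lang2."""
    return frozenset(u + v for u in lang1 for v in lang2)


sample = frozenset({"a", "ab"})

print(len(EMPTY_SET), len(JUST_EPSILON))        # the two sets have different sizes
print(concat(sample, JUST_EPSILON) == sample)   # {epsilon} is the identity for concatenation
print(concat(sample, EMPTY_SET) == EMPTY_SET)   # the empty set annihilates concatenation
```

The two sets behave quite differently under concatenation of languages, which is one reason why a notation that confuses them is a bad idea.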

Notice that the definition of "language" in section 2.2.1 is not suitable
for natural language, where context determines whether something is acceptable
or not, and where even then, there may be degrees of acceptability, and these
may vary among different subcommunities, and even individuals, who speak a
given language; moreover, all of this changes (usually slowly) over time. For
example, "yeah" is listed as a word in some dictionaries but not in others;
perhaps it will gradually become more and more accepted, as have many other
words over the course of time. At the level of syntax, English as spoken in
black Harlem (a neighborhood of New York City) differs in some significant
ways from "standard English" (if there is such a thing), in its lexicon, its
syntax, and its pronunciation; this dialect has been studied under the name
"Black English" by William Labov and others, who have shown that in some ways
it is more coherent than so called "standard English".

It may be interesting to know that the first (known) grammar was given by
Panini for Sanskrit, a sacred language of ancient India, more than 2,500
years ago. This work included very sophisticated components for phonetics,
lexicon, and syntax, and was in many ways similar to modern context free
grammars. The motivation for this work was to preserve the ancient sacred
texts of Hinduism.

If we let R be the set of regular expressions over a finite alphabet A,
then the "meaning" or denotation for R is given by a function

[[_]] : R -> 2 ** A* ,

where ** indicates exponentiation (sorry about that - HTML is
lousy for formulae), so that 2 ** X indicates the set of all
subsets of X, and A* is the set of all finite
strings over A. Notice that the semantics given by Stansifer for
regular expressions is compositional, in the sense that the denotation
of each expression is computed from the denotations of its parts.
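
Compositionality of this denotation can be illustrated with a small sketch in Python. Since the denoted languages are usually infinite, the sketch only computes the strings of length at most a given bound; the tuple encoding of regular expressions and the names concat and denote are assumptions made here for illustration, not Stansifer's notation.

```python
# Regular expressions are encoded as tuples (an assumed encoding):
#   ("empty",)  ("eps",)  ("sym", a)  ("alt", r, s)  ("cat", r, s)  ("star", r)

def concat(lang1, lang2, bound):
    """Concatenation of two languages, keeping only strings of length <= bound."""
    return frozenset(u + v for u in lang1 for v in lang2 if len(u + v) <= bound)


def denote(r, bound):
    """Strings of length <= bound in the language denoted by r.

    Each case uses only the denotations of the immediate subexpressions,
    which is what compositionality means here.
    """
    tag = r[0]
    if tag == "empty":
        return frozenset()
    if tag == "eps":
        return frozenset({""})
    if tag == "sym":
        return frozenset({r[1]}) if len(r[1]) <= bound else frozenset()
    if tag == "alt":
        return denote(r[1], bound) | denote(r[2], bound)
    if tag == "cat":
        return concat(denote(r[1], bound), denote(r[2], bound), bound)
    if tag == "star":
        base = denote(r[1], bound)
        result = frozenset({""})
        while True:  # iterate to a fixed point; terminates because of the bound
            bigger = result | concat(result, base, bound)
            if bigger == result:
                return result
            result = bigger
    raise ValueError("unknown regular expression: %r" % (r,))


# (a|b)* a, restricted to strings of length at most 2
example = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(sorted(denote(example, 2)))   # ['a', 'aa', 'ba']
```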

Although there are clever ways that can give a compositional semantics the
property that the meaning of a part depends on the context within which it
occurs, in the sense of the other parts of which it is a part, a mathematical
semantics for a programming language is not going to give us what we usually
intuitively think of as the meaning of programs written in it. For example,
the fact that a certain FORTRAN program computes a certain formula does not
tell us that it gives the estimated yield of a certain variety of corn under
various weather conditions, let alone that this figure is only accurate under
certain assumptions about the soil, and even then it is only accurate to
within about 5 percent. Yet all this is an important part of the meaning of
the program.

2.3 Stansifer's exposition of attribute grammars can seem pretty
difficult to understand. However, the special case of grammars with just
synthesized attributes is much easier; it can be explained without too much
notation, and it can also be illustrated rather simply. For instance, here is
a context free grammar for a simple class of expressions:

S -> E
E -> 0
E -> 1
E -> (E + E)
E -> (E * E)

where N = {S, E} and T = {0, 1, (, ), +, *}. We can
find the value of any such expression by using an attribute grammar with just
one (synthesized) attribute, val, by associating the following
equations with the above rules:
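
The equations themselves are not reproduced in these notes. A plausible reading, assuming the usual arithmetic interpretation (S.val = E.val; E.val = 0; E.val = 1; E.val = E1.val + E2.val; E.val = E1.val * E2.val), can be sketched in Python as follows. The parser, the tuple encoding of parse trees, and the function names are assumptions made for illustration.

```python
def parse_e(s):
    """Parse one E from the front of s; return (tree, remaining input)."""
    if s[:1] in ("0", "1"):                 # E -> 0  and  E -> 1
        return s[0], s[1:]
    if s[:1] == "(":                        # E -> (E + E)  and  E -> (E * E)
        left, s = parse_e(s[1:])
        op, s = s[:1], s[1:]
        if op not in ("+", "*"):
            raise ValueError("expected + or *")
        right, s = parse_e(s)
        if s[:1] != ")":
            raise ValueError("expected )")
        return (op, left, right), s[1:]
    raise ValueError("expected 0, 1 or (")


def parse(text):
    """Parse a complete string generated by S -> E (spaces are ignored)."""
    tree, rest = parse_e(text.replace(" ", ""))
    if rest:
        raise ValueError("trailing input: " + rest)
    return tree


def val(tree):
    """The synthesized attribute val, computed bottom-up from the children."""
    if tree == "0":
        return 0
    if tree == "1":
        return 1
    op, left, right = tree
    if op == "+":
        return val(left) + val(right)
    return val(left) * val(right)


print(val(parse("((1 + 1) * (1 + 1))")))   # 4
```

Because val at each node depends only on val at its children, the values can be computed in a single pass from the leaves to the root.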

It is now a good exercise to compute the value of the binary number
1010 by first writing its parse tree and then computing how the
values of the two (synthesized) attributes percolate up the tree.

We will see later that the parse trees of a context free grammar form a very
nice little algebra, such that the values of synthesized attributes are given
by a (unique) homomorphism into another algebra of all possible values.

There is a slight inconsistency in Stansifer about whether In(S) should be
empty; my preference is that it need not be, because S can occur on the right
side of rules, as in S -> ASA on page 55. Stansifer is also not very
forthright about the evaluation of attributes; the fact is that it is possible
for a definition to be inconsistent, so that some (or even all) attributes do
not have values; however, it is rather complex to give a precise definition
for consistency. I also note that the diagrams for attribute evaluation are
much more effective if they are drawn in real time using several
colors; there are many other cases where live presentation works much better
than reading a book; this should be good motivation for your coming to class!

2.4 The first two occurrences of the word "list" in section 2.4.1
(page 62) should be replaced by "set". Stansifer uses the phrase "tally
notation" for the notation that represents 0, 1, 2, 3, ... by the empty string, |,
||, |||, ..., but it is a variant of what we will later call Peano notation.
There is a typo on page 64, where it says that || + || = |||, i.e., 2 + 2 = 3!
For future reference, there is an OBJ version of
the Post system for the propositional calculus. Also, the production

xax
---
xxx

on page 65 is called an "axiom" but it isn't.

The "proof" on page 68 does not really deserve to be called a proof,
because it only sketches one direction and it completely omits the other
direction, which turns out to be much harder than what is sketched. It is
remarkable that the term "theorem" appears at three different levels:
(1) a theorem of the predicate calculus, i.e., some x for which
Th x is provable; (2) a theorem of the Post system for the
predicate calculus, which means a derivable term of that system, which
includes some terms of the form Th x, others of the form P
x, etc.; and (3) a theorem of mathematics, the proof of which is
discussed in the previous sentence.

2.5 I don't know why Stansifer is so dismissive about issues of
concrete syntax in this section; he should be pleased that he did such a good
job discussing them, and motivating various formalisms earlier in the chapter.

In algebraic terms, destructors are left inverses of constructors.
For example,

FirstOfBlock(Block(W1, W2)) = W1
SecondOfBlock(Block(W1, W2)) = W2
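
The same two equations can be checked in a small Python sketch, given here only as an illustration. The class Block and the functions first_of_block and second_of_block are hypothetical stand-ins for Stansifer's constructor and destructors, and the components are taken to be strings purely for the sake of the example.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Constructor: builds a block from its two components."""
    first: str
    second: str


def first_of_block(block):
    """Destructor recovering the first component."""
    return block.first


def second_of_block(block):
    """Destructor recovering the second component."""
    return block.second


w1, w2 = "declarations", "statements"
print(first_of_block(Block(w1, w2)) == w1)    # True
print(second_of_block(Block(w1, w2)) == w2)   # True
```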

4.3 The punch line on page 119, about motivation
for abstract types is especially important, and could be taken to refer to
much of the other material in this section.

4.5.1 The classification of the different kinds of polymorphism is
due to Christopher Strachey of Oxford University, in lecture notes from 1967.
This is the same Strachey who introduced the language CPL which was
implemented as BCPL and which inspired C (the "C" is for Christopher); he also
introduced the very useful notions of "l-value" and "r-value" in these same
lecture notes; and he is the co-founder with Dana Scott of denotational
semantics.

4.5.2 Ada generics were designed by Bernd Krieg-Bruckner, who is
one of the leading European researchers on algebraic specification, now at the
University of Bremen, based on OBJ parameterized modules, though only part of
the OBJ functionality is included in Ada; in fact, the extra power of OBJ
modules avoids the difficulty with Ada described on page 126, through the use
of its interface theories.

4.5.3 There is a small bug on page 129, since the identifier
l should be associated with the type 'a, not
bool. More importantly there is a bug in Theorem 3 on page 130,
since the type variable tau is "floating" in the way the result is stated; the
easiest way to fix it is to insert "some" before the first instance of tau.

5.2 The material on pages 152-153 is a link to the
compiler class; the use of offsets, a symbol table, etc. is good engineering,
which makes a very efficient runtime environment possible.

5.3 Hope (not HOPE) was a clever functional
programming language designed by Rod Burstall, named after Hope Park Square,
where his office was located in Edinburgh (Hope Park Square was named after a
civil engineer who had a large impact on that area of Edinburgh). Its pattern
matching feature, which led to that of ML, was in part inspired by discussions
between Burstall and Goguen about OBJ, which was also under development during
the same period. I hope you noticed the similarity between pattern matching
in ML and in OBJ. Moreover, the "pattern matching" capability (if we want to
call it that) of OBJ3 is much more powerful than that of ML, in particular,
allowing matching modulo associativity and/or commutativity. For example, if
we define

then the first equation can match 0 + s s 0 and the second can
match (s x) + y , returning s s 0 and s(x
+ y) , respectively. Matching modulo associativity and commutativity
can yield even more dramatic differences from ML pattern matching.
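
The definition referred to above is not reproduced in these notes. Assuming that it consists of the two usual equations for Peano addition, 0 + N = N and (s M) + N = s(M + N), the matching described can be sketched in Python as follows. The tuple encoding of terms and the names step and normalize are assumptions for illustration; the sketch does ordinary syntactic matching only, not matching modulo associativity or commutativity.

```python
ZERO = "0"


def s(t):
    """Successor term."""
    return ("s", t)


def plus(a, b):
    """Addition term a + b."""
    return ("+", a, b)


def step(t):
    """Apply one of the two assumed equations at the root of t, read left to right.

    Returns the rewritten term, or None if neither left side matches.
    """
    if isinstance(t, tuple) and t[0] == "+":
        _, a, b = t
        if a == ZERO:                               # 0 + N = N
            return b
        if isinstance(a, tuple) and a[0] == "s":    # (s M) + N = s(M + N)
            return s(plus(a[1], b))
    return None


def normalize(t):
    """Rewrite subterms first, then the root, until no equation applies."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize(child) for child in t[1:])
    rewritten = step(t)
    return t if rewritten is None else normalize(rewritten)


print(step(plus(ZERO, s(s(ZERO)))))   # the term s s 0
print(step(plus(s("x"), "y")))        # the term s(x + y)
```

Here x and y are treated as constants, as in the text, so the second equation matches with M bound to x and N bound to y.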

8.1 Please look at Stansifer's list on page 270 of
reasons to study semantics. Of course, algebraic denotational semantics adds
to this list the possibilities of testing programs and verifying
programs; you should compare this list with the corresponding list on
page 1 of Algebraic Semantics. Stansifer's characterization of
algebraic semantics on page 271 was written before the book Algebraic
Semantics came out, and hence only refers to basic initial algebra
semantics, not the more powerful combination of loose and initial semantics
that we are using here.

8.2 In a certain sense, denotational semantics is a special case of
basic initial algebra semantics. The term algebra T over the signature S of a
programming language P has all the well formed syntactic units of P in its
carriers, and is an initial S-algebra. If A is an appropriate S-algebra of
denotations, then the unique S-homomorphism [[_]] : T -> A is the denotation
function, and the various equations that express the homomorphism property are
exactly the equations that are usually written down in denotational semantic
definitions.

8.3 For example, the signature S of the decimal numeral example on
pages 272-273 looks as follows in OBJ3:

subsort D < N .
ops 0 1 2 3 4 5 6 7 8 9 : -> D .
op _ _ : N D -> N .

If we now let W denote the S-algebra with its D carrier containing the numbers
from 0 to 9, with its N carrier containing all the natural numbers, and with
_ _ on W defined to send n,d to 10*n+d, then the unique S-homomorphism
from T to W indeed gives the natural number [[n]] denoted by the numeral n.
All of the other examples in Stansifer can be seen in a similar light, though
of course the semantic algebra gets more and more complex. The homomorphism
equations express what is often called compositionality, which says
that the meaning of the whole can be computed from the meanings of its parts.
For the above example, the equation is

[[n d]] = 10*[[n]] + [[d]].
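
The homomorphism from numerals to natural numbers can be sketched in Python, as an illustration only. Terms of the term algebra T are encoded as nested tuples; the tag "cat" for the juxtaposition operation _ _ and the names numeral and denote are assumptions made here, not OBJ3 syntax.

```python
DIGITS = "0123456789"


def numeral(text):
    """Build the term for a string of digits, nesting to the left as in  op _ _ : N D -> N ."""
    term = text[0]
    for ch in text[1:]:
        term = ("cat", term, ch)
    return term


def denote(term):
    """The homomorphism [[_]] from terms to natural numbers."""
    if isinstance(term, str):
        if len(term) == 1 and term in DIGITS:
            return DIGITS.index(term)        # a digit constant denotes the corresponding number
        raise ValueError("not a digit: %r" % (term,))
    _, n, d = term
    return 10 * denote(n) + denote(d)        # the homomorphism equation for _ _


print(numeral("307"))           # ('cat', ('cat', '3', '0'), '7')
print(denote(numeral("307")))   # 307
```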

8.8 Classical denotational semantics made the decision that all
denotations must be functions, usually higher order, but of course zeroth
order functions are included, which are just constants. For example the
denotation of a program in a simple language might be a function from States
to States. So this seems like a natural choice; however, some very difficult
mathematical problems arise when trying to give denotations to recursive
functions. Stansifer notes on page 282 that set theory is no longer adequate;
in fact, so called domain theory must be used, which can get very
complex indeed, since it solves the problem of denotations for fixpoint
functions in the lambda calculus.

Although this was a great advance for logic, it seems to me to have been a
step backwards for computer science, where texts are traditionally used to
denote procedures, e.g., in compilers and in runtime environments. In fact,
this is (in effect) just what algebraic denotational semantics does, and the
result is a vastly simpler treatment of recursive procedures, which also has
the additional merit that it directly supports executing and verifying
procedures, both of which are extremely difficult in a purely denotational
setting.

9.2 On page 307, Stansifer gives a simple example
of how the treatment of variables in Hoare logic can lead to unfortunate
results (namely, an obviously wrong program can be proved correct!).
Actually, Stansifer does not seem aware that the problem is with the treatment
of variables in this version of Hoare logic, and instead says it is due to the
definition of partial correctness; but he is wrong. Algebraic denotational
semantics, by treating different kinds of variables differently, allows a
notion of partial correctness where such silly trivially wrong programs cannot
be proved correct.

9.3 As noted in Algebraic Semantics, weakest preconditions do
not work correctly for specifications written in first order logic; you must
use infinitary logic (which is the logic of infinitely long expressions!) or
second order logic, and as a result things get much more complicated (see
p. 309). Also, Theorem 20 (p. 311) is not stated correctly: only relative
completeness holds, i.e., completeness assuming an oracle for theorems of
arithmetic. (Roughly speaking, the problem is that arithmetic is undecidable
(by a famous theorem of Goedel), and arbitrarily difficult theorems of
arithmetic may be needed in proving programs correct, but Hoare logic does not
provide any way to get theorems about arithmetic.)

0. One of the most important distinctions in
programming languages is that between syntax and semantics.
While BNF does a pretty good job with syntax, it remains very difficult to
understand programs and programming languages. Nevertheless, some progress
has been made, and today denotational semantics is widely considered to
be the best approach for giving meaning to programming languages, and hence to
programs. The book Algebraic Semantics of Imperative Languages
develops a variant of denotational semantics that is based on algebra, and in
particular, on a kind of equational logic, which is actually implemented in
the OBJ3 language. The book demonstrates that its approach is adequate for a
wide range of programming features, including arrays, various kinds of "calls"
for variables, and various kinds of procedures; although this is impressive,
it would still be difficult to define a real programming language like Ada.

The main idea of a denotational semantics for a language is to provide
denotations for each kind of phrase in the language (such as variable,
expression, statement, and procedure), and also to provide a systematic way to
combine the denotations of the constituents of a larger phrase to get its
denotation.
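
As a rough illustration of this idea, here is a sketch in Python for a toy language with only numbers, variables, addition, assignment and sequencing. The language, its tuple encoding, and the names den_exp and den_stmt are all assumptions made for this example. It follows the classical style in which denotations are functions, not the algebraic approach of the book, and it has no loops, so no fixpoints are needed.

```python
def den_exp(e):
    """Denotation of an expression: a function from stores to integers."""
    tag = e[0]
    if tag == "num":
        return lambda store: e[1]
    if tag == "var":
        return lambda store: store[e[1]]
    if tag == "add":
        left, right = den_exp(e[1]), den_exp(e[2])    # denotations of the constituents
        return lambda store: left(store) + right(store)
    raise ValueError("unknown expression: %r" % (e,))


def den_stmt(stmt):
    """Denotation of a statement: a function from stores to stores."""
    tag = stmt[0]
    if tag == "assign":
        name, value = stmt[1], den_exp(stmt[2])
        return lambda store: {**store, name: value(store)}
    if tag == "seq":
        first, second = den_stmt(stmt[1]), den_stmt(stmt[2])
        return lambda store: second(first(store))     # sequencing is function composition
    raise ValueError("unknown statement: %r" % (stmt,))


# x := 1 ; y := x + 2
program = ("seq",
           ("assign", "x", ("num", 1)),
           ("assign", "y", ("add", ("var", "x"), ("num", 2))))
print(den_stmt(program)({}))   # {'x': 1, 'y': 3}
```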

Formal semantics is an important branch of formal methods, an area
of computer science that is difficult, but of growing importance, concerned
with the semantic correctness of systems; it is currently considered to be
cost-effective for safety critical systems. All major chip manufacturers now
have formal methods groups, motivated in part by the huge cost of recalls
if errors are found (as with the notorious Intel Pentium arithmetic error).
NASA has a formal methods group, motivated by the many software failures that
have plagued aerospace efforts (such as the Ariane 5
rocket failure), and the difficulty of communication with distant unmanned
spacecraft. Manufacturers of medical equipment are also considering, and in
some cases using, formal methods, motivated by the cost of lawsuits if faulty
equipment causes death or injury. Similar concerns arise in the nuclear power
industry, the military, and many other areas.

As argued in some detail in the Preliminary Essay on Comparative Programming
Linguistics, the best way to appreciate a language is to understand
how it is intended to be used. OBJ was not designed as a language for writing
programs, but rather as a language for writing specifications, and in
particular, for writing semantics for programming languages. Once this is
clear, many of its unusual design choices can be appreciated, including its
mixfix syntax and subsort polymorphism, its use of algebra, and its term
rewriting capability. In particular, signatures provide a meta-syntax that
can be used to define the syntax of a programming language, and we will see
that equations and algebras can be used to define the semantics of a
programming language.

1.1 To be a bit more precise, an OBJ3 signature
can be considered a variant notation for a context free grammar, with some
additions like precedence and subsort polymorphism, and more importantly, with
an implementation, which is the very general parsing mechanism in OBJ3; this
provides a general and powerful way to handle syntax (we will later see that
OBJ also provides powerful ways to handle semantics). The sorts of a
signature correspond to the non-terminals of a grammar. OBJ3's subsort
polymorphism provides a highly consistent treatment of the most common kinds
of coercion, in a way that also supports the common kind of overloaded
operations. This is in contrast to the coercion mess found in many
programming languages, and it is very convenient for defining denotations.

The prefix notation defined in NAT is Peano notation, of which
Stansifer's "tally notation" is a postfix variant; some example terms appear
on pages 18 and 19 (though strictly speaking these use a larger signature).
There is a bug in Figure 1.2 on page 14: the sort should be Exp
instead of Nat.

1.2 A semantics for a signature is given by an algebra
A for that signature. Such an algebra gives a denotation for each sort
s, which is the set A_s of elements of A of
sort s, called the carrier of A, and similarly, the denotations
of the operation symbols in the signature are functions among appropriate
carriers of A.

1.3 The operations in the term algebra of a signature are
exactly the constructors (in the sense of Stansifer in Section 2.5)
for the abstract syntax of the context free language defined by that
signature. Moreover, the carrier of sort s of the term algebra
consists of exactly the abstract syntax trees (expressed as terms) for the
grammar of that sort. Neat!

To be more precise now, suppose G = (N,T,P,S) is a context free
grammar. Then the signature for G, denoted Sigma_G,
has as its sort set N, the non-terminals of G, with operations
derived from the productions of G as follows: if

p: N -> w_1 N_1 w_2 N_2 ... w_n N_n w_{n+1}

is a production in P with each N_i a non-terminal,
and each w_i a string of terminals, then the corresponding
operation is

p: N_1 N_2 ... N_n -> N

and the Sigma_G-term algebra is exactly the algebra of
abstract syntax terms (or trees) for G.
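
The construction of the signature from a grammar can be sketched in Python, under some stated assumptions: productions are given as a dictionary from (hypothetical) production names to a left side and a list of right-side symbols, and the function name signature is chosen here for illustration. The example grammar is a small expression grammar with non-terminals S and E.

```python
def signature(nonterminals, productions):
    """Map each production name to (argument sorts, result sort).

    The terminals on the right side are dropped; only the non-terminals,
    in order, become the argument sorts of the operation.
    """
    operations = {}
    for name, (lhs, rhs) in productions.items():
        arity = [symbol for symbol in rhs if symbol in nonterminals]
        operations[name] = (arity, lhs)
    return operations


nonterminals = {"S", "E"}
productions = {                      # the names p1 ... p5 are made up for this example
    "p1": ("S", ["E"]),
    "p2": ("E", ["0"]),
    "p3": ("E", ["1"]),
    "p4": ("E", ["(", "E", "+", "E", ")"]),
    "p5": ("E", ["(", "E", "*", "E", ")"]),
}

for name, (arity, result) in signature(nonterminals, productions).items():
    print(name, ":", " ".join(arity), "->", result)
```

Productions with no non-terminals on the right side, such as p2 and p3, give constants, and the terms built from these operations are the abstract syntax trees.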

1.4 The notion of assignment in Definition 9 is essentially
the same as in Stansifer on page 67, but much more general.

1.6 In Definition 14 on page 30, on the first line,
(TSigmaU Xi)s should instead be (TSigma U
Xi)s.

The rule [SM] on page 31 should be

(s X)* Y = (X * Y)+ Y .

1.6.3 The last line of the definition of NATEXPEQ on
page 39 should be

eq (s X)* Y = (X * Y)+ Y .

(Thanks to Bob Boyer, CSE 230 W'01.) Moreover, the same typo occurs in the
rule [SM] on page 31.

The "Theorem of Constants" is really a justification for the universal
quantifier elimination rule:

To prove an equation (forall X)e over a signature Sig, it suffices to
prove e over Sig(X).

Or more precisely and more generally,

A |-Sig(X) P implies A |-Sig (forall X) P

where P is a first order sentence with equations as atoms, A is some set of
first order axioms, and |-Sig indicates provability over signature
Sig.

There is, however, an important caveat regarding use of the disequality
predicate (see Section 2.1.1). For example, suppose we are trying to prove a
first order formula (forall X,Y) P(X,Y) over Sig, and use the Theorem of
Constants to reduce it to proving P(x,y) over Sig(x,y). Because x=/=y (since
they are distinct constants), what we have really proved is

(forall X,Y) X=/=Y implies P(X,Y) .

However, if the proof of P(x,y) never makes use of x=/=y, then we have
actually proved (forall X,Y) P(X,Y). [You can check whether OBJ3 ever uses
x=/=y by turning on trace, saving the output in a file, and then searching it
using a good editor.]

But what should be done if the proof does use x=/=y? We can complete the
proof by proving P(x,x), which then gives (forall X) P(X,X). This can be
justified by considering the two proofs as parts of a case analysis, where the
two cases are X=/=Y and X=Y.

2.1 The programming language studied in this book is so simple
that it suffices to use just stores as states of the run-time system; we do
not need the extra power of having environments in addition to stores, as in
Section 3.3.1 of Stansifer. Unfortunately Stansifer's terminology clashes
with ours, since Stansifer uses the term "state" for what I would rather call
a "store," which is a map from locations to values (I would reserve "state"
for the whole thing, which may include both environments and stores);
moreover, Stansifer also uses the term "environment" for what I call "store"
in the case where there is no "environment" (that is, a map from identifiers to
values). However, this situation should not cause any confusion in discussing
our semantics; all these terms can be used synonymously because there is only
one thing that they could refer to anyway.

2.1.1 You can skim the technical discussion of the equality and
disequality operations in this subsection, because the details are not needed
until later; but you should read the notes for Section 1.6.3 above.

2.2 The semantics that we begin to define here is far simpler
than that given in Chapter 9 of Stansifer (I hope you agree!), and has
the additional advantage that it directly supports mechanical correctness
proofs using OBJ.

3. The one thing that I would most emphasize about
this chapter is that all the semantic definitions in it are absolutely
natural and absolutely simple, in the sense that there really
isn't anything else you could write. The only exception to this is the use of
EStore instead of just Store, which seems artificial
at this point, because it is really not needed for the constructions given in
Chapter 3; in fact, Proposition 27 can be seen as proving that
EStore isn't needed here. However, as the chapter repeatedly
emphasizes, EStore becomes absolutely necessary when programs can
have while loops (since these may not terminate), and so we may
as well write the definitions in the way we will eventually need them anyway.

3.3 The proof of Proposition 27 is not difficult, and the result is
intuitively obvious. However, it is a bit technical to give a precise
definition for "structural induction" and to justify it; also, the formulation
of program termination in Proposition 27 is a bit technical, and
requires some thinking to be understood.

5.1 When proving the (partial) correctness of a
loop, the invariant appears both as an assumption (on entering the loop) and
as a goal. This means that it must be treated in two completely different
ways. We illustrate these different treatments by working with a formula F of
the form

(forall Q(X)) P1(X) and P2(X)

where Q(X) is something like 1 < X < N. This formula is really an
abbreviation for an implication, of the form

(forall X) Q(X) implies P1(X) and P2(X).

If P1 and P2 are both equations, then in assuming this formula, we introduce
two conditional equations, one for P1 and one for P2, each with Q(X) as its
condition.

On the other hand, in trying to prove the formula, we would first eliminate
the quantifier (replacing X by a new constant x), then eliminate the
implication (assuming Q(x)), and finally eliminate the conjunction (proving
P1(x) and P2(x) separately).

Of course, things are more complex for an invariant, because of taking account
of the state, the precondition, etc.

6. Here for the first time, we see some non-trivial algorithms and
proofs, and it is interesting to consider what we can learn from this
encounter. First, please note that it is not claimed that this is the best
way to program, or to develop algorithms (although many books on formal
methods do make claims of this kind). Second, please note that the proofs in
this section are neither completely formal nor completely informal; the
intention is to develop a middle way between the enormously detailed tedium of
fully formalized proofs and the error-ridden clarity of informal proofs, such
that OBJ can deal with the complexities of the programming language semantics,
and the user (i.e., you) can deal with the structure of the proof. Third,
please note that it is expedient to do an informal proof first, or at least in
parallel, rather than to try an OBJ proof without knowing what it should look
like; OBJ can help you check whether your informal proof plan actually works,
and it can help you to carry out and debug that plan, but it cannot produce a
proof plan by itself.

Because OBJ only does reduction, and is not a theorem prover for first
order logic, you will encounter difficulties and details that are normally
hidden when proofs are done by hand; some people hate this, but my viewpoint
is that this is a very interesting phenomenon! Who would have guessed that
doing "simple" proofs would involve such work? This is related to the failure
of the classical methods of Artificial Intelligence to conquer software
development, or indeed, any very difficult domain, and also to the failure of
Hilbert's program to formalize mathematics. Most people have no idea what the
difficulties with these projects actually were, but those who work through
this section will.

7.1.2 The procedure swap(X,Y) actually is correct when X=Y, but this
is not proved by the proof score in the book. But you can handle that case by
doing the corresponding things for swap(x,x), where x is a new constant of
sort Var. However, we should not expect that in general parameterized
procedures will work correctly when some of their arguments are equal.

B. Well founded induction does require proving
P(0): if we let x=0, then we get the implication true => P(0), which is
equivalent to P(0), as our proof obligation for this case.

The reading on the lambda calculus in
OBJ gives a fully formal definition for the syntax and operational
semantics of the lambda calculus, along with numerous examples, including
(among other things) the following:

showing that the lambda calculus is non-terminating, by giving a
specific calculation that doesn't terminate;

showing that alpha renaming is sometimes required in order for beta
reduction to give the result it should;

some combinators and an indication of how to prove combinator identities
(though this can also be done more directly without using the lambda
calculus);

(the beginnings of the demonstration) that logic, arithmetic, and list
processing can be done with just lambda expressions (to me, it seems amazing
that this is possible!).
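
The first of these points can be sketched in ML as well. The following is a minimal sketch under stated assumptions, not the OBJ specification from the reading: the names term, subst, step, delta and omega are illustrative, and the substitution is deliberately naive (it does no alpha renaming), which is safe here only because the term substituted in is closed.

```sml
(* Untyped lambda terms; variable names are strings. *)
datatype term = Var of string
              | Lam of string * term
              | App of term * term

(* Naive substitution of s for x; no renaming, so s must be closed. *)
fun subst (x, s) (Var y) = if x = y then s else Var y
  | subst (x, s) (Lam (y, b)) =
      if x = y then Lam (y, b) else Lam (y, subst (x, s) b)
  | subst (x, s) (App (t, u)) = App (subst (x, s) t, subst (x, s) u)

(* One beta step, outermost redex first; NONE means normal form. *)
fun step (App (Lam (x, b), s)) = SOME (subst (x, s) b)
  | step (App (t, u)) =
      (case step t of
           SOME t' => SOME (App (t', u))
         | NONE => (case step u of
                        SOME u' => SOME (App (t, u'))
                      | NONE => NONE))
  | step (Lam (x, b)) =
      (case step b of
           SOME b' => SOME (Lam (x, b'))
         | NONE => NONE)
  | step (Var _) = NONE

val delta = Lam ("x", App (Var "x", Var "x"))   (* \x. x x *)
val omega = App (delta, delta)                  (* (\x. x x)(\x. x x) *)

(* omega reduces in one step to itself, so its reduction never stops. *)
val loops = (step omega = SOME omega)           (* true *)
```

Replacing the closed argument by a term with free variables would expose the need for alpha renaming mentioned in the second point, since this subst could then capture variables.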

This presentation of the lambda calculus has some unusual features, including
explicit runtime error messages and a slightly more readable syntax. It is
recommended that you play with this OBJ specification yourself, because this
will give you a much better feeling for the lambda calculus than just reading
about it. For example, you could replace the "[_ _]" syntax for
application by the traditional "_ _" notation and see how things go; and
you can make its parsing associate to the left with the attribute "[gather (E
e)]". It is also worth noting that historically term rewriting arose as an
abstraction of the lambda calculus; for this reason, it is very natural to use
it to describe the lambda calculus.

We will now examine some of the new languages that have been spawned by the
recent explosion of interest in the internet. Among these, the currently most
important may be Java, HTML, JavaScript, Perl, and XML. One interesting
observation about these languages is that they differ greatly from the
classical programming languages that are traditionally studied in courses like
CSE 130 and 230, of course because they serve different purposes.

Let's start with Java. Probably security issues have been addressed to a
greater extent in Java than in any other programming language, and many
unusual design decisions are due to security concerns. However,
platform-independence and portability were perhaps the major forces driving
the design of Java, and it is these that motivate the unusual decision to
implement it using interpretation on an abstract machine. The concerns with
security and portability are of course motivated by the use of the language on
the internet, as is the use of threads for (pseudo-)concurrent execution. The
use of APIs allows portability without sacrificing functionality, and in
particular provides extensive support for interactive graphics.

The "ML" in HTML is for "Markup Language," and HTML is of course not a
programming language, but a language for describing multimedia content,
originally in a way that is independent of the display device to be used,
though later evolution of the language introduced many features that allow
graphic designers to produce more pleasing layout for specific browsers. It
would be interesting to survey all the effects that commercial competition had
on HTML, but let it suffice to note that both Microsoft and Netscape
introduced non-standard features in an attempt to lock in customers.

Although HTML is not a programming language, some programming language
features are often desirable in writing content for display on web pages. For
example, one wants simple procedures for buttons, menus, etc., rather than
having to code them up from scratch. Sometimes one also wants functionality
such as counting the number of mouse clicks, where simple programming language
features would come in handy. JavaScript is a low power programming language
designed for just such purposes; it is relatively simple, but has a lot of
"widget" procedures to support interactive graphics. One would not want to
use JavaScript for general purpose programming, e.g., for writing a compiler.

Perl is a language that fills a small but important niche in the internet
world; it has many features that make it unsuitable for general purpose
programming, such as being untyped and having weak modularity. But it is
ideal for quickly writing relatively small translators, for example, into SQL,
and it has been called "the duct tape of the internet." It is also notable that
Perl is an open source effort, and has very high quality implementations and
documentation. See Perl: The first postmodern computer language, by Larry
Wall, the designer of Perl, for an amusing discussion.

Finally, XML serves as a kind of meta-language for HTML (though the "ML" in
its name is still officially for "Markup Language" not "Meta Language," and
the "X" is for "extensible"). Like HTML, XML is simplified from SGML, but
unlike HTML, it enables users to define their own new tags. The impetus for
developing this language comes primarily from B2B applications, where it is
expected to be used very extensively. However, it is also of interest for
applications in the sciences, and of course in computer science. In fact, we
have used it in the Kumo system being
developed in my own lab. (This system also uses HTML and JavaScript, of
course.)

2 Here are some questions that may aid you in reading Chapter 2: Why
does ML have all of tuples, strings, and lists? Why does ML have explicit
coercions? Why do binary functions take tuples as arguments? In my opinion,
what is remarkable about ML is that these questions (and many others of a
similar nature) have good answers, because the language is exceptionally well
designed; for example, similar questions about C do not really have good
answers.

2.3.3 The box on page 31 seems to claim that ML's val
declaration is side-effect free, but this is arguable, and I am more inclined
to disagree than to agree, although a case for the other side can of course
also be made. Consider the following ML code:
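
A minimal sketch of such code, assuming a variable x that is declared twice; only the names t and u come from the text, while x and the particular values are illustrative assumptions:

```sml
val x = 1;
val t = x + 1;     (* t is bound to 2 *)
val x = 10;        (* a new declaration that hides the earlier x *)
val u = x + 1;     (* the same expression as for t, but u is bound to 11 *)
```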

Certainly we get very different values for t and u,
depending on what "assignments" have been done previously. By the way, ML
also has a "real" assignment statement that no one would argue is side-effect
free, so ML is definitely not a pure functional language, although it does
have a very nice functional sublanguage.

3.2.1 The discussion of what Ullman calls "environments" (and I
would call stacks of environments) is easy to follow, but leaves out some
extra details needed for the imperative features of ML; Stansifer, Section
5.2, has more detail, more precision, and more generality. Ullman might have
mentioned that these clever ideas come to ML from Lisp, and are modified forms
of clever ideas for implementing Algol 60, that arose in IFIP WG 2.1.
(Stansifer might also have mentioned the role of WG 2.1 in his discussion of
block structure in Chapter 5.)

3.3 Ullman is not very good on history; for some historical
information on ML pattern matching, see the discussion of section 5.3 of Stansifer above.

5.5 To curry a function is to replace it by an equivalent function that
takes its arguments one at a time, so that its result is of function type,
instead of taking a single argument of product type. For example,
given f of type Int * Int -> Int, we can define the
equivalent function f' of type Int -> (Int -> Int)
by

fun f' m n = f(m,n);

Thus, f'(6) is a function of type Int -> Int. There
is a nice mathematical expression for currying, which also includes its
converse, called uncurrying, given by the following isomorphism

[(T1 * T2) -> T] ~ [T1 -> [T2 -> T]]

where [T] indicates the set of functions having type
T, and ~ indicates isomorphism.
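
The two directions of this isomorphism can themselves be written as ML functions. This is a minimal sketch: the names curry, uncurry and add are assumptions introduced for illustration, and add stands in for the function f above.

```sml
(* Left to right: from a function on pairs to a function-valued function. *)
fun curry f x y = f (x, y);      (* ('a * 'b -> 'c) -> 'a -> 'b -> 'c *)

(* Right to left: the converse direction. *)
fun uncurry g (x, y) = g x y;    (* ('a -> 'b -> 'c) -> 'a * 'b -> 'c *)

fun add (m : int, n : int) = m + n;      (* hypothetical example function *)

val add' = curry add;                    (* int -> int -> int *)
val add6 = add' 6;                       (* int -> int, like f'(6) above *)
val thirteen = add6 7;                   (* 13 *)
val alsoThirteen = uncurry add' (6, 7);  (* 13 *)
```

That curry and uncurry are mutually inverse is what makes the correspondence an isomorphism: uncurry (curry f) agrees with f on every pair, and curry (uncurry g) agrees with g on every pair of arguments.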

5.6.3 The following definition for map is more
idiomatic ML than the one given by Ullman on p. 177 using let,
although Ullman's version does have some expository value.

fun map F nil = nil
| map F (x :: xs) = F x :: map F xs;

Similarly, the definition of comp in the box on p. 177 is more
idiomatic than Ullman's version using let in Figure 5.20 on
p. 176. (These alternative definitions are more idiomatic because they make
better use of the capabilities of ML.)
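
A brief usage sketch of the curried definition of map given above, repeated so that the example is self-contained; the names squares, double and doubled are illustrative assumptions:

```sml
fun map F nil = nil
  | map F (x :: xs) = F x :: map F xs;

(* Full application: F and the list are supplied together. *)
val squares = map (fn x => x * x) [1, 2, 3];   (* [1, 4, 9] *)

(* Partial application: because map is curried, supplying only F
   yields a function from lists to lists. *)
val double = map (fn x => 2 * x);
val doubled = double [1, 2, 3];                (* [2, 4, 6] *)
```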