if you set out to prove "predicates" about a program, there are two
(common) ways to go about it.

the first way involves writing a program, and then either in the
language of the program or some other "more logical" language writing
down some formula about the program and trying to construct a proof.
proofs performed in this order are constructed by repeatedly reducing
the terms of the proof towards axioms. you are given a list of
transformations on the formula which are permitted, with the
understanding that rewriting a formula in some sense preserves
"truth", or at least logical implication. once you arrive at a
formulation which is all axioms or instances of abstract axioms, your
proof is done. sometimes you have tool support to automate
reducing the tedious parts of your formulas to axioms. formal systems
which are intended to work this way include ACL2, Coq, HOL, NuPRL,
Isabelle, Twelf, and many other logic and meta-logic frameworks.
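
as a tiny illustration of the first way (only a sketch in ocaml, not
in any of the systems named above): the program comes first, then a
formula about it, then a proof constructed by rewriting towards the
definitions. the proof lives in a comment; only the program and a
spot-check of one instance of the formula are executable.

(* the program, written first *)
let rec append xs ys =
  match xs with
  | [] -> ys
  | x :: rest -> x :: append rest ys

let rec length xs =
  match xs with
  | [] -> 0
  | _ :: rest -> 1 + length rest

(* the formula: forall xs ys. length (append xs ys) = length xs + length ys

   proof, by induction on xs, rewriting towards the definitions:

   case xs = []:
     length (append [] ys)
       = length ys                          (def. append)
       = length [] + length ys              (def. length, arithmetic)

   case xs = x :: rest, assuming the formula for rest:
     length (append (x :: rest) ys)
       = length (x :: append rest ys)       (def. append)
       = 1 + length (append rest ys)        (def. length)
       = 1 + (length rest + length ys)      (induction hypothesis)
       = length (x :: rest) + length ys     (def. length, arithmetic) *)

(* a spot-check of one instance of the formula *)
let () =
  assert (length (append [1; 2; 3] [4; 5]) = length [1; 2; 3] + length [4; 5])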

the second way involves writing a formula expressing the conjunction
of facts you eventually want your program to satisfy, and transforming
that formula into the program by successive refinements
(rewrites). you are given, like in the first case, a list of the sorts
of syntactic transformations which constitute a legal refinement, with
the understanding that a refinement of a term "means the same thing",
only the refinement is "more concrete" than the refined. when you have
nothing left but terms in your abstract machine language, you have
completed the refinement and you can probably run the resulting
program. sometimes you have tool support to automate performing
tedious refinements of formulas into machine code. formal systems
which are intended to work this way include Z, VDM, RAISE, B Method,
Refinement Calculus, and Hehner's "practical theory of programming".
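
and a correspondingly tiny illustration of the second way, again only
a sketch in ocaml rather than in any of the systems named above: the
formula (here, a specification that two variables get swapped) comes
first, and the refinement steps, recorded as comments, lead down to
the program.

(* the specification: a relation between the pre-state (x, y) and the
   post-state (x', y'):

     x' = y  /\  y' = x

   refinement, working from the formula towards code; each step is a
   more concrete term that implies the one before it:

     x' = y /\ y' = x
   <=  t := x ; (x' = y /\ y' = t)   (introduce a variable holding the old x)
   <=  t := x ; x := y ; (y' = t)    (the first conjunct is now an assignment)
   <=  t := x ; x := y ; y := t      (only assignments remain: a program) *)

let swap (x : int ref) (y : int ref) =
  let t = !x in
  x := !y;
  y := t

(* a spot-check that the program satisfies the specification *)
let () =
  let x = ref 1 and y = ref 2 in
  swap x y;
  assert (!x = 2 && !y = 1)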

typically the underlying logics of members of either such family
permit their use in one approach or the other, but the tools and
logics may display the preference of their creators.

one way of viewing programs is as "active" objects, about which
certain assertions may hold from time to time. in this view, people
say things like "what does this program do?" the program text
is viewed as suspicious, vulgar; to do "real" reasoning, you need to
relate the program text to something clean like mathematics or
formal logic where there are many good theorems (never mind the
ambiguity and duplication in their notations).

another way of viewing programs is to say that programs do
nothing; computers do things and programs are simply formal
suggestions to the computer. a program in a high-level language is a
formal suggestion to another program (a compiler, itself ultimately
expressed in a low-level language such as machine code) about a
sequence of i/o operations which may result in the production of
another program in a low-level language. in all cases, however, a
program is a completely static entity: a single formula, together with
rules describing the legal transformations of it.

in this latter view, the most important thing a program suggests is a
relationship between pre- and post- states of a computer. this "large"
relationship is stated as a sequence of instructions, but each
instruction suggests at a fundamental level nothing more than a
relationship of the pre- and post- states of a computer (including i/o
of course). the post-state of one instruction is the pre-state of the
next. this is a simplification of modern instruction pipelines but it
is often a sufficient level of detail for reasoning.
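
a minimal sketch of that view in ocaml (the two-register state and the
instruction names are invented for the example): an instruction is
nothing but a function from pre-state to post-state, and sequencing
just feeds one into the next.

(* a toy machine state; i/o would simply be more fields here *)
type state = { a : int; b : int }

(* a (deterministic) instruction: a function from pre-state to
   post-state, i.e. the relation  post = step pre *)
type instruction = state -> state

let incr_a : instruction = fun pre -> { pre with a = pre.a + 1 }
let copy_a_to_b : instruction = fun pre -> { pre with b = pre.a }

(* sequencing: the post-state of one instruction is the pre-state of
   the next *)
let seq (i : instruction) (j : instruction) : instruction =
  fun pre -> j (i pre)

let () =
  let post = (seq incr_a copy_a_to_b) { a = 0; b = 0 } in
  assert (post.a = 1 && post.b = 1)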

so suppose you write down an instruction using the "static" view of a
program. say:

1: x <- k * v

if instead you take the first view, you may then feel the need to
"annotate" this with the "formal" property that the post-state of the
computer contains a variable whose arithmetic value is the product of
two others:

// annotation: { post[1][x] = pre[1][k] * pre[1][v] }
1: x <- k * v

it may strike you as suspicious at this point that you've just written
down two nearly identical expressions, but recall that in this view
the program is an "active" object (subject to wild fits of
unpredictability) and the annotation spec, in some more "mathematical"
semantics, is the only "formal" object.

what this means, when you say that an expression is "formal", is that
the formal object is the only one you have any confidence in reasoning
about and transforming. for example, say you
later have an instruction:

// annotation: { post[5][y] = pre[5][x] / pre[5][v] }
5: y <- x / v

and then you can (somehow) relate this back to the formal text of the
program to see that it's legal to rewrite as:

5: y <- k

but notice that to perform this reasoning, you had to perform a "round
trip" in and out of the "formal" system. after all, you cannot reason
about filthy active objects like programs; they might do anything! and
of course in order to faithfully do the "translation" back and forth
between the "active" and "formal" parts of the program, you need a
formal relationship between your "formal" notation and the
program text. oops! you've just accidentally raised the program text
to the level of a "formal" system anyways!

this is the convoluted thinking (unfortunately common in "formal
methods" circles) that Hehner sets out to change. he thinks that the
program is a formal expression of a relationship between pre-
and post- states of a computer, and if you tidy up common programming
languages you will find buried inside them a simple boolean expression
logic which uses the programming language as its fundamental
terms. otherwise you spend all your time translating back and forth
between the program text and your "real" formalism, and you get so
much clutter that any reasonable programmer gives up in disgust.

lest you think this is completely implausible, have a look at ACL2.
it is a fully functioning, quite usable system which performs exactly
this sort of "reasoning about the program text" for a fully executable
lisp language.

it is good to think of programs as logical terms, because you can
then consider the "specification" or "interface" of a program as some
logical term which is implied by another term, say the implementation
of the program or its assembly language. these are all terms within
the same logic, some using a restricted vocabulary but all logically
related. the act of "writing a program" is then viewed as "locating a
more syntactically restricted term which logically implies the more
abstract specification".

this is a useful view since it eliminates the pointless distinction
between specification and programming language. you view the
programming language as the subset of the specification language that
you happen to have hardware able to follow.

for example, say I have a specification / program of the form:

for i from 1 to 10: x[i] := x[i] + 1;

there is no "direct" encoding of that sequence of symbols as
something my hardware will try to follow. but there is another program
which logically implies the first one:

      i := 1
loop: if i > 10 jump done
      x[i] := x[i] + 1
      i := i + 1
      jump loop
done:

and this, given a small change-of-encoding into machine words, is
a specification my hardware will follow. but note the relationship:
the pseudo-assembly implies the pseudo-C, not the other way
around. it implies it, not in the informal sense of "it's a reasonable
translation", but rather in the formal sense of a computer with memory
that can assume various states, and control that passes from
instruction to instruction, with the post-state of one instruction
being the pre-state of the next. what if your logic doesn't have the
symbols "jump" or "for" or ":=" in it? well, it's time to change to a logic that
does.
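
if you want to poke at that implication rather than take it on faith,
here is a sketch in ocaml (the array encoding and the shape of the
pseudo-assembly are my own choices): both programs written as
functions from pre-state to post-state, plus a spot-check that they
agree on a sample starting state.

(* the state is just the array x; index 0 is unused so that x.(i)
   matches x[i] in the programs above *)

(* the pseudo-c loop as a state-to-state function *)
let pseudo_c (x : int array) : int array =
  let x = Array.copy x in
  for i = 1 to 10 do
    x.(i) <- x.(i) + 1
  done;
  x

(* the pseudo-assembly as a state-to-state function: control passes
   from instruction to instruction, the post-state of one being the
   pre-state of the next *)
let pseudo_asm (x : int array) : int array =
  let x = Array.copy x in
  let i = ref 1 in
  let pc = ref `Loop in
  let running = ref true in
  while !running do
    match !pc with
    | `Loop -> pc := (if !i > 10 then `Done else `Body)
    | `Body -> x.(!i) <- x.(!i) + 1; i := !i + 1; pc := `Loop
    | `Done -> running := false
  done;
  x

(* from the same pre-state, the pseudo-assembly reaches the post-state
   the pseudo-c describes *)
let () =
  let pre = Array.init 11 (fun i -> i * i) in
  assert (pseudo_asm pre = pseudo_c pre)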

document formats

three things worth pointing out wrt. document formats:

editability is a factor worth paying close attention to. many
users want to be able to "freeze" a document and make it non-editable,
either in the name of preserving "pristine content" or (benevolently)
by composing content using a higher-level abstraction than the
delivery format.

this has a similar moral character to compiling source code, insofar
as it makes derived documents very difficult to produce. many people think
that derived documents never happen in the real world; I would advise
such people to consider how much worse the web would be without "view
source".

machine-readability is also an important factor. if you assume many
document retrieval tasks in the future will be partially machine-assisted
(beyond merely presentation) then you've got to keep things like citeseer,
google, and glimpse in mind when working on document formats.

for non-editable, mostly presentation-oriented documents, the djvu format is very good.

I'm working fully-remote now, which is an interesting change, but
really has just served to demonstrate the flexibility of free software
to me; with a little bit of glue here and there, the system feels
exactly as if I were in the office. minus all the socializing :(

ported some bits of slib to ocaml and built a quick sqlite binding.
finding that ocaml just keeps growing on me as a "serious hacking"
language.

otherwise pretty relaxed. I'll be heading up to OLS in a couple weeks
to give a talk on building ridiculously small domain specific
compilers. and then I'm taking a fantastic vacation through the
western end of the country. very much looking forward to it.

I think that eval, in its lisp form, is another of those sorts of
things the ML/haskell world usually frowns on as "a bit too dynamic".
you don't even know whether an eval'ed sexp is well typed. that's
scary. that said, there is still dynamic caml should the need arise.

TCL is actually an interesting case. I used to think it was just a
complete joke, what with basically no datatypes, but I'm finding
myself occasionally forced to use it (sigh.. work), and each time I am
impressed with how far you can go with a clever quoting system and a
single datatype. it's further than you think.

with respect to camlp4, I do think the macros offered are "as
powerful" as those in lisp, by any reasonable measure. consider:

you can write your own lexer. you don't have to shoehorn your macro
notation into sexps.

you have explicit or implicit control over source coordinates.

you can programmatically construct your results, or construct them
through quotations, or both.

you can name and layer quotation expansions of different notations.

you can replace the core grammar of ocaml, essentially re-tasking the
ocaml compiler to a "new" language with ocaml semantics.

I think you are right about the scene graph location. I was initially
quite a fan of "location transparency", but have come to feel that
there are always at least 2 locations in any computer program: here
and elsewhere. Here is at least host-local, probably user-local,
process-local, and even thread-local. Really, I don't think a display
server ought to have more than one thread anymore either. I think
Miguel was quite right in that argument we had years ago. But that's a
different kettle of fish.

In any case, collapsing here and elsewhere
into just one case leads to, well, corba. Increasingly, I
don't think corba is wise at all; remote resources are just
too intrinsically different from pointers.

One might be motivated, then, to wonder what remote
resources are like. Obviously with some sequence
number trickery, remote resources can be organized
into things like streams (hence TCP socket ~~ file
descriptor), but can they be made into something even more
structured? 9p thinks remote resources can appear like a VFS
tree: open, read, write, close, and walk. Seems a
useful, and mostly harmless extension. But I think it's
important to keep open vs. closed, and explicit i/o, as part
of the equation. Trying to do those things transparently is
trouble.
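
written down as an interface, the shape 9p gives a remote resource
looks roughly like this (an ocaml signature of my own devising,
paraphrasing the Twalk/Topen/Tread/Twrite/Tclunk messages rather than
transcribing the actual protocol):

(* a remote resource presented the way 9p presents it: a tree of names
   you can walk, plus explicit open/read/write/close. i/o and failure
   stay visible in the types instead of hiding behind something
   pointer-like. *)
module type REMOTE_RESOURCE = sig
  type fid                    (* a handle into the remote tree *)
  type error

  val walk  : fid -> string list -> (fid, error) result
  val fopen : fid -> [ `Read | `Write ] -> (unit, error) result
                    (* "open"; renamed because open is an ocaml keyword *)
  val read  : fid -> offset:int64 -> count:int -> (bytes, error) result
  val write : fid -> offset:int64 -> bytes -> (int, error) result
  val close : fid -> unit
end

nothing here pretends a remote byte is as cheap as a local one; every
interaction is an explicit call that can fail.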

I wish paul graham would stop publishing articles claiming lisp has
the only useful syntactic metaprogramming system. it's especially
irritating since he seems like a smart guy, and should know better.
you don't need car and cdr to do good syntactic metaprogramming
(indeed, alone they are awkward); you just need a quasi-quote notation
and a place to hook your extensions into the evaluator.

hell, even TCL satisfies the menu here, though few people bother to
learn it. unless he's been hiding under a rock for a couple years, I
can't believe he's seen and understood camlp4, and still thinks
defmacro is "unique to lisp".

there is no deep theoretical difference between static and
dynamic type systems. you are still talking about writing a
specification (program) in which the computer may do
something you do not expect or desire. the only practical
question worth asking is: how much confidence do you have
that the specification you wrote will get the computer to do
the thing that you, or your customers, want?

testsuites test some cases. type systems check consistency.
the assumption is "consistency implies correctness", and it's hard to
come up with a much better definition of correctness than both "the
testsuite passes" and consistency; in practice this is as close to
true as we get.

I've used many C and lisp systems professionally, and
a few HM systems, and the HM systems are far more thoroughly
designed around consistency checking. you can
consistency-check lisp if you have the luxury of using ACL2
or a vendor compiler with a deep partial evaluator or
soft-type inferencer. but usually in lisp you are working on
a lousy implementation, with an OO or struct system someone
threw together for fun, and less checking even than the
"disjointness guarantee" inherent in fundamental scheme
types. likewise in C you are not usually working with a
larch or lclint-annotated system; you've got a system which
"made up" an OO type system on top of liberal pointer
casting, and not a hope in hell of checking its consistency.
C++ thankfully has a good type system, but as the original
article points out it can get pretty brutal to compile.

the more checking I can perform before I ship something to a
customer, the better. imho everything else is religion and
bafflegab. ocaml and haskell currently give me lots of good
checking without taking away too much, so I use them when I can.