Harriet Tubman
Iris and I had a pretty heavy conversation in the car on the way home
from school last week. I should begin by saying that Philadelphia has a lot of murals.
More murals than any other city in the world, in fact. The mural arts
people like to put up murals on large, otherwise ugly party walls.
That is, when you have two buildings that share a wall, and one of
them is torn down, leaving a vacant lot with a giant blank wall, the
mural arts people see it as a prime location and put a mural there.
On the way back from Iris's school we drive through Mantua, which is
not one of the rosier Philadelphia neighborhoods, and has a lot of
vacant lots, and so a lot of murals. We sometimes count the murals on
the way home, and usually pass four or five.

Iris pointed out a mural she liked, and I observed that there was
construction on the adjacent vacant lot, which is likely to mean that
the mural will be covered up soon by the new building. I mentioned
that my favorite Philadelphia mural of all had been on the side of a
building that was torn down in 2002.

Iris asked me to tell her about it, so I did. It was the giant mural
of Harriet Tubman that used to be on the side of the I. Goldberg
building at 9th and Chestnut Streets. It was awesome. There was a
40-foot-high painting of Harriet Tubman raising her lantern at night,
leading a crowd of people through a dark tunnel (Underground Railroad,
obviously) into a beautiful green land beyond, and giant chains that
had once barred the tunnel, but which were now shattered.

It's hard to photograph a mural well. The scale and the space do not
translate to photographs. It looked something like this:

Note that the small people at the bottom are actually larger than
life-size.

Here's a detail:

One cool thing about it that you can't see in the picture is that the
column of stones on Tubman's right is painted so as to disguise a
large and ugly air conditioning vent that emerges from the wall and
climbs up to the roof. The wall is otherwise flat.

Anyway, I said that my favorite mural had been the Harriet Tubman
one, and that it had been torn down before she was born. (As you can
see from the picture, the building was located next to a parking lot.
The owners of the building ripped it down to expand the parking lot.)

But then Iris asked me to tell her about Harriet Tubman, and that was
something of a puzzle, because Iris is only three and a half. But the
subject is not intrinsically hard to understand; it's just
unpleasant. And I don't believe that it's my job to shield her from
the unpleasantness of the world, but it is my job to try to answer her
questions, if I can. So I tried.

"Okay, you know how you own stuff, and you can do what you want,
because it's yours?"

Sure, she understands that. We have always been very clear in
distinguishing between her stuff and our stuff, and in defending her
property rights against everyone, including ourselves.

"But you know that you can't own other people, right?"

This was confusing, so I tried an example. "Emily is your friend, and
sometimes you ask her to do things, and maybe she does them. But you
can't make her do things she doesn't want to do, because she gets
to decide for herself what she does."

Sure, of course. Now we're back on track. "Well, a long time ago,
some people decided that they owned some other people, called slaves,
and that the slaves would have to do whatever their owners said, even if
they didn't want to."

Iris was very indignant. I believe she said "That's not nice!" I
agreed; I said it was terrible, one of the most terrible things that
had ever happened in this country. And then we were over the hump. I
said that slaves sometimes tried to run away from the owners, and get
away to a place where they could do whatever they wanted, and that
Harriet Tubman helped slaves escape.

I did not touch the racial issue at all. When you are explaining
something complicated, it is important to keep it in bite-sized
chunks, and to deal with them one at a time, and I thought slavery was
already a big enough chunk. Iris is going to meet this issue head-on
anyway, probably sooner than I would like, because she is biracial.

I explained about the Underground Railroad, and we discussed what a
terrible thing slavery must have been. Iris wanted to know what the
owners made the slaves do, and there my nerve failed me. I told her
that I didn't want to tell her about it because it was so awful and
frightening. I had pictures in my head of beatings, and of slaves
with their teeth knocked out so that they could be forced to eat, to
break hunger strikes, and of rape, and families broken up, and I just
couldn't go there. Well, I suppose it is my job to shield her from
some of the unpleasantness of the world, for a while.

I realize now I could have talked about slaves forced to do
farm work, fed bad food, and so on, but I don't think that would
really have gotten the point across. And I do think I got the point
across: the terrible thing about being a slave is that you have to do
what you are told, whether you want to or not. All preschoolers
understand that very clearly, whereas for Iris, toil and neglect are
rather vague abstractions. So I'm glad I left it where I did.

But then a little later Iris asked some questions about family
relations among the slaves, and if slaves had families, and I said
yes, that if a mother had a child, then her child belonged to the same
owner, and sometimes the owner would take the child away from its
mother and sell it to someone else and they would never see each other
again. Iris, of course, was appalled by this.

I'm not sure I had a point here, except that Iris is a thoughtful kid,
who can be trusted with grown-up issues even at three and a half years
old, and I am very proud of her.

That seems like a good place to end the year. Thanks for reading.

[ Addendum 20080201: The mural was repainted in a new location, at
2950 Germantown Avenue! ]

Welcome to my ~/bin
In the previous article I
mentioned "a conference tutorial about the contents of my
~/bin directory". Usually I have a web page about each
tutorial, with a description, and some sample slides, and I wanted to
link to the page about this tutorial. But I found to my surprise that
I had forgotten to make the page about this one.

So I went to fix that, and then I couldn't decide which sample slides
to show. And I haven't given the tutorial for a couple of years, and
I have an upcoming project that will prevent me from giving it for
another couple of years. Eh, figuring out what to put online is more
trouble than it's worth. I decided it would be a lot less toil to just put the whole thing
online.

I think the title is an accidental ripoff of an earlier class by Damian
Conway. I totally forgot that he had done a class on the same
subject, and I think he used the same title. But that just makes us
even, because for the past few years he has been making money going
around giving talks on "Conference Presentation Aikido", which is a
blatant (and deliberate) ripoff of my 2002 Perl conference talk on Conference
Presentation Judo. So I don't feel as bad as I might have.

Another trivial utility: accumulate
As usual, whenever I write one of these things, I wonder why it took
me so long to get off my butt and put in the five minutes of work that
were actually required. I've wanted something like this for
years. It's called accumulate. It reads an input of
this form:

k1 v1
k1 v2
k2 v3
k1 v4
k2 v5
k3 v6

and writes it out in this format:

k1 v1 v2 v4
k2 v3 v5
k3 v6
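
Here is a minimal Python sketch of that transformation. It is not the Perl program itself. The function name `accumulate`, its `key_col` and `val_col` parameters, and the choice to emit keys in first-seen order are all assumptions made for illustration.

```python
from collections import defaultdict

def accumulate(lines, key_col=0, val_col=1):
    """Group the value column by the key column, keys in first-seen order."""
    groups = defaultdict(list)          # insertion-ordered, like any dict
    for line in lines:
        fields = line.split()           # whitespace-delimited input
        if len(fields) <= max(key_col, val_col):
            continue                    # skip blank or too-short lines
        groups[fields[key_col]].append(fields[val_col])
    return [" ".join([key] + values) for key, values in groups.items()]

sample = ["k1 v1", "k1 v2", "k2 v3", "k1 v4", "k2 v5", "k3 v6"]
print("\n".join(accumulate(sample)))
```

Run on the sample input, it prints the three output lines shown above.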

I wanted it this time because I had a bunch of files that included some
duplicates, and wanted to get rid of the duplicates. So:

md5sum * | accumulate | perl -lane 'unlink @F[2..$#F]'

(Incidentally, people sometimes argue that Perl's .. operator
should count backwards when the left operand exceeds the right one.
These people are wrong. There is only one argument that needs to be
made to refute this idea; maybe it is the only argument that
can be made. And examples of it abound. The code above is one
such example.)
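
The same duplicate-removal idea can be sketched in Python, under a few assumptions: `remove_duplicates` is a made-up name, SHA-256 stands in for MD5 (any content digest would do), and the first file seen for each digest is the one that is kept. The slice `group[1:]` plays the role of `@F[2..$#F]`: when a digest has only one file, the slice is simply empty and nothing is deleted.

```python
import hashlib
import os
from collections import defaultdict

def remove_duplicates(paths):
    """Delete every file whose contents duplicate an earlier file in paths."""
    by_digest = defaultdict(list)
    for path in paths:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        by_digest[digest].append(path)
    removed = []
    for group in by_digest.values():
        # group[1:] is empty for a digest seen only once, so unique
        # files need no special-casing.
        for extra in group[1:]:
            os.unlink(extra)
            removed.append(extra)
    return removed
```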

I'm afraid of insulting you by showing the source code for
accumulate, because of course it is so very trivial, and you
could write it in five minutes, as I did. But who knows; maybe seeing
the source has some value:

It's tempting to add a -F option to tell it that the input is
not delimited by white space, or an option to change the output
format, or blah blah blah, but I managed to restrain myself, mostly.

Several years ago I wrote a conference tutorial about the
contents of my ~/bin directory. The clearest
conclusion that transpired from my analysis was that the utilities I
write have too many features that I don't use. The second-clearest
was that I waste too much time writing custom argument-parsing code
instead of using Getopt::Std. I've tried to learn from
this. One thing I found later is that a good way to sublimate the urge to
put in some feature is to put in the option to enable it, and to
document it, but to leave the feature itself
unimplemented. This might work for you too if you have the same
problem.

I did put in -k and -v options to control which
input columns are accumulated. These default to the first and second
columns, naturally. Maybe this was a waste of time, since it occurs
to me now that accumulate -k k -v v could be
replaced by cut -fk,v | accumulate, if only
cut didn't suck quite so badly. Of course one could use
awk '{print $k, $v}' | accumulate to escape
cut's suckage. And some
solution of this type obviates the need for accumulate's
putative -F option also. Well, I digress.

The accumulate program itself reminds me of a much more
ambitious project I worked on for a while between 1998 and 2001, as
does the yucky line:

push @{$K{$F[$opt{k}]}}, $F[$opt{v}];

The ambitious project was tentatively named "twingler".

Beginning Perl programmers often have trouble with compound data
structures because Perl's syntax for the nested structures is so
horrendous. Suppose, for example, that you have a reference to a
two-dimensional array $aref, and you want to produce a hash,
such that each value in the array appears as a key in the hash,
associated with a list of strings in the form "m,n"
indicating where in the array that value appeared. Well, of course it
is obviously nothing more than:
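
In Python rather than Perl, and purely as an illustration of the transformation just described, a sketch might look like this. The name `positions_by_value` and the sample `grid` are made up for the example.

```python
from collections import defaultdict

def positions_by_value(grid):
    """Map each value in a 2-D list to the "m,n" positions where it occurs."""
    index = defaultdict(list)
    for m, row in enumerate(grid):
        for n, value in enumerate(row):
            index[value].append(f"{m},{n}")
    return dict(index)

grid = [["a", "b"], ["b", "a"], ["c", "a"]]
print(positions_by_value(grid))
```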

The idea of twingler was that you would specify the transformation you
wanted declaratively, and it would then write the appropriate Perl
code to perform the transformation. The interesting part of this
project is figuring out the language for specifying the
transformation. It must be complex enough to be able to express most
of the interesting transformations that people commonly want, but if
it isn't at the same time much simpler than Perl itself, it isn't
worth using. Nobody will see any point in learning a new declarative
language for expressing Perl data transformations unless it is itself
simpler to use than just writing the Perl would have been.

There are some hard problems here: What do people need? What
subset of this can be expressed simply? How can we design a simple,
limited language that people can use to express their needs? Can the
language actually be compiled to Perl?

I had to face similar sorts of problems when I was writing linogram, but in the case of
linogram I was more successful. I tinkered with twingler for
some time and made several pages of (typed) notes but never came up
with anything I was really happy with.

At one point I abandoned the idea of a declarative language, in favor
of just having the program take a sample input and a corresponding
sample output, and deduce the appropriate transformation from there.
For example, you would put in:

And then presumably you could eyeball this, and if what you really
wanted was @{$a1}[0, -1] instead of @$a1 you could
tinker it into the form you needed without too much extra trouble.
This is much nicer from a user-experience point of view, but at the
same time it seems more difficult to implement.

I had some ideas. One idea was to have it generate a bunch of
expressions for mapping single elements from the input to the output,
and then to try to unify those expressions. But as I said, I never
did figure it out.

It's a shame, because it would have been pretty cool if I had gotten
it to work.

The MIT CS grad students' handbook used to say something about how you
always need to have several projects going on at once, because
two-thirds of all research projects end in failure. The people you
see who seem to have one success after another actually have three
projects going on all the time, and you only see the successes. This
is a nice example of that.

Strangest Asian knockoff yet
A few years ago Lorrie and I had brunch at the very trendy
Philadelphia restaurant "Striped Bass". I guess it wasn't too
impressive, because usually if the food is really good I will remember
what I ate, even years later, and I do not. But the plates
were awesome. They were round, with an octagonal depression in the
center, a rainbow-colored pattern around the edge, garnished with
pictures of ivy leaves.

Good plates have
the name of the maker on the back. These were made by Villeroy & Boch. Some
time later, we visited the Villeroy & Boch outlet in Woodbury, New York, and I
found the pattern I wanted, "Pasadena". The cool circular plates from
Striped Bass were only for sale to restaurants, but the standard ones
were octagonal, which is also pretty cool. So I bought a set. (57%
off list price! Whee!)

They no longer make these plates. If you broke one, and wanted a
replacement, you could buy one online for $43.99. Ouch! But there is
another option, if you are not too fussy.

Many years after I bought my dishes,
I was shopping in one of the big Asian grocery stores
on Washington Avenue. They have a kitchenware aisle. I found this
plate:

The real VB plate is made of porcelain. The Washington Avenue
knockoff is made of plastic.

Here is a third way. Sine and cosine are solutions of the
differential equation f = -f''. Since I now have enough
infrastructure to get Haskell to solve differential equations, I can
use this to define sine and cosine:

Okay, now what? Is this something familiar? I wasn't sure. One
thing that might help a bit is to get the program to disgorge rational
numbers rather than floating-point numbers. But even that won't
completely solve the problem.

One thing I was thinking about in the shower is doing Fourier
analysis; this should at least identify the functions that are
sinusoidal. Suppose that we know (or believe, or hope) that some
power series a₁x + a₃x³ + ...
actually has the form
c₁ sin x + c₂ sin 2x + c₃ sin 3x + ... .
Then we can guess the values of the
cᵢ by solving a system of n equations of the
form:

$$\sum_{i=1}^n
i^kc_i = k!a_k\qquad{\hbox{($k$ from 1 to $n$)}}$$

And one ought to be able to do something analogous, and more general,
by including the cosine terms as well. I haven't tried it, but it
seems like it might work.

But what about more general cases? I have no idea. If you have the
happy inspiration to square the mystery power series above, you get
[1, 2, 0, 0, 0, ...], so it is √(2x+1), but what if
you're not so lucky? I wasn't; I solved it by a variation of Gareth McCaughan's method of a
few days ago: f·f' is the derivative of
f²/2, so integrate both sides of
f·f' = 1, getting f²/2 =
x + C, and so f = √(2x + C).
Only after I had solved the equation this way did I try squaring the
power series, and see that it was simple.

To do that I decided I would need a function to calculate the square
root of a power series, which I did figure out; it's in
the earlier article. But
then I got distracted with other issues, and then folks wrote to me
with several ways to solve the differential equation, and I spent a lot of time writing
that up, and I didn't get back to the original problem until
today, when I had to attend the weekly staff meeting. I get a lot of
math work done during that meeting.

At least one person wrote to ask me for the Haskell code for the power
series calculations, so here's that first off.

A power series
a₀ + a₁x + a₂x² + a₃x³ + ...
is represented as a (probably infinite) list of numbers
[a₀, a₁, a₂, ...]. If the list is finite, the missing terms are assumed to be all 0.
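
The list representation can be sketched in Python (not the Haskell the article uses). This version works on finite lists and truncates products to a requested number of terms instead of being lazy. The names `add`, `mul`, `deriv` and `integ` are illustrative and are not taken from the original code.

```python
from fractions import Fraction
from itertools import zip_longest

def add(a, b):
    """Termwise sum; missing terms of the shorter list count as 0."""
    return [x + y for x, y in zip_longest(a, b, fillvalue=0)]

def mul(a, b, n):
    """First n coefficients of the product of series a and b."""
    def coeff(s, i):
        return s[i] if i < len(s) else 0
    return [sum(coeff(a, i) * coeff(b, k - i) for i in range(k + 1))
            for k in range(n)]

def deriv(a):
    """Termwise derivative: [a1, 2*a2, 3*a3, ...]."""
    return [k * c for k, c in enumerate(a)][1:]

def integ(c0, a):
    """Termwise integral with constant term c0: [c0, a0, a1/2, a2/3, ...]."""
    return [c0] + [Fraction(c) / (k + 1) for k, c in enumerate(a)]

print(mul([1, 1], [1, 1], 4))   # (1 + x)^2, padded to four terms
```

Exact rationals (`Fraction`) are used for the integral so that coefficients such as 1/6 come out exactly rather than as floating-point approximations.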

The really interesting operators perform division and evolve square roots
of functions. I discussed how these work in the earlier
article. The reciprocal operation is well-known; it appears in
Structure and Interpretation of Computer Programs,
Higher-Order Perl, and I presume elsewhere. I haven't
seen the square root extractor anywhere else, but I'm sure that's just
because I haven't looked.

Okay, so as usual that is not what I wanted to talk about; I wanted to
show how to solve the differential equation. I found I was getting
myself confused, so I decided to try to solve a simpler differential
equation first. (Pólya says: "Can you solve a simpler problem of the
same type?" Pólya is a smart guy. When the voice talking in your head
is Pólya's, you better pay attention.) The simplest relevant differential equation
seemed to be f = f'. The first thing I tried was
observing that for all f, f = f0 :
mul2 iotaR f'. This yields the code:

f = f0 : mul2 iotaR (deriv f)

This holds for any function, and so it's unsolvable. But if you
combine it with the differential equation, which says that f =
f', you get:

and in fact this works just fine. And then you can observe that this
is just the definition of int; replacing the definition with
the name, we have:

f = int f0 f
where f0 = 1 -- or whatever

This runs too, and calculates the power series for the
exponential function, as it should. It's also transparently obvious,
and makes me wonder why it took me so long to find. But I was looking
for solutions of the form:

f = deriv f

which Haskell can't figure out. It's funny that it only handles
differential equations when they're expressed as integral equations. I
need to meditate on that some more.
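
One way to see why the integral form works is that coefficient n+1 of the integral depends only on coefficient n of f, so every term is available before it is needed. With f = deriv f it is the other way around: producing term n requires term n+1. The Python generator below is a sketch of the integral form only; `solve_f_equals_deriv` is a made-up name and this is not the Haskell definition.

```python
from fractions import Fraction
from itertools import islice

def solve_f_equals_deriv(f0):
    """Yield the coefficients of the series f with f = f0 + integral of f."""
    coeffs = [Fraction(f0)]
    n = 0
    while True:
        yield coeffs[n]
        # Integrating the term a_n x^n gives a_n/(n+1) x^(n+1), so the
        # next coefficient is known as soon as a_n is.
        coeffs.append(coeffs[n] / (n + 1))
        n += 1

print(list(islice(solve_f_equals_deriv(1), 6)))
```

With f0 = 1 the coefficients are 1, 1, 1/2, 1/6, 1/24, 1/120, which is the series for the exponential function.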

It occurs to me just now that the
f = f0 : mul2 iotaR (deriv f) identity above just says that
the integral and derivative operators are inverses. These things are
always so simple in hindsight.

Anyway, moving along, back to the original problem, instead of
f = f', I want f² + (f')² = 1,
or equivalently f' = √(1 - f²). So I
take the derivative-integral identity as before:

And now I am done; Haskell cheerfully generates the power series
expansion for f for any given initial condition. (The
parameter f0 is precisely the desired value of f(0).)
For example, when f(0) = √(1/2), as above, the calculated
terms show the function to be exactly
√(1/2)·(sin(x) + cos(x)); when f(0)
= 0, the output terms are exactly those of sin(x). When
f(0) = 1, the output blows up and comes out as [1, 0, NaN, NaN,
...]. I'm not quite sure why yet, but I suspect it has something to
do with there being two different solutions that both have f(0) = 1.
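
The same computation can be sketched in Python with exact rationals, computing one coefficient at a time from f = f0 + ∫√(1 - f²). This is an illustration, not the Haskell program: `sqrt_series` and `solve` are made-up names, and the caller has to supply `g0`, the chosen square root of 1 - f0², because exact rationals cannot represent something like √(1/2). The sketch therefore uses rational starting values such as f0 = 0 or f0 = 3/5.

```python
from fractions import Fraction

def sqrt_series(h, g0):
    """Square root of the truncated series h, given g0 with g0*g0 == h[0]."""
    g = [Fraction(g0)]
    for n in range(1, len(h)):
        cross = sum(g[i] * g[n - i] for i in range(1, n))
        g.append((h[n] - cross) / (2 * g[0]))   # divides by zero if g0 == 0
    return g

def solve(f0, g0, terms):
    """Coefficients of f with f(0) = f0 and f' = sqrt(1 - f^2).

    g0 is the chosen square root of 1 - f0^2, that is, the value of f'(0).
    """
    f = [Fraction(f0)]
    while len(f) < terms:
        n = len(f)
        # h = 1 - f^2, truncated to the n coefficients known so far
        h = [-sum(f[i] * f[k - i] for i in range(k + 1)) for k in range(n)]
        h[0] += 1
        g = sqrt_series(h, g0)
        f.append(g[n - 1] / n)      # integrate: a_n = g_(n-1) / n
    return f

print(solve(0, 1, 6))
```

With f0 = 0 this produces 0, 1, 0, -1/6, 0, 1/120, the coefficients of sin(x). With f0 = 1 and g0 = 0, the division by `2 * g[0]` raises `ZeroDivisionError` at the third term, which is this sketch's version of the NaN output described above.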

All of this also works just fine in Perl, if you
build a suitable lazy-list library; see chapter 6 of HOP for complete
details. Sample
code is here. For a Scheme implementation, see SICP. For a Java,
Common Lisp, Python, Ruby, or SML implementation, do the obvious
thing.

But anyway, it does work, and I thought it might be nice to blog about
something I actually pursued to completion for a change. Also I was
afraid that after my week of posts about Perl syntax, differential
equations, electromagnetism, Unix kernel internals, and paint chips in
the shape of Austria, the readers of Planet Haskell, where my blog has
recently been syndicated, were going to storm my house with torches
and pitchforks. This article should mollify them for a time, I
hope.

which I was trying to solve by various methods. The article was
actually about calculating square roots of power series; I got
sidetracked on this. Before I got back to the original equation,
four readers of this blog had written in with solutions, all
different.

I got interested in this a few weeks ago when I was sitting in on a
freshman physics lecture at Penn. I took pretty much the same class
when I was a freshman, but I've never felt like I really
understood physics. Sitting in freshman physics class again confirms
this. Every time I go to a class, I come out with bigger questions
than I went in.

The instructor was
talking about LC circuits, which are simple circuits with a capacitor
(that's the "C") and an inductor (that's the "L", although I don't
know why). The physics people claim that in such a circuit the
capacitor charges up, and then discharges again, repeatedly. When one
plate of the capacitor is full of electrons, the electrons want to
come out, and equalize the charge on the plates, and so they make a
current flowing from the negative to the positive plate. Without the inductor, the current
would fall off exponentially, as the charge on the plates equalized.
Eventually the two plates would be equally charged and nothing more
would happen.

But the inductor generates an electromotive force that tends to resist
any change in the current through it, so the decreasing current in the
inductor creates a force that tends to keep the electrons moving
anyway, and this means that the (formerly) positive plate of the
capacitor gets extra electrons stuffed into it. As the charge on this
plate becomes increasingly negative, it tends to oppose
the incoming current even more, and the current does eventually come to a halt. But by
that time a whole lot of electrons have moved from the negative to the
positive plate, so that the positive plate has become negative and the
negative plate positive. Then the electrons come out of the
newly-negative plate and the
whole thing starts over again in reverse.

In practice, of course, all the components offer some resistance to
the current, so some of the energy is dissipated as heat, and
eventually the electrons stop moving back and forth.

Anyway, the current is nothing more nor less than
the motion of the electrons, and so it is proportional to the
derivative of the charge in the capacitor. Because to say that
current is flowing is exactly the same as saying that the charge in
the capacitor is changing. And the magnetic flux in the inductor is
proportional to the rate of change of the current flowing through it, by
Maxwell's laws or something.

The amount of energy in the whole system is the sum of the energy
stored in the capacitor and the energy stored in the magnetic field of
the inductor.
The former turns out to be proportional to the square of the charge in
the capacitor, and the latter to the square of the current. The law
of conservation of energy says that this sum must be constant.
Letting f(t) be the charge at time t, then
df/dt is the current, and (adopting suitable units) one
has:

$$(f(x))^2 +
\left(df(x)\over dx\right)^2 = 1$$

which is the equation I was considering.

Anyway, the reason for this article is mainly that I wanted to talk
about the different methods of solution, which were all quite
different from each other. Michael Lugo went ahead with
the power series approach I was using. Say that:

$$f(x) = \sum_{i=0}^\infty a_ix^i$$

Equating coefficients on both sides of the equation gives us the following equations:

$$\begin{aligned}
a_0^2 + a_1^2 &= 1 \\
2a_0a_1 + 4a_1a_2 &= 0 \\
2a_0a_2 + a_1^2 + 6a_1a_3 + 4a_2^2 &= 0 \\
2a_0a_3 + 2a_1a_2 + 8a_1a_4 + 12a_2a_3 &= 0 \\
2a_0a_4 + 2a_1a_3 + a_2^2 + 10a_1a_5 + 16a_2a_4 + 9a_3^2 &= 0 \\
&\;\;\vdots
\end{aligned}$$

Now here's the thing M. Lugo noticed that I didn't. You can
separate the terms involving even subscripts from those involving odd
subscripts. Suppose that a₀ and a₁ are both nonzero. The polynomial
from the second line of the table, 2a₀a₁ + 4a₁a₂, factors as
2a₁(a₀ + 2a₂), and one of these factors must be zero, so we
immediately have a₂ = -a₀/2.

Now take the next line from the table,
2a₀a₂ + a₁² + 6a₁a₃ + 4a₂².
This can be separated into the form
2a₂(a₀ + 2a₂) + a₁(a₁ + 6a₃).
The left-hand term is zero, by the previous
paragraph, and since the whole thing equals zero, we have
a₃ = -a₁/6.

Continuing in this way, we can conclude that
a₀ = -2!a₂ = 4!a₄ = -6!a₆ = ..., and that
a₁ = -3!a₃ = 5!a₅ = ... .
These should look familiar from first-year calculus, and together they
imply that f(x) = a₀ cos(x) + a₁ sin(x), where (according to the first line
of the table) a₀² + a₁² = 1.
And that is the complete
solution of the equation, except for the case we omitted, when either
a₀ or a₁ is zero; these give the
trivial solutions f(x) = ±1.
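
The factorial pattern can be checked mechanically. The Python sketch below builds the coefficients from that pattern and confirms that f² + (f')² comes out as 1 + 0x + 0x² + ... as far as the truncation allows. The names `candidate` and `residual`, and the sample values a₀ = 3/5, a₁ = 4/5, are chosen only for illustration.

```python
from fractions import Fraction
from math import factorial

def candidate(a0, a1, terms):
    """Coefficients of a0*cos(x) + a1*sin(x), from the factorial pattern."""
    coeffs = []
    for k in range(terms):
        sign = (-1) ** (k // 2)                 # signs go + + - - + + ...
        lead = a0 if k % 2 == 0 else a1         # even terms use a0, odd use a1
        coeffs.append(sign * Fraction(lead) / factorial(k))
    return coeffs

def residual(f):
    """Coefficients of f^2 + (f')^2 that the truncated f fully determines."""
    d = [k * c for k, c in enumerate(f)][1:]    # termwise derivative
    n = len(d)

    def square(s):
        return [sum(s[i] * s[k - i] for i in range(k + 1)) for k in range(n)]

    return [x + y for x, y in zip(square(f), square(d))]

print(residual(candidate(Fraction(3, 5), Fraction(4, 5), 10)))
```

The printed list is a 1 followed by eight zeros, which is what the table of equations requires.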

Okay, that was a lot of algebra grinding, and if you're not as clever
as M. Lugo, you might not notice that the even terms of the
series depend only on a₀ and the odd terms only on
a₁; I didn't. I thought they were all mixed
together, which is why I alluded to "a bunch of not-so-obvious
solutions" in the earlier article. Is there a simpler way to get the
answer?

Gareth McCaughan wrote to me to point out a really clever trick that
solves the equation right off. Take the derivative of both sides of
the equation; you immediately get 2ff' +
2f'f'' = 0, or, factoring out f',
f'(f + f'') = 0. So there are two solutions:
either f'=0 and f is a constant function, or f
+ f'' = 0, which even the electrical engineers know how to solve.

David Speyer showed a third solution that seems midway between the two
in the amount of clever trickery required. He rewrote the equation
as:

$${df\over dx} = \sqrt{1
- f^2}$$

$${df\over\sqrt{1
- f^2} } = dx$$

The left side is an old standby of calculus I classes; it's the
derivative of the arcsine function.
On integrating both sides, we have:

$$\arcsin f = x +
C$$

so f = sin(x + C). This is equivalent to the
a₀ cos(x) + a₁ sin(x) form that we got before, by an
application of the sum-of-angles formula for the sine function. I
think M. McCaughan's solution is slicker, but M. Speyer's is
the only one that I feel like I should have noticed myself.
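
Written out, the sum-of-angles step is:

$$\sin(x + C) = \sin C\,\cos x + \cos C\,\sin x$$

so a₀ = sin C and a₁ = cos C, and then a₀² + a₁² = sin²C + cos²C = 1, which agrees with the constraint from the power series solution.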

Finally, Walt Mankowski wrote to tell me that he had put the question
to Maple, which disgorged the following solution after a few
seconds:

f(x) = 1, f(x) = -1, f(x) = sin(x - _C1), f(x) = -sin(x - _C1).

This is correct, except that the appearance of both sin(x +
C) and -sin(x + C) is a bit odd, since
-sin(x + C) = sin(x + (C + π)). It
seems that Maple wasn't clever enough to notice that. Walt says he
will ask around and see if he can find someone who knows what Maple
did to get the solution here.

I would like to add a pithy and insightful conclusion to this article,
but I've been working on it for more than a week now, and also it's
almost lunch time, so I think I'll have to settle for observing that
sometimes there are a lot of ways to solve math problems.

Most important, I said that "the sync() system call marks all
the kernel buffers as dirty". This is totally wrong, and doesn't
even make sense. Dirty buffers are those with data that needs to be
written out. Marking a non-dirty buffer as dirty is a waste of time,
since nothing has changed in the buffer, but it will now be rewritten
anyway. What sync() does is schedule all the dirty
buffers to be written as soon as possible.

On some recent systems, sync() actually waits for all the
dirty buffers to be written, and a bunch of people tried to correct me
about this. But my original article was right: historically, it was
not so, and even today it's not universally true. In former times,
sync() would schedule the buffers for writing, and then
return before the data was actually written.

I said that one of the duties of init was to call
sync() every thirty seconds, but this was mistaken. That
duty actually fell to a separate program, known as update.
While discussing this with one of the readers who wrote to correct me,
I looked up the source for Version 7 Unix, to make sure I was
right, and it's so short I thought I might as well show it here:

The program is so simple I don't have much more to say about it. It
initially invokes dosync(), which calls sync() and
then schedules another call to dosync() in 30 seconds. Note
that the 0 in the second argument to open had not
yet been changed to O_RDONLY. The pause() call puts the
process to sleep until a signal arrives, so it uses no CPU time
between one scheduled call and the next.
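
The sync-then-sleep structure described here can be sketched in Python for a Unix system. This is not the Version 7 source. `run_update` and its `rounds` parameter are invented for the example, and a repeating interval timer stands in for re-scheduling the call by hand each time.

```python
import os
import signal

def run_update(interval=30.0, sync=os.sync, rounds=None):
    """Call sync() at once and then every `interval` seconds.

    `rounds` is a hypothetical knob for demonstrations: stop after at
    least that many calls. None, the default, loops forever.
    """
    calls = 0

    def dosync(signum=None, frame=None):
        nonlocal calls
        sync()
        calls += 1

    previous = signal.signal(signal.SIGALRM, dosync)
    dosync()                                    # first sync happens immediately
    # Repeating timer: SIGALRM is delivered every `interval` seconds.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        while rounds is None or calls < rounds:
            signal.pause()                      # sleep until a signal arrives
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)     # cancel the timer
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)
    return calls
```

Called as `run_update()` it behaves like a small daemon. Passing a stub for `sync` and a small `rounds` makes it easy to try out without touching the disk.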

In various systems more recent than V7, the program was known by
various names, but it was update for a very long time.

Several people wrote to correct me about the:

# sync
# sync
# sync
# halt

thing, some saying that I had the reason wrong, or that it did not make
sense, or that only two syncs were used, rather than three.
But I had it right. People did use three, and they did it for the
reason I said, whether that makes sense or not. (Some of the people
who miscorrected me were unaware that sync() would finish and
exit before the data was actually written.) But for example, see this
old Usenet thread for a discussion of the topic that confirms what
I said.

Nobody disputed my contention that Linus was suffering from the
promptings of the Evil One when he tried to change the semantics of
fsync(), and nobody seems to know the proper name of the
false god of false efficiency. I'll give this some thought and see
what I can come up with.

Thanks to Tony Finch, Dmitry Kim, and Stefan O'Rear for discussion of
these points.

the kernel normally copies the data from your buffer into a kernel
buffer, and then, instead of writing out the data to disk, it marks
its buffer as "dirty" (that is, as needing to be written eventually),
and reports success back to the process immediately, even though the
dirty buffer has not yet been written, and the data is not yet on the disk.

Normally, the kernel writes out the dirty buffer in due time,
and the data makes it to the disk, and you are happy because your
process got to go ahead and do some more work without having to wait
for the disk, which could take milliseconds. ("A long time", as I so
quaintly called it in the talk.) If some other process reads the data
before it is written, that is okay, because the kernel can give it the
updated data out of the buffer.

But if there is a catastrophe, say a power failure, then you see the
bad side of this asynchronous writing technique, because the data,
which your process thought had been written, and which the kernel
reported as having been written, has actually been lost.

There are a number of mechanisms in place to deal with this. The
oldest is the sync() system call, which marks all the kernel
buffers as dirty. All Unix systems run a program called
init, and one of init's principal duties is to call
sync() every thirty seconds or so, to make sure that the
kernel buffers get flushed to disk at least every thirty seconds, and
so that no crash will lose more than about thirty seconds' worth of data.

(There is also a command-line program sync which just does a sync()
call and then exits, and old-time Unix sysadmins are in the habit of
halting the system with:

# sync
# sync
# sync
# halt

because the second and third syncs give the kernel time to actually
write out the
buffers that were marked dirty by the first sync. Although I
suspect that few of them know why they do this. I swear I am not
making this up.)

But for really crucial data, sync() is not enough, because, although
it marks the kernel buffers as dirty, it still does not actually
write the data to the disk.

So there is also an fsync() call; I forget when this was
introduced. The process gives fsync() a file descriptor, and
the call demands that the kernel actually write the associated
dirty buffers to disk, and does not return until they have been. And
since,
unlike write(), it actually waits for the data to go to the
disk, a successful return from fsync() indicates that the
data is truly safe.

The mail delivery agent will use this when it is writing your email to
your mailbox, to make sure that no mail is lost.
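
A minimal Python sketch of that pattern, assuming a Unix-like system: `durable_write` is a made-up helper, and `os.fsync` is Python's wrapper around the fsync() call discussed above.

```python
import os

def durable_write(path, data):
    """Write bytes to path and return only after fsync() has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                     # os.write may accept only part
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                    # does not return until data is flushed
    finally:
        os.close(fd)
```

If `os.fsync` raises `OSError`, the caller knows the data may not be on disk and can refuse to acknowledge the message.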

Some systems have an O_SYNC flag that the process can supply
when it opens the file for writing:

int fd = open("blookus", O_WRONLY | O_SYNC);

This sets the O_SYNC flag in the kernel file pointer
structure, which means that whenever data is
written to this file pointer, the kernel, contrary to its usual
practice, will implicitly fsync() the descriptor.

Well, that's not what I wanted to write about here. What I meant to
discuss was...

No, wait. That is what I wanted to write about. How about that?

Anyway,
there's an interesting question that arises in connection with fsync(): suppose you
fsync() a file. That guarantees that the data will be written. But
does it also guarantee that the mtime and the file extent of the file
will be updated? That is, does it guarantee that the file's inode
will be written?

On most systems, yes. But on some versions of Linux's ext2
filesystem, no. Linus himself broke this as a sacrifice to the false
god of efficiency, a very bad decision in my opinion, for reasons that
should be obvious to everyone but those in the thrall of Mammon.
(Mammon's not right here. What is the proper name of the false god
of efficiency?)

Sanity eventually prevailed. Recent versions of Linux have an
fsync() call, which updates both the data and the inode, and
a fdatasync() call, which only guarantees to update the
data.
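A program that wants the cheaper call when it is available, and the safe one otherwise, might do something like the following. This is only a sketch: the function name flush_to_disk and its data_only parameter are invented here, and the hasattr check is there because not every platform provides fdatasync().

```python
import os

def flush_to_disk(fd, data_only=False):
    """Flush fd to disk and return the name of the call that was used."""
    if data_only and hasattr(os, "fdatasync"):
        # Cheaper: the caller only needs the file data to be safe.
        os.fdatasync(fd)
        return "fdatasync"
    # Default: flush the data and the inode as well.
    os.fsync(fd)
    return "fsync"
```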

Freshman electromagnetism questions
As I haven't quite managed to mention here before, I have occasionally
been sitting in on one of Penn's first-year physics classes, about
electricity and magnetism. I took pretty much the same class myself
during my freshman year of college, so all the material is quite
familiar to me.

But, as I keep saying here, I do not understand physics very well,
and I don't know much about it. And
every time I go to a freshman physics lecture I come out feeling like
I understand it less than I went in.

I've started writing down my questions in class, even though I don't
really have anyone to ask them to. (I don't want to take up the
professor's time, since she presumably has her hands full taking care
of the paying customers.) When I ask people I know who claim to
understand physics, they usually can't give me plausible answers.

Maybe I should mutter something here under my breath about how
mathematicians and mathematics students are expected to have a better
grasp on fundamental matters.

The last time this came up for me I was trying to understand the
phenomenon of dissolving. Specifically, why does it usually happen
that substances dissolve faster and more thoroughly in warmer
solutions than in cooler solutions? I asked a whole bunch of people
about this, up to and including a full professor of physical
chemistry, and never got a decent answer.

The most common answer, in fact, was incredibly crappy: "the warm
solution has higher entropy". This is a virtus dormitiva if ever there was one.
There's a scene in a play by Molière in which a candidate for a
medical degree is asked by the examiners why opium puts people to
sleep. His answer, which is applauded by the examiners, is that it
puts people to sleep because it has a virtus dormitiva. That is, a
sleep-producing power. Saying that warm solutions dissolve things
better than cold ones because they have more entropy is not much
better than saying that it is because they have a virtus dormitiva.

The entropy is not a real thing; it is a reification of the power that
warmer substances have to (among other things) dissolve solutes more
effectively than cooler ones. Whether you ascribe a higher entropy to
the warm solution, or a virtus dissolva to it, comes to the
same thing, and explains nothing. I was somewhat disgusted that I
kept getting this non-answer. (See my explanation of why we put salt on
sidewalks when it snows to see what sort of answer I would have
preferred. Probably there is some equally useless answer one could
have given to that question in terms of entropy.)

(Perhaps my position will seem less crackpottish if I make an
analogy with the concept of "center of gravity". In mechanics, many
physical properties can be most easily understood in terms of the
center of gravity of some object. For example, the gravitational
effect of small objects far apart from one another can be conveniently
approximated by supposing that all the mass of each object is
concentrated at its center of gravity. A force on an object can be
conveniently treated mathematically as a component acting toward the
center of gravity, which tends to change the object's linear velocity,
and a component acting perpendicular to that, which tends to change
its angular velocity. But nobody ever makes the mistake of supposing
that the center of gravity has any objective reality in the physical
universe. Everyone understands that it is merely a mathematical
fiction. I am considering the possibility that energy should be
understood to be a mathematical fiction in the same sort of way. From
the little I know about physics and physicists,
it seems to me that physicists do not think of energy in this
way. But I am really not sure.)

Anyway, none of this philosophizing is what I was hoping to discuss in
this article. Today I wrote up some of the questions I jotted down in
freshman physics class.

What are the physical interpretations of μ0 and ε0,
the magnetic permeability and electric permittivity of vacuum?
Can these be directly measured? How?

Consider a simple circuit with a battery, a switch, and a
capacitor. When the switch is closed, the battery will suck
electrons out of one plate of the capacitor and pump them into the
other plate, so the capacitor will charge up.

When we open the switch, the current will stop flowing, and the capacitor
will stop charging up.

But why? Suppose the switch is between the capacitor and the
positive terminal of the battery. Then the negative terminal is
still connected to the capacitor even when the switch is open. Why
doesn't the negative terminal of the battery continue to pump
electrons into the capacitor, continuing to charge it up, although
perhaps less than it would be if the switch were closed?

Any beam of light has a time-varying electric field, perpendicular
to the direction that the light is travelling. If I shine a light
on an electron, why doesn't the electron vibrate up and down in the
varying electric field? Or does it?

Suppose I take a beam of polarized light whose electric field is in
the x direction. I split it in two, delay one of the beams by
exactly half a wavelength, and merge it with the other beam. The
electric fields are exactly out of phase and exactly cancel out.
What happens? Where did the light go? What about conservation of
energy?

Suppose I have two beams of light whose wavelengths are close but
not exactly the same, say λ and (λ+dλ). I superimpose these. The
electric fields will interfere, and sometimes will be in phase and
sometimes out of phase. There will be regions where the electric
field varies rapidly from the maximum to almost zero, of length on the order of dλ. If I look at the
beam of light only over one of these brief intervals, it should
look just like very high frequency light of wavelength dλ. But
it doesn't. Or does it?

An electron in a varying magnetic field experiences an
electromotive force. In particular, an electron near a wire that
carries a varying current will move around as the current in the
wire varies.

Now suppose we have one electron A in space near a wire. We will
put a very small current into the wire for a moment; this causes electron A to
move a little bit.

Let's suppose that the current in the wire is as small as it can
be. In fact, let's imagine that the wire is carrying precisely one
electron, which we'll call B. We can calculate the amount of
current we can attribute to the wire just from B. (Current in
amperes is just coulombs per second, and the charge on electron B
is some number of coulombs.) Then we can calculate the force on A
as a result of this minimal current, and the motion of A that
results.

But we could also do the calculation another way, by forgetting
about the wire, and just saying that electron B is travelling
through space, and exerts an electrostatic force on A, according to
Coulomb's law. We could calculate the motion of A that results from
this electrostatic force.

We ought to get the same answer both ways. But do we?

Suppose we have a beam of light that is travelling along the
x axis, and the electric field is perpendicular to the
x axis, say in the y direction. We learned in
freshman physics how to calculate the vector quantity that
represents the intensity of the electric field at every point on
the x axis; that is, at every point of the form (x,
0, 0). But what is the electric field at (x, 1, 0)? How
does the electric field vary throughout space? Presumably a beam
of light of wavelength λ has a minimum diameter on the order
of λ, but how does the electric field vary as you move
away from the core? Can you take two such minimum-diameter beams
and overlap them partially?

I did ask #6 to the physics instructor, who is a full professor with a
specialization in high energy theory; she did not know the answer.

Actually I was under so much deadline pressure that I did not
have time to revise the talk. I arrived at the user group meeting
without a certain idea of what talk I was going to give.

Fortunately, the meeting structure is to have a Q&A and discussion
period before the invited speaker gives his talk. The Q&A period
always lasts about an hour. In that hour before I had to speak, I
wrote a new talk called What's a File?. It
mostly concerns the Unix "inode" structure, and what the kernel uses
it for. It uses the output of the well-known ls -l command
as a jumping-off point, since most of the ls -l information
comes from the inode.

Then I talk about how files are opened and permissions are checked,
how the filesystem is organized, how the kernel reads and writes data,
how directories are structured, how it's possible to have one file
with two names, how symbolic links work, and what that mysterious
field is in the ls -l output between the permissions and the
owner.

The talk was quite successful, much more so than I would have
expected, given how quickly I wrote it and my complete inability to
edit or revise it. Of course, it does help that I know this material
backwards and forwards and standing on my head, and also that I could
reuse all the diagrams and illustrations from the 2001 version of the
talk.

I would not, however, recommend this technique.

As my talks have gotten better over the years, I find that less and
less of the talk material is captured in the slides, and so the slides
become less and less representative of the talk itself. But I put
them online anyway,
and here they are.

19th-century elementary arithmetic
In grade school I read a delightful story, by C. A.
Stephens, called The Jonah. In the story, which
takes place in 1867, Grandma and Grandpa are away for the weekend,
leaving the kids alone on the farm. The girls make fried pies for
lunch.

They have a tradition that one or two of the pies are "Jonahs": they
look the same on the outside, but instead of being filled with fruit,
they are filled with something you don't want to eat, in this case a
mixture of bran and cayenne pepper. If you get the Jonah pie, you
must either eat the whole thing, or crawl under the table to be a
footstool for the rest of the meal.

Just as they are about to serve, a stranger knocks at the door. He is
an old friend of Grandpa's. They invite him to lunch, of course
removing the Jonahs from the platter. But he insists that they be put
back, and he gets the Jonah, and crawls under the table, marching it
around the dining room on his back. The ice is broken, and the rest
of the afternoon is filled with laughter and stories.

Later on, when the grandparents return, the kids learn that the
elderly visitor was none other than
Hannibal Hamlin, formerly
Vice-President of the United States.

A few years ago I tried to track this down, and thanks to the Wonders
of the Internet, I was successful. Then this month I had the library
get me some other C. A. Stephens stories, and they were equally
delightful and amusing.

In one of these, the narrator leaves the
pump full of water overnight, and the pipe freezes solid. He then has
to carry water for forty head of cattle, in buckets from the kitchen,
in sub-freezing weather. He does eventually manage to thaw the pipe.
But why did he forget in the first place? Because of fractions:

I had been in a kind of haze all day over two hard examples in complex
fractions at school. One of them I still remember distinctly:

At that point I had to stop reading and calculate the answer,
and I recommend that you do the same.

I got the answer wrong, by the way. I got 25/64 or 64/25 or something of the
sort, which suggests that I flipped over an 8/5 somewhere, because the
correct answer is exactly 1. At first I hoped perhaps there was
some 19th-century precedence convention I was getting wrong, but no,
it was nothing like that. The precedence in this problem is
unambiguous. I just screwed up.

Entirely coincidentally (I was investigating the spelling of the word
"canceling") I also recently downloaded (from Google Books) an
arithmetic text from the same period, The
National Arithmetic, on the Inductive System, by Benjamin
Greenleaf, 1866. Here are a few typical examples:

If 7/8 of a bushel of corn cost 63 cents, what cost a bushel?
What cost 15 bushels?

When 14 7/8 tons of copperas are sold for $500, what is the value
of 1 ton? what is the value of 9 11/12 tons?

If a man by laboring 15 hours a day, in 6 days can perform a
certain piece of work, how many days would it require to do the same
work by laboring 10 hours a day?

Bought 87 3/7 yards of broadcloth for $612; what was the value for
14 7/10 yards?

If a horse eat 19 3/7 bushels of oats in 87 3/7 days, how many
will 7 horses eat in 60 days?

Some of these are rather easy, but others are a long slog. For
example, #1 and #3 here (actually #1 and #25 in the book) can be
solved right off, without paper. But probably very few people have
enough skill at mental arithmetic to carry off $612/(87 3/7) * (14
7/10) in their heads.
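For anyone who wants to check their answers, here is a short sketch that works the five examples exactly, using Python's fractions module. The helper mixed() and the variable names are my own, not Greenleaf's. One thing the exact arithmetic makes visible is that 87 3/7 is 612/7, so the broadcloth comes to exactly $7 a yard.

```python
from fractions import Fraction

def mixed(whole, num, den):
    """Return the mixed number 'whole num/den' as an exact Fraction."""
    return whole + Fraction(num, den)

# Corn: 7/8 of a bushel costs 63 cents.
corn_per_bushel = 63 / Fraction(7, 8)              # cents
corn_15_bushels = 15 * corn_per_bushel             # cents

# Copperas: 14 7/8 tons sell for $500.
copperas_per_ton = 500 / mixed(14, 7, 8)           # dollars
copperas_value = copperas_per_ton * mixed(9, 11, 12)

# Labor: 15 hours a day for 6 days, redone at 10 hours a day.
days_at_10_hours = Fraction(15 * 6, 10)

# Broadcloth: 87 3/7 yards cost $612; price of 14 7/10 yards.
cloth_value = 612 / mixed(87, 3, 7) * mixed(14, 7, 10)

# Oats: one horse eats 19 3/7 bushels in 87 3/7 days; 7 horses, 60 days.
oats = mixed(19, 3, 7) / mixed(87, 3, 7) * 7 * 60

print(corn_per_bushel, corn_15_bushels)   # 72 1080
print(copperas_per_ton, copperas_value)   # 4000/119 1000/3
print(days_at_10_hours)                   # 9
print(cloth_value)                        # 1029/10
print(oats)                               # 280/3
```

That is, 72 cents and $10.80 for the corn, about $33.61 and $333.33 for the copperas, 9 days, $102.90 for the broadcloth, and 93 1/3 bushels of oats.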

The "complex fractions" section, which the original problem would have
fallen under, had it been from the same book, includes problems like
this: "Add 1/9, 2 5/8, 45/(94 7/11), and (47 5/9)/(314 3/5)
together." Such exercises have gone out of style, I think.

In addition to the complicated mechanical examples, there is some good
theory in the book. For example, pages 227–229 concern
continued fraction expansions of rational numbers, as a tool for
calculating simple rational approximations of rationals. Pages
417–423 concern radix-n numerals, with special attention
given to the duodecimal system. A typical problem is "How many square
feet in a floor 48 feet 6 inches long, and 24 feet 3 inches broad?"
The remarkable thing here is that the answer is given in the form
1176 sq. feet. 1' 6'', where the 1' 6'' actually means 1/12 +
6/144 square feet— that is, it is a base-12 "decimal".
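The floor problem can be checked mechanically. The sketch below is an illustration only, with function names of my own choosing: it multiplies the two lengths as exact fractions and then peels off base-12 digits of the fractional part.

```python
from fractions import Fraction

def feet(ft, inches):
    """Convert feet and inches to an exact number of feet."""
    return ft + Fraction(inches, 12)

def duodecimal_digits(frac, places):
    """Return the first `places` base-12 digits of a fraction in [0, 1)."""
    digits = []
    for _ in range(places):
        frac *= 12
        digit = frac.numerator // frac.denominator
        digits.append(digit)
        frac -= digit
    return digits

area = feet(48, 6) * feet(24, 3)              # 9409/8 square feet
whole = area.numerator // area.denominator    # 1176
primes_and_seconds = duodecimal_digits(area - whole, 2)
print(whole, primes_and_seconds)              # 1176 [1, 6]
```

The leftover 1/8 of a square foot is 1/12 + 6/144, which is where the 1' 6'' comes from.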

I often hear people bemoaning the dumbing-down of the primary and
secondary school mathematics curricula, and usually I laugh at those
people, because (for example) I have read a whole stack of "College
Algebra" books from the early 20th century, which deal in material
that is usually taken care of in 10th and 11th grades now. But I
think these 19th-century arithmetics must form some part of an
argument in the other direction.

On the other hand, those same people often complain that students'
time is wasted by a lot of "new math" nonsense like base-12
arithmetic, and that we should go back to the tried and true methods
of the good old days. I did not have an example in mind when I wrote
this paragraph, but two minutes of Google searching turned up the
following excellent example:

Most forms of life develop random growths which are best pruned off. In plants they are boles and suckerwood. In humans they are warts and tumors. In the educational system they are fashionable and transient theories of education created by a variety of human called, for example, "Professor Of The Teaching Of Mathematics."

When the Russians launched Sputnik these people came to the rescue of our nation; they leapfrogged the Russians by creating and imposing on our children the "New Math."

They had heard something about digital computers using base 2 arithmetic. They didn't know why, but clearly base 10 was old fashioned and base 2 was in. So they converted a large fraction of children's arithmetic education to learning how to calculate with any base number and to switch from base to base. But why, teacher? Because that is the modern way. No one knows how many potential engineers and scientists were permanently turned away by this inanity.

where f' is the derivative of f. This equation has a
couple of obvious solutions (f(x) = 1;
f(x) = sin(x)) and a bunch of not-so-obvious
ones. Since I couldn't solve the equation symbolically, I decided to
fall back on power series. Representing f(x) as
a0 + a1 x + a2 x^2 + ...,
one can manipulate the power series and solve for a0, a1,
a2, etc. In fact, this is exactly the application for which mathematicians
first became interested in power series. The big question is "once you
have found a0, a1, etc., do these values correspond to a real
function? And for what x does the power series expression actually
make sense?" This question, springing from a desire to solve
intractable differential equations, motivates a lot of the
theoretical mathematics of the last hundred and fifty years.

I decided to see if I could use the power series
methods of chapter 6 of Higher-Order Perl to calculate
a0, etc. So far, not yet, although I am getting closer. The
key is that if $series is the series you want, and if you can
calculate at least one term at the front of the series, and then
express the rest of $series in terms of $series, you
win. For example:

# Perl
my $series;
$series = node(1, promise { scale(2, $series) } );

This is perfectly well-defined code; it runs fine and sets
$series to be the series [1,2,4,8,16...]. In Haskell this is
standard operating procedure:

-- Haskell
series = 1 : scale 2 series

But in Perl it's still a bit outré.
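For comparison, here is roughly the same self-referential definition as a Python generator. It is a sketch under one loudly stated assumption: unlike the Perl promise or the Haskell list, it does not share structure. Each recursive call to series() starts a fresh generator, so this shows the shape of the definition, not its efficiency. The scale function here is my own stand-in for the one in the book.

```python
from itertools import islice

def scale(c, stream):
    """Multiply every term of a (possibly infinite) stream by c."""
    for term in stream:
        yield c * term

def series():
    """The series whose head is 1 and whose tail is 2 times itself."""
    yield 1
    yield from scale(2, series())

first_five = list(islice(series(), 5))
print(first_five)   # [1, 2, 4, 8, 16]
```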

Similarly, the book shows, on page 323, how to calculate the
reciprocal of a series s. Any series can be expressed as the
sum of the first term and the rest of the terms:

And when I asked it for the square root of [1,1,0,0,0,...] (that is,
of 1+x) it gave me back [1, 0.5, -0.125, 0.0625, ...], which is
indeed correct.
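The square root is easy to check independently of the lazy-list machinery. If r is the square root of s, then comparing coefficients of x^k in r·r = s gives 2·r0·rk + (the cross terms r1·r(k-1) + ... + r(k-1)·r1) = sk. The sketch below solves that for rk with exact fractions. It is not the Higher-Order Perl implementation; the function name is my own, and it assumes the leading coefficient of s is 1 so that r0 = 1.

```python
from fractions import Fraction

def sqrt_series(s, n):
    """First n coefficients of the square root of power series s (s[0] == 1)."""
    if s[0] != 1:
        raise ValueError("this sketch assumes the constant term is 1")

    def coeff(k):
        # Coefficients past the end of the list are taken to be zero.
        return s[k] if k < len(s) else 0

    r = [Fraction(1)]
    for k in range(1, n):
        # Cross terms r[1]*r[k-1] + ... + r[k-1]*r[1]; empty when k == 1.
        cross = sum(r[i] * r[k - i] for i in range(1, k))
        r.append((Fraction(coeff(k)) - cross) / 2)
    return r

print([str(c) for c in sqrt_series([1, 1], 5)])
# ['1', '1/2', '-1/8', '1/16', '-5/128']
```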

The Perl code is skankier than I wish it were.
A couple of years ago I
said in an interview that "I wish Perl's syntax were less
verbose." Some people were surprised by this at the time, since Perl
programmers consider Perl's syntax to be quite terse. But comparison
of the Perl and Haskell code above demonstrates the sort of thing I
mean.

Part of the issue here, of course, is that the lazy list data
structure is built in to Haskell, but I have to do it synthetically in
Perl, and so every construction of a lazy list structure in Perl is
accompanied by a syntactic marker (such as node(...) or
promise { ... }) that is absent, or nearly absent, from the
Haskell.

But when I complained about Perl's verbose syntax in 2005, one thing I
had specifically in mind was Perl's argument acquisition syntax, here
represented by my $s = shift;. Haskell is much terser, with
no loss of expressiveness. Haskell gets another win in the
automatic destructuring bind: instead of explicitly calling
head() and tail() to acquire the values of
s0 and st, as in the Perl code, they are implicitly
called by the pattern match (s0:st) in the Haskell code,
which never mentions s at all. It is quite fair to ascribe
this to a failure of Perl's syntax, since there's no reason in
principle why Perl couldn't support this, at least for built-in data
structures. For example, consider the Perl code:

There are a number of interesting user-interface issues to ask about
here: What if the assigned value is not in the expected form? Are
$a, $b, and $c copied from $href
or are they aliases into it? And so on. One easy way to dispense
with all of these interesting questions (perhaps not in the best way)
is to assert that this notation is just syntactic sugar for the long
version.

I talked to Chip Salzenberg about this at one time, and he said he
thought it would not be very hard to implement. But even if he was right,
what is not very hard for Chip Salzenberg to do can turn out to be
nearly impossible for us mortals.

[ Addendum 20071209: There's a followup article that
shows several different ways of solving the differential equation,
including the power-series method. ]

For people unfamiliar with this concept, I should explain briefly.
The C standard is full of places that say "if the program contains
x, the behavior is undefined", which really means "C programs
do not contain x, so if the
program contains x, it is not written in C, and, as this
standard only defines the meaning of programs in C, it has nothing to
say about the meaning of your program." There are a couple of
hundred of these phrases, and a larger number of places where it is
implied.

For example, everyone knows what it means when you write x =
4;, but what does it mean if you write 4 = x;?
According to clause 6.3.2.1[#1], it means nothing, and this is not a C
program. The non-guarantee in this case is extremely strong. The C
compiler, upon encountering this locution, is allowed to abort and
spontaneously erase all your files, and in doing so it is not
violating the requirements of the standard, because the standard does
not require any particular behavior in this case.

The memorable phrase that the comp.lang.c folks use is that
using that construction might cause demons to fly out of your nose.

[ Addendum 20071030: I am informed that I misread the standard here,
and that the behavior of this particular line is not undefined, but
requires a compiler diagnostic. Perhaps a better example would have
been x = *(char *)0. ]

Here the pointer p starts at the end of the string s,
and the loop might stop when p points to the position just
before s. Except no, that is forbidden, and the program might
at that moment cause demons to fly out of your nose. You are allowed
to have a pointer that points to the position just after an
object, but not one that points just before.

Well anyway, I seem to have digressed. My point was that M. Gould
says that one advantage of languages like Perl that are defined wholly
by their (one) implementation is that you never have "undefined
behavior". If you want to know what some locution does, you type it
in and see what it does. Poof, instant definition.

Although I think this is a sound point, it occurred to me that that is
not entirely correct. The manual is a specification of sorts, and
even if the implementation does X in situation Y, the
manual might say "The implementation does X in situation
Y, but this is unsupported and may change without warning in
the future." Then what you have is not so different from Y
being undefined behavior. Because the manual is (presumably) a
statement of official policy from the maintainers, and, as a
communiqué from the people with the ultimate authority to
define the future meaning of the language, it has some of the
same status that a formal specification would.

Such disclaimers do appear in the Perl documentation.
Probably the most significant example of this is the static variable
hack. For various implementation reasons, the locution my $static if
0 has a strange and interesting effect:

This makes $static behave as a "static" variable, and persist
from call to call of foo(). Without the ... if 0,
the code would print "static is now 42" five times. But with
... if 0, it prints:

static is now
static is now 1
static is now 2
static is now 3
static is now 4

This was never an intentional feature. It arose accidentally, and
then people discovered it and started using it. Since the behavior
was the result of a strange quirk of the implementation, caused by the
surprising interaction of several internal details, it was officially
decided by the support group that this behavior would not be supported
in future versions. The manual was amended to say that this behavior
was explicitly undefined, and might change in the future. It can be
used in one-off programs, but not in any important program, one that
might have a long life and need to be run under several different
versions of Perl. Programs that use pointers that point outside the
bounds of allocated storage in C are in a similar position. It might
work on today's system, with today's compiler, today, but you can't do
that in any larger context.

Having the "undefined behavior" be determined by the manual, instead
of by a language standard, has its drawbacks. The language standard
is fretted over by experts for months. When the C standard says that
behavior is undefined, it is because someone like Clive Feather or
Doug Gwyn or P.J. Plauger, someone who knows more about C than you
ever will, knows that there is some machine somewhere on which the
behavior is unsupported and unsupportable. When the Perl manual says
that some behavior is undefined, you might be hearing from the Perl
equivalent of Doug Gwyn, someone like Nick Clark or Chip Salzenberg or
Gurusamy Sarathy. Or you might be hearing from a mere nervous-nellie
who got their patch into the manual on a night when the release
manager had stayed up too late.

Here is an example of this that has bothered me for a long time. One
can use the each() operator to loop lazily over the contents
of a hash:

while (my $key = each %hash) {
    # do something with $key and $hash{$key}
}

What happens if you modify the hash in the middle of the loop? For
various implementation reasons, the manual forbids this.

For example, suppose the loop code adds a new key to the hash. The
hash might overflow as a result, and this would trigger a
reorganization that would move everything around, destroying the
ordering information. The subsequent calls to each() would
continue from the same element of the hash, but in the new order,
making it likely that the loop would visit some keys more than once,
or some not at all. So the prohibition in that case makes sense:
The each() operator normally guarantees to produce each key
exactly once, and adding elements to a hash in the middle of the loop
might cause that guarantee to be broken in an unpredictable way.
Moreover, there is no obvious way to fix this without potentially
wrecking the performance of hashes.

But the manual also forbids deleting keys inside the loop, and there
the issue does not come up, because in Perl, hashes are never
reorganized as the result of a deletion. The behavior is easily
described: Deleting a key that has already been visited will not
affect the each() loop, and deleting one that has not yet
been visited will just cause it to be skipped when the time comes.

Some people might find this general case confusing, I suppose. But
the following code also runs afoul of the "do not modify a hash
inside of an each loop" prohibition, and I don't think
anyone would find it confusing:

Here we want to delete all the bad items from the hash. We do this by
scanning the hash and deleting the current item whenever it is bad.
Since each key is deleted only after it is scanned by each,
we should expect this to visit every key in the hash, as indeed it
does. And this appears to be a useful thing to write. The only
alternative is to make two passes, constructing a list of bad keys on
the first pass, and deleting them on the second pass. The code would
be more complicated and the time and memory performance would be much
worse.

There is a potential implementation problem, though. The way that
each() works is to take the current item and follow a "next"
pointer from it to find the next item. (I am omitting some
unimportant details here.) But if we have deleted the current item,
the implementation cannot follow the "next" pointer. So what
happens?

In fact, the implementation has always contained a bunch of
code, written by Larry Wall, to ensure that deleting the current key
will work properly, and that it will not spoil the each().
This is nontrivial. When you delete an item, the delete()
operator looks to see if it is the current item of an each()
loop, and if so, it marks the item with a special flag instead of
deleting it. Later on, the next time each() is invoked, it
sees the flag and deletes the item after following the "next"
pointer.
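The deferred-deletion trick can be modeled in a few lines. What follows is a toy, not perl's actual implementation: a single linked chain of keys stands in for the hash, and the class and attribute names (Chain, pending_delete, and so on) are all invented for illustration. The point is only to show why flagging the current item, instead of unlinking it at once, keeps the "next" pointer usable.

```python
class Node:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

class Chain:
    """Toy linked chain of keys with an each()-style cursor."""
    def __init__(self, keys):
        self.head = None
        for key in reversed(keys):
            self.head = Node(key, self.head)
        self.current = None          # node most recently returned by each()
        self.started = False
        self.pending_delete = False  # True if current is flagged for deletion

    def keys(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.key)
            node = node.next
        return out

    def _unlink(self, node):
        if self.head is node:
            self.head = node.next
            return
        prev = self.head
        while prev.next is not node:
            prev = prev.next
        prev.next = node.next

    def delete(self, key):
        if self.current is not None and self.current.key == key:
            self.pending_delete = True   # defer: each() still needs current.next
            return
        node = self.head
        while node is not None and node.key != key:
            node = node.next
        if node is not None:
            self._unlink(node)

    def each(self):
        if not self.started:
            nxt, self.started = self.head, True
        else:
            nxt = self.current.next      # follow "next" first...
            if self.pending_delete:      # ...then really delete the old item
                self._unlink(self.current)
                self.pending_delete = False
        self.current = nxt
        if nxt is None:
            self.started = False         # iteration finished; reset the cursor
            return None
        return nxt.key

chain = Chain([1, 2, 3, 4, 5, 6])
visited = []
while (key := chain.each()) is not None:
    visited.append(key)
    if key % 2 == 0:                     # pretend the even keys are the bad ones
        chain.delete(key)
print(visited, chain.keys())             # [1, 2, 3, 4, 5, 6] [1, 3, 5]
```

Every key is visited exactly once, and the bad ones are gone at the end, which is the behavior described above.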

So the implementation takes some pains to make this work. But someone
came along later and forbade all modifications of a hash inside an
each loop, throwing the baby out with the bathwater. Larry
and perl paid a price for this feature, in performance and memory and
code size, and I think it was a feature well bought. But then someone
patched the manual and spoiled the value of the feature. (Some years
later, I patched the manual again to add an exception for this case.
Score!)

Another example is the question of what happens when you modify an
array inside a loop over the array, as with:

@a = (1..3);
for (@a) {
    print;
    push @a, $_ + 3 if $_ % 2 == 1;
}

(This prints 12346.) The internals are simple, and the semantics are
well-defined by the implementation, and straightforward, but the
manual has the heebie-jeebies about it, and most of the Perl community
is extremely superstitious about this, claiming that it is "entirely
unpredictable". I would like to support this with a quotation from
the manual, but I can't find it in the enormous and disorganized mass
that is the Perl documentation.

[ Addendum: Tom Boutell found it. The perlsyn page says "If
any part of LIST is an array, foreach will get very confused
if you add or remove elements within the loop body, for example with
splice. So don't do that." ]

The behavior, for the record, is quite straightforward: On the first
iteration, the loop processes the first element in the array. On the
second iteration, the loop processes the second element in the array,
whatever that element is at the time the second iteration starts,
whether or not that was the second element before. On the third
iteration, the loop processes the third element in the array, whatever
it is at that moment. And so the loop continues, terminating the
first time it is called upon to process an element that is past the
end of the array. We might imagine the following pseudocode:

There is nothing subtle or difficult about this, and claims that the
behavior is "entirely unpredictable" are probably superstitious
confessions of ignorance and fear.

Let's try to predict the "entirely unpredictable" behavior of the
example above:

@a = (1..3);
for (@a) {
    print;
    push @a, $_ + 3 if $_ % 2 == 1;
}

Initially the array contains (1, 2, 3), and so the first iteration
processes the first element, which is 1. This prints 1, and, since 1
is odd, pushes 4 onto the end of the array.

The array now contains (1, 2, 3, 4), and the loop processes the second
element, which is 2. 2 is printed. The loop then processes the third
element, printing 3 and pushing 6 onto the end. The array now
contains (1, 2, 3, 4, 6).

On the fourth iteration, the fourth element (4) is printed, and on the
fifth iteration, the fifth element (6) is printed. That is the last
element, so the loop is finished. What was so hard about that?
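The trace above can be reproduced with a small simulation of the rule "process element i, whatever it is now, and stop when i runs off the end". This is a Python model of the described behavior, not a statement about perl's internals; the function name foreach_by_index is made up, and the aliasing of $_ to the array element is not modeled.

```python
def foreach_by_index(array, body):
    """Call body(element) position by position, re-checking the length each time."""
    i = 0
    while i < len(array):
        body(array[i])
        i += 1

a = [1, 2, 3]
printed = []

def body(x):
    printed.append(x)          # stands in for Perl's print
    if x % 2 == 1:
        a.append(x + 3)        # stands in for push @a, $_ + 3

foreach_by_index(a, body)
print("".join(str(x) for x in printed))   # 12346
```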

My blog was recently inserted into the feed for planet.haskell.org, and
of course I immediately started my first streak of posting code-heavy
articles about C and Perl. This is distressing not just because the
articles were off-topic for Planet Haskell—I wouldn't give the
matter two thoughts if I were posting my usual mix of abstract math
and stuff—but it's so off-topic that it feels weird to
see it sitting there on the front page of Planet Haskell. So I
thought I'd make an effort to talk about Haskell, as a friendly
attempt to promote good relations between tribes. I'm not sure what
tribe I'm in, actually, but what the heck. I thought about Haskell a
bit, and a Haskell example came to mind.

Here is a definition of the factorial function in Haskell:

fact 0 = 1
fact n = n * fact (n-1)

I don't need to explain this to anyone, right?

Okay, now here is another definition:

fact 0 = 1
fact (n+1) = (n+1) * fact n

Also fine, and indeed this is legal Haskell. The pattern n+1
is allowed to match an integer that is at least 1, say 7, and doing so binds n to
the value 6. This is by a rather peculiar special case in the
specification of Haskell's pattern-matcher. (It is section 3.17.2#8
of Haskell 98 Language and Libraries: The Revised
Report, should you want to look it up.) This peculiar
special case is known sometimes as a "successor pattern" but more
often as an "n+k pattern".

The spec explicitly deprecates this feature:

Many people feel that n+k patterns should not be
used. These patterns may be removed or changed in future versions of
Haskell.

(Page 33.) One wonders why they put it in at all, if they were going
to go ahead and tell you not to use it. The Haskell committee is
usually smarter than this.

I have a vague recollection that there was an argument between people
who wanted to use Haskell as a language for teaching undergraduate
programming, and those who didn't care about that, and that this was
the compromise result. Like many compromises, it is inferior to both
of the alternatives that it interpolates between. Putting the feature
in complicates the syntax and the semantics of the language, disrupts
its conceptual purity, and bloats the
spec—see the Perlesque yikkity-yak on pages 57–58 about
how x + 1 = ... binds a meaning to +, but (x +
1) = ... binds a meaning to x. Such complication is
worth while only if there is a corresponding payoff in terms of
increased functionality and usability in the language. In this case,
the payoff is a feature that can only be used in one-off programs.
Serious programs must avoid it, since the patterns "may be removed or
changed in future versions of Haskell". The Haskell committee
purchased this feature at a certain cost, and it is debatable whether
they got their money's worth. I'm not sure which side of that issue I
fall on. But having purchased the feature, the committee then threw
it in the garbage, squandering their sunk costs. Oh well. Not even
the Haskell committee is perfect.

I think it might be worth pointing out that the version of the program
with the n+k pattern is technically superior to the
other version. Given a negative integer argument, the first version
recurses forever, possibly taking a long time to fail and perhaps
taking out the rest of the system on which it is running. But the
n+k version fails immediately, because the n+1
pattern will only match an integer that is at least 1.

The "nasal demons" of the C standard are a joke, but a serious one.
The C standard defines what C compilers must do when presented with C
programs; it does not define what they do when presented with
other inputs, nor what other software does when presented with C
programs. The authors of the C standard clearly understood the standard's
role in the world.

XML documents may, and should, begin with an XML declaration which
specifies the version of XML being used. For example, the following is
a complete XML document, well-formed but not valid:

<?xml version="1.0"?>
<greeting>Hello, world!</greeting>

...

The version number "1.0" should be used to indicate conformance to
this version of this specification; it is an error for a document to
use the value "1.0" if it does not conform to this version of this
specification.

(Emphasis is mine.) The XML 1.0 spec is just a document. It has no power,
except to declare that certain files are XML 1.0 and certain files are
not. A file that complies with the requirements of the spec is XML 1.0;
all other files are not XML 1.0. But in the emphasized clause, the spec
says that certain behavior "is an error" if it is exhibited by
documents that do not conform to the spec. That is, it is
declaring certain non-XML-1.0 documents "erroneous". But within the
meaning of the spec, "erroneous" simply means that the documents are
not XML 1.0. So the clause is completely redundant. Documents that do
not conform to the spec are erroneous by definition, whether or not
they use the value "1.0".

It's as if the Catholic Church issued an edict forbidding all rabbis
from wearing cassocks, on pain of excommunication.

I am happy to discover that this dumb error has been removed from the
most recent edition of the XML 1.0 spec.

Where's that blog?
I haven't posted in a couple of weeks, and I was wondering why. So I
took a look at the test version of the blog, which displays all the
unpublished articles as well as the published ones, and the reason
was obvious: In the past ten days I've written seven articles that are
unfinished or that didn't work. Usually only about a third of my
articles flop; this month a whole bunch flopped in a row. What can I
say? Sometimes the muse delivers, and sometimes she doesn't.

I said a while back that I would try to publish more regularly, and
not wait until every article was perfect. But I don't want to publish
the unfinished articles yet. So I thought instead I'd publish a
short summary of what I've been thinking about lately.

I hope to get at least one or two of these done by the end of the
month.

Simplified Poker

I recently played a computer poker game that uses a 24-card deck, with
only the nine through ace of each suit. This changes the game
drastically. For example, a flush is less likely than a four of a
kind. (The game uses the standard hand rankings anyway.) It is very
easy to compute optimal strategies for this game, because there are so
few possible hands (42,504) that you can brute-force all the
calculations with a computer.
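
The brute-force claim is easy to check. The following is a sketch, not the game's code: it enumerates every five-card hand from a 24-card deck and counts flushes and fours of a kind. The card numbering (rank is c / 4, suit is c % 4) and all the names are assumptions made for the example, and the flush count includes straight flushes.

```c
#include <assert.h>

enum { RANKS = 6, SUITS = 4, DECK = RANKS * SUITS };  /* nine through ace */

struct hand_counts {
    long total;    /* all five-card hands */
    long flushes;  /* five cards of one suit, straight flushes included */
    long quads;    /* four cards of one rank */
};

/* Classify one hand; card c has rank c / SUITS and suit c % SUITS. */
void classify(const int hand[5], struct hand_counts *n) {
    int rank_count[RANKS] = {0};
    int same_suit = 1, has_quads = 0;
    for (int i = 0; i < 5; i++) {
        assert(hand[i] >= 0 && hand[i] < DECK);
        rank_count[hand[i] / SUITS]++;
        if (hand[i] % SUITS != hand[0] % SUITS)
            same_suit = 0;
    }
    for (int r = 0; r < RANKS; r++)
        if (rank_count[r] == 4)
            has_quads = 1;
    n->total++;
    n->flushes += same_suit;
    n->quads += has_quads;
}

/* Brute force: visit every five-card subset of the deck exactly once. */
struct hand_counts count_hands(void) {
    struct hand_counts n = {0, 0, 0};
    int h[5];
    for (h[0] = 0; h[0] < DECK; h[0]++)
        for (h[1] = h[0] + 1; h[1] < DECK; h[1]++)
            for (h[2] = h[1] + 1; h[2] < DECK; h[2]++)
                for (h[3] = h[2] + 1; h[3] < DECK; h[3]++)
                    for (h[4] = h[3] + 1; h[4] < DECK; h[4]++)
                        classify(h, &n);
    return n;
}
```

count_hands() reports 42,504 hands in total, of which 24 are flushes of some kind and 120 are fours of a kind, so the flush really is the rarer hand.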

This got me thinking again of something I started writing up last year
and never finished: The game of "Simplified Poker", which was an
attempt to do for Poker what the λ-calculus does for
computation: the simplest possible model that nevertheless captures
all the essential features of the original. Simplified Poker is
played with an infinite deck in which half the cards are kings and
half are jacks. Each hand contains only two cards. Nevertheless,
bluffing is still possible.

The Annoying Boxes Puzzle

This is a logic puzzle in which you deduce which box contains the
treasure, but with a twist. I thought it up many years ago, and then
in the course of trying to write up an explanation about five years
ago, I consulted Raymond Smullyan's book What is the Name of
This Book? in order to get a citation to prove a certain fact
about the form that such puzzles usually take. In doing so, I
discovered that Smullyan actually presented the annoying boxes puzzle
(in slightly different form) in that book!

It's primarily waiting for me to take a photograph to accompany the
puzzle.

Undefined behavior

I have a pretty interesting article on the concept of "undefined
behavior", which is a big deal in the C world, but which means
something rather different, and is much less important, in Perl.

Tootle

My daughter Iris has become interested in the book
Tootle, by Gertrude Crampton, which is the
third-best-selling hardback children's book of all time. A few years
back I wrote some brief literary criticism of Tootle,
which I included when I wrote the Wikipedia article about the book.
This criticism was quite rightly deleted later on, as uncited original
research. It needs a new home, and that home is obviously here.

Periodicity without Fourier Series

Suppose I have tabulated the number of blog posts I made every day
for two years. I want to find if there is any discernible periodicity
to this data. Do I tend to post in 26-day cycles, for example?

One way to do this is to take the Fourier transform of the data. For
various reasons, I don't like this technique, and I'm trying to
invent something new. I think I have what I want, although it took
several tries to find it. Unfortunately, the blog posting data shows
no periodicity whatsoever.

Emacs and auto-mode-alist

The elisp code I've been using for the past fifteen years to set the
default mode for Perl editing in Emacs broke last week. My search for
a replacement turned up some very bizarre advice on IRC.

Van der Waerden's problem

Also still pending is the rest of my van der Waerden problem series.
I have written about four programs so far, and I have two to go.

Van der Waerden's problem: programs 3 and 4
In this series of articles I'm analyzing five versions of a
program that I wrote around 1988, and then another program that does
the same thing that I wrote last month without referring to the 1988
code. (I said before that it was four versions, but apparently I'm not so
good at counting to five.)

Program 3

The problem is all in the implementation.
You see, this program actually constructs the entire tree in
memory.

Somewhere along the line it dawned on me that constructing the tree
was unnecessary, so I took that machinery out, and the result was
version 3.

Consequently, this program is easy to explain once you have seen the
previous version: almost all I have to do is list the stuff that I
took out.

Since this program does not construct a tree of node structures, it
omits the definition of the node structure and the macro for
manufacturing nodes. Since it gets rid of the node allocation, it
also gets rid of the memory leak of the previous version, and so omits
the customized memory allocation functions Malloc and
Free that performed memory tracking.

The previous program had a compiled-in limit on the number of colors
it would handle, because at the time I didn't know how to do a dynamic
array. In this program, I got rid of the node structures, so there
was no array of node structures, so no need for a limit on the number
of node structures in the array. And all the code that enforced the
limit is gone.

The apchk function, which checks to see if a string is good,
remains unchanged from the previous version.

The makenodes function, which was the principal function in
the previous program, remains, but has lost a lot of code. It is
simpler to call, too; the node argument is gone:

makenodes(maxlen,"");

I got rid of the silly !howfar test in favor of a more
easily-understood howfar == 0 test. There are lots of times
when ! is appropriate, but testing whether a non-negative
integer has reached zero is not one of them. I was going to comment
earlier about what a novice error this is, and I'm glad to see that I
fixed it.

The main use of apchk in the previous program had if
(!apchk(...)) { ... }. That was okay, because apchk
returns a Boolean result. But the negation is annoying. It suggests
that apchk's return value is backward. (Instead of returning
true for a bad string, it should return true for a good string.) This
is not much of a big deal, and I only brought it up so that I could
diffidently confess that these days I would probably have done:

#define unless(c) if(!(c))
...
unless (is_bad(...)) {
}

There are a lot of stories of doofus Pascal programmers who do:

#define begin {
#define end }

and Fortran programmers who do:

#define GT >
#define GE >=
#define LT <
#define LE <=

and I find, to my shame, that I have become one of them. Anyone
seeing #define unless(c) if(!(c)) would snort and say "Oh,
this was obviously written by a Perl programmer."

But at least I was a C programmer first.

Actually I was a Fortran programmer first. But I was never a big
enough doofus to #define GE >=.

The big flaw in the current program is the string argument to
makenodes. Each call to makenodes copies this
string so that it can append a character to the end. I discussed this
at some length in the previous article, so I don't want to make too
much of it now; I'll just say that a better technique would have
reused the string buffer from call to call. This obviously saves a
little memory, and since most of the contents of the string doesn't
change, it also saves a lot of time.

This might be worth seeing, since it seems to me now to be a marvel of
wasted code:

The repeated strlen, for example, when ls could be
calculated as maxlen - howfar. The excessively verbose
failure message, which should be inside the STRING macro
anyway. (The code that maintains gotten has gone away with
the debugging allocation routines, so the second fprintf is
superfluous.) And why did I think abort was the right
thing to call on an out-of-memory condition?

Oh well, you live and learn.

Program 4

The fourth version of the program is even more trimmed-down. In this
version of the program I did get the idea to reuse the string
buffer instead of copying the string on every recursive call. But I
also got an even better idea, and eliminated the recursive call. The
makenodes function is now down to one argument, which tells
it how deep a tree to search.

This is a better job all around, and not very different from what I
wrote last month to do the same thing. I was going to title this
series of articles "I have become a better programmer!", and now that
I see this version, I'm glad I didn't, because there's no evidence
here that I am much better. This version of the program gets a solid
A from my older self.

The value depth scans forward in the string when the search
is going well, and is decremented again when the search needs to
backtrack. If depth == maxdepth, a witness of the desired
length has been found, and is printed out.

The curlet ("current letter") variable tracks which branch of
the current tree node we are "recursing" down. After the function
recurses down, by incrementing depth, curlet is set
to 'A' to visit the first sub-node of the new current node.
The curstring buffer tracks the path through the tree to the
current node. When the function needs to backtrack, it restores the
state of curlet from the last character in the buffer and
then trims that character off the end of the path.
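
To make the control flow concrete, here is a sketch of a search in this non-recursive style. It is a reconstruction from the description above, not the 1988 code: the names depth, curlet, and curstring follow the text, but ends_bad, find_witness, the MAXLEN bound, and the exact depth convention are assumptions of mine. A string is "bad" here if it contains n equally-spaced copies of the same letter.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 64   /* arbitrary bound for this sketch */

/* Does s[0..pos] contain n equally-spaced equal letters ending at pos?
 * Only progressions that end at the newest letter need checking. */
int ends_bad(const char *s, int pos, int n) {
    for (int d = 1; pos - (n - 1) * d >= 0; d++) {
        int k = 1;
        while (k < n && s[pos - k * d] == s[pos])
            k++;
        if (k == n)
            return 1;
    }
    return 0;
}

/* Look for a good string of length maxdepth over ncolors letters.
 * On success, copy it (zero-terminated) into out and return 1; else 0. */
int find_witness(int maxdepth, int ncolors, int n, char *out) {
    char curstring[MAXLEN];   /* path from the root to the current node */
    int depth = 0;            /* how many letters of the path are settled */
    char curlet = 'A';        /* which branch we are about to try */

    assert(maxdepth >= 1 && maxdepth <= MAXLEN && ncolors >= 1 && n >= 2);
    for (;;) {
        if (curlet >= 'A' + ncolors) {
            /* All branches here are exhausted: backtrack. */
            if (depth == 0)
                return 0;
            depth--;
            curlet = curstring[depth] + 1;   /* resume at the next sibling */
            continue;
        }
        curstring[depth] = curlet;
        if (ends_bad(curstring, depth, n)) {
            curlet++;                        /* prune this whole subtree */
            continue;
        }
        if (depth + 1 == maxdepth) {         /* witness found */
            memcpy(out, curstring, (size_t)maxdepth);
            out[maxdepth] = '\0';
            return 1;
        }
        depth++;                             /* "recurse" down */
        curlet = 'A';
    }
}
```

With two colors and progressions of length 3, this finds a witness of length 8 and correctly reports that none of length 9 exists.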

I'd only want to make two changes to this code. One would be to make
depth a pointer into the curstring buffer instead of
an index into it. Then again, the compiler may well have optimized it
into one anyway. But it would also allow me to eliminate
curlet in favor of just using *depth everywhere.

The other change would address a more serious defect: the contents of
curstring are kept properly zero-terminated at all times,
whenever depth is advanced or retracted. This
zero-termination is unnecessary, since curstring is never
used as a string except when depth == maxdepth. When
printfing curstring, I could have used something
like:

printf("%.*s%c\n",maxlen,curstring,curlet);

which prints exactly maxlen characters from the buffer,
regardless of whether it is zero-terminated.
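
A small sketch of the %.*s idiom, using snprintf so the result can be inspected. The function name format_path and its parameters are hypothetical. The point to notice is that the int precision argument comes before the buffer it limits, and that the buffer needs no terminator as long as the precision does not exceed its size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the first len bytes of a possibly unterminated buffer,
 * followed by one more character. Returns what snprintf returns. */
int format_path(char *out, size_t outsize,
                const char *buf, int len, char last) {
    assert(len >= 0);
    /* The '*' consumes an int precision first, then %s takes the buffer. */
    return snprintf(out, outsize, "%.*s%c", len, buf, last);
}
```

For example, with the four unterminated bytes A, B, B, A in the buffer, a precision of 3 and a final character of 'C' produce the string "ABBC".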

It would, however, have required that I know about %.*s,
which I'm sure I did not. Was %.*s even available in 1988?
I forget, and my copy of K&R First Edition is in a box somewhere
since my recent move. Anyway, if %.*s was unavailable for
whatever reason, the code could have had a single
curstring[maxdepth] = 0 up front, which would have been quite
sufficient for the one printf it needed to do.

Coming next: one very different program to solve the same problem, and
a comparison with last month's effort.

The idea is that when you calculate derived data in a database, such
as a view or a selection, you can simultaneously calculate exactly
which input tuples contributed to each output tuple's presence in the
output. Each input tuple is annotated with an identifier that says
who was responsible for putting it there, and the output annotations
are polynomials in these identifiers. (The complete paper is
here.)

A simple example may make this a bit clearer. Suppose we have the
following table R:

R

    a   a
    a   b
    a   c
    b   c
    c   e
    d   e

We'll write R(p, q) when the tuple (p,
q) appears in this table. Now consider the join of R
with itself. That is, consider the relation S where
S(x, z) is true whenever both R(x,
y) and R(y, z) are true:

S

    a   a
    a   b
    a   c
    a   e
    b   e

Now suppose you discover that the R(a, b)
information is untrustworthy. What tuples of S are
untrustworthy?

If you annotate the tuples of R with identifiers like this:

R

    a   a   u
    a   b   v
    a   c   w
    b   c   x
    c   e   y
    d   e   z

then the algorithm in the paper calculates polynomials for the tuples
of S like this:

S

    a   a   $u^2$
    a   b   $uv$
    a   c   $uw + xv$
    a   e   $wy$
    b   e   $xy$

If you decide that R(a, b) is no good, you assign
the value 0 to v, which reduces the S table to:

S

    a   a   $u^2$
    a   b   $0$
    a   c   $uw$
    a   e   $wy$
    b   e   $xy$

So we see that tuple S(a, b) is no good any more,
but S(a, c) is still okay, because it can be
derived from u and w, which we still trust.

This assignment of polynomials generalizes a lot of earlier work on
tuple annotation. For example, suppose each tuple in R is annotated with a
probability of being correct. You can propagate the probabilities
to S just by substituting the appropriate numbers for the
variables in the polynomials. Or suppose each tuple in R might
appear multiple times and is annotated with the number of times it
appears. Then ditto.
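
Here is a sketch of the "substitute numbers for the variables" idea, using multiplicities. It does not manipulate polynomials symbolically, and it is not the algorithm from the paper. It just stores a number for each tuple of R and computes S(x, z) as the sum over y of R(x, y) times R(y, z), which is what the polynomials above evaluate to. The type and function names are made up for the example.

```c
#include <assert.h>

enum value { VAL_A, VAL_B, VAL_C, VAL_D, VAL_E, NVALS };  /* a through e */

/* r[p][q] is the number attached to tuple (p, q), or 0 if it is absent. */
typedef long relation[NVALS][NVALS];

/* Build the example table R with the given values for u, v, w, x, y, z. */
void make_r(relation r, long u, long v, long w, long x, long y, long z) {
    for (int p = 0; p < NVALS; p++)
        for (int q = 0; q < NVALS; q++)
            r[p][q] = 0;
    r[VAL_A][VAL_A] = u;
    r[VAL_A][VAL_B] = v;
    r[VAL_A][VAL_C] = w;
    r[VAL_B][VAL_C] = x;
    r[VAL_C][VAL_E] = y;
    r[VAL_D][VAL_E] = z;
}

/* S(x, z) = sum over y of R(x, y) * R(y, z): the join of R with itself. */
void self_join(relation r, relation s) {
    for (int x = 0; x < NVALS; x++)
        for (int z = 0; z < NVALS; z++) {
            long sum = 0;
            for (int y = 0; y < NVALS; y++)
                sum += r[x][y] * r[y][z];
            assert(sum >= 0);   /* multiplicities never go negative */
            s[x][z] = sum;
        }
}
```

With every tuple given the value 1, the entry for S(a, c) comes out as 2, one for each derivation. Setting v to 0 drops S(a, b) to 0 and S(a, c) to 1, matching the reduced table above.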

If your queries are recursive, then the polynomials might be
infinite. For example, suppose you are calculating the transitive
closure T of relation R. This is like the previous
example, except that instead of having
S(x, z) = R(x, y) and
R(y, z), we have
T(x, z) = R(x, z) or
(T(x, y) and R(y, z)). This
is a recursive equation, so we need to do a fixpoint solution for it,
using certain well-known techniques. The result in this example is:

T

    a   a   $u^+$
    a   b   $u^* v$
    a   c   $u^*(vx+w)$
    a   e   $u^*(vx+w)y$
    b   c   $x$
    b   e   $xy$
    d   e   $z$

In such a case there might be an infinite number of paths through
R to derive the provenance of a certain tuple of T. In
this example,
R contains a loop, namely R(a, a), so there are an
infinite number of derivations of some of the tuples in T, because you can go around the loop as
many times as you like.
$u^+$ here is an abbreviation for the infinite polynomial
$u + u^2 + u^3 + \cdots$;
$u^*$ here is an abbreviation for
$1 + u^+$.

    Terms   Parenthesizations

    1       a

    2       (a + b)

    3       ((a + b) + c)
            (a + (b + c))

    4       (((a + b) + c) + d)
            ((a + (b + c)) + d)
            ((a + b) + (c + d))
            (a + ((b + c) + d))
            (a + (b + (c + d)))

    5       ((((a + b) + c) + d) + e)
            (((a + (b + c)) + d) + e)
            (((a + b) + (c + d)) + e)
            (((a + b) + c) + (d + e))
            ((a + ((b + c) + d)) + e)
            ((a + (b + (c + d))) + e)
            ((a + (b + c)) + (d + e))
            ((a + b) + ((c + d) + e))
            ((a + b) + (c + (d + e)))
            (a + (((b + c) + d) + e))
            (a + ((b + (c + d)) + e))
            (a + ((b + c) + (d + e)))
            (a + (b + ((c + d) + e)))
            (a + (b + (c + (d + e))))

In one example in the paper, the method produces a recursive relation
of the form
$V = s + V^2$, which can be solved by
the same well-known techniques to come up with an (infinite) polynomial for
$V$, namely $V = 1 + s + 2s^2 + 5s^3 + 14s^4 + \cdots$.
Mathematicians will recognize the sequence 1, 1, 2, 5, 14,
... as the Catalan
numbers, which come up almost as often as the better-known
Fibonacci numbers. For example, the Catalan numbers count the number
of binary trees with n nodes; they also count the number of
ways of parenthesizing an expression with n + 1 terms, as shown in
the table above.

Anyway, in his talk, Val referred to the sequence as "bizarre", and
I had to jump in to point out that it was not at all bizarre, it was
the Catalan numbers, which are just what you would expect from a
relation like $V = s + V^2$, blah blah,
and he cut me off, because of course he knows all about the Catalan
numbers. He only called them bizarre as a rhetorical flourish, meant
to echo the presumed puzzlement of the undergraduates in the room.

(I
never know how much of what kind of math to expect from computer
science professors. Sometimes they know things I don't expect at all,
and sometimes they don't know things that I expect everyone to know.

(This was indeed what was going on, and the professor
seemed to think it was a surprising insight. I am not relating this
boastfully, because I truly don't think it was a particularly inspired
guess.

(Now that I think about it, maybe the answer here is that
computer science professors know more about math than I expect, and
less about computation.)

Anyway, I digress, and the whole article up to now was not really what
I wanted to discuss anyway. What I wanted to discuss was that when I
started blathering about Catalan numbers, Val said that if I knew so
much about Catalan numbers, I should calculate the coefficient of the
$x^{59}$ term in $V^2$, which also
appeared as one of the annotations in his example.

So that's the puzzle: what is the coefficient of the $x^{59}$ term in
$V^2$, where
$V = 1 + s + 2s^2 + 5s^3 + 14s^4 + \cdots$?

After I had thought about this for a couple of minutes, I realized
that it was going to be much simpler than it first appeared, for two reasons.

The first thing that occurred to me was that the definition of
multiplication of polynomials is that the coefficient of the
$x^n$ term in the product of $A$ and
$B$ is
$\sum_{i=0}^{n} a_i b_{n-i}$.
When $A = B$, this reduces to
$\sum_{i=0}^{n} a_i a_{n-i}$.
Now, it just so happens that the Catalan numbers obey the relation
$c_{n+1} = \sum_{i=0}^{n} c_i c_{n-i}$, which
is exactly the same form.
Since the coefficients of
$V$ are the $c_i$, the coefficients of
$V^2$ are going to have the form
$\sum_{i=0}^{n} c_i c_{n-i}$,
which is just the
Catalan numbers again, but shifted up by one
place.

The next thing I thought was that the Catalan numbers have a pretty
simple generating function f(x). This just means that
you pretend that the sequence V is a Taylor series, and figure
out what function it is the Taylor series of, and use that as a
shorthand for the whole series, ignoring all questions of convergence
and other such analytic fusspottery. If V is the Taylor series for
$f(x)$, then
$V^2$ is the Taylor series for
$f(x)^2$. And if $f$ has a compact
representation, say as sin(x) or something, it might be much
easier to square than the original V was. Since I knew in this
case that the generating function is simple, this seemed
likely to win.
In fact the generating function
of V is not sin(x) but
$(1-\sqrt{1-4x})/(2x)$.
When you square
this, you get almost the same thing back, which matches my prediction
from the previous paragraph. This would have given me the right
answer, but before I actually finished that
calculation, I had an "oho" moment.

The generating function is known to satisfy the relation
$f(x) = 1 + x f(x)^2$.
This relation is where the $(1-\sqrt{1-4x})/(2x)$ thing comes from
in the first place; it is the function that satisfies that relation.
(You
can see this relation prefigured in the equation that Val had, with
$V = s + V^2$. There the notation is a
bit different, though.)
We can just rearrange the terms here, putting the
$f(x)^2$ by itself, and get
$f(x)^2 = (f(x)-1)/x$.

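Subtracting 1 removes the constant term of $f(x)$, and dividing by $x$
moves every remaining coefficient down one place.
So the answer is that the coefficient of the $x^{59}$
term is just $c_{60}$, calculation of which is left as an exercise
for the reader.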
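
For the skeptical, here is a sketch that checks the shift numerically for small n. It computes the Catalan numbers from the closed-form ratio $c_{n+1} = c_n \cdot 2(2n+1)/(n+2)$ rather than from the convolution, so that the comparison is not circular. The names and the TERMS limit are my own choices; $c_{60}$ itself is far too big for 64-bit arithmetic, so this stops well short of it.

```c
#include <assert.h>
#include <stdint.h>

#define TERMS 30   /* c[0] through c[29] fit comfortably in 64 bits */

/* Fill c[] using c[n+1] = c[n] * 2(2n+1) / (n+2); the division is exact. */
void catalan(uint64_t c[TERMS]) {
    c[0] = 1;
    for (int n = 0; n + 1 < TERMS; n++)
        c[n + 1] = c[n] * 2 * (2 * (uint64_t)n + 1) / ((uint64_t)n + 2);
}

/* Coefficient of x^n in V^2, where V = c[0] + c[1] x + c[2] x^2 + ... */
uint64_t square_coeff(const uint64_t c[TERMS], int n) {
    uint64_t sum = 0;
    assert(n >= 0 && n < TERMS);
    for (int i = 0; i <= n; i++)
        sum += c[i] * c[n - i];
    return sum;
}
```

For every n this can reach, square_coeff(c, n) agrees with c[n + 1], which is the shifted-by-one pattern predicted above.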

I don't know what the point of all that was, but I thought it was fun
how the hairy-looking problem seemed likely to be simple when I looked at
it a little more carefully, and then how it did turn out to be
quite simple.

This blog has had a recurring dialogue between subtle technique and
the sawed-off shotgun method, and I often favor the sawed-off shotgun
method. Often programmers' big problem is that they are very clever
and learned, and so they want to be clever and learned all the time,
even when being a knucklehead would work better. But I think this
example provides some balance, because it shows a big win for the
clever, learned method, which does produce a lot more
understanding.

Relatively prime polynomials over Z2
Last week Wikipedia was having a discussion on whether the subject of
"mathematical
quilting" was notable enough to deserve an article. I remembered
that there had been a mathematical quilt on the cover of some journal
I read last year, and I went to the Penn math library to try to find
it again. While I was there, I discovered that the June 2007 issue of
Mathematics Magazine had a cover story about the
probability that two randomly-selected polynomials over Z2 are relatively prime. ("The
Probability of Relatively Prime Polynomials", Arthur T. Benjamin and
Curtis D Bennett, page 196).

Polynomials over Z2 are one of my
favorite subjects, and the answer to the question turned out to be
beautiful. So I thought I'd write about it here.

First, what does it mean for two polynomials to be relatively prime? It's
analogous to the corresponding definition for integers. For any
numbers a and b, there is always some number d
such that both a and b are multiples of d.
(d = 1 is always a solution.) The greatest such number is called
the greatest common divisor or GCD of a and
b. The GCD of two numbers might be 1, or it might be some
larger number. If it's 1, we say that the two numbers are relatively prime
(to each other). For example, the GCD of 100 and 28 is 4, so 100 and
28 are not relatively prime. But the GCD of 100 and 27 is 1, so 100 and 27 are
relatively prime. One can prove theorems like these: If p is prime, then either
a is a multiple of p, or a is relatively prime to p,
but not both. And the equation ap + bq = 1 has a
solution (in integers) if and only if p and q are relatively prime.

The definition for polynomials is just the same. Take two polynomials
over some variable x, say p and q. There is some
polynomial d such that both p and q are multiples
of d; d(x) = 1 is one such. When the only
solutions are trivial polynomials like 1, we say that the polynomials
are relatively prime. For example, consider $x^2 + 2x + 1$ and $x^2 - 1$.
Both are multiples of $x+1$, so they are not relatively prime. But
$x^2 + 2x + 1$ is relatively prime to $x^2 - 2x + 1$. And one can
prove theorems that are analogous to the ones that work in the
integers. The analog of "prime integer" is "irreducible polynomial".
If p is irreducible, then either a is a multiple of
p, or a is relatively prime to p, but not both. And the
equation a(x)p(x) +
b(x)q(x) = 1 has a solution for
polynomials a and b if and only if p and q
are relatively prime.

One uses Euclid's algorithm to calculate the GCD of two integers.
Euclid's algorithm is simple: To calculate the GCD of a
and b, just subtract the smaller from the larger, repeatedly,
until one of the numbers becomes 0. Then the other is the GCD.
One can use an entirely analogous algorithm to calculate the GCD of two
polynomials. Two polynomials are relatively prime just when their GCD, as
calculated by Euclid's algorithm, has degree 0.
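
A minimal sketch of the subtractive version for integers, exactly as described: subtract the smaller from the larger until one of them reaches 0. The function name is mine. Real code would normally use remainders instead of repeated subtraction, which is the same thing done faster.

```c
#include <assert.h>

/* Euclid's algorithm by repeated subtraction. Returns the GCD of a and b;
 * if one argument is 0, the other is returned. */
unsigned gcd_subtract(unsigned a, unsigned b) {
    while (a != 0 && b != 0) {
        if (a >= b)
            a -= b;   /* subtract the smaller from the larger */
        else
            b -= a;
    }
    return a != 0 ? a : b;   /* whichever one did not reach 0 */
}
```

This gives gcd_subtract(100, 28) == 4 and gcd_subtract(100, 27) == 1, matching the examples earlier.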

Anyway, that was more introduction than I wanted to give. The article
in Mathematics Magazine concerned polynomials over Z2,
which means that the coefficients are in the field Z2, which is just
like the regular integers, except that 1+1=0. As I explained in the earlier article, this implies
that a=-a for all a, so there are no negatives
and subtraction is the same as addition. I like this field a lot,
because subtraction blows. Do you have trouble because you're always
dropping minus signs here and there? You'll like Z2; there are no
minus signs.

Here is a table that shows which pairs of polynomials over Z2 are
relatively prime. If you read this blog through some crappy aggregator, you are
really missing out, because the table is awesome, and you can't see it
properly. Check out
the real thing.

[Table: which pairs of polynomials over Z2 of degree at most 4 are
relatively prime. Rows and columns are both indexed by the labels a0
through d1, which stand for the following polynomials.]

    a0   $0$
    a1   $1$
    a2   $x$
    a3   $x + 1$
    a4   $x^2$
    a5   $x^2 + 1$
    a6   $x^2 + x$
    a7   $x^2 + x + 1$
    a8   $x^3$
    a9   $x^3 + 1$
    b0   $x^3 + x$
    b1   $x^3 + x + 1$
    b2   $x^3 + x^2$
    b3   $x^3 + x^2 + 1$
    b4   $x^3 + x^2 + x$
    b5   $x^3 + x^2 + x + 1$
    b6   $x^4$
    b7   $x^4 + 1$
    b8   $x^4 + x$
    b9   $x^4 + x + 1$
    c0   $x^4 + x^2$
    c1   $x^4 + x^2 + 1$
    c2   $x^4 + x^2 + x$
    c3   $x^4 + x^2 + x + 1$
    c4   $x^4 + x^3$
    c5   $x^4 + x^3 + 1$
    c6   $x^4 + x^3 + x$
    c7   $x^4 + x^3 + x + 1$
    c8   $x^4 + x^3 + x^2$
    c9   $x^4 + x^3 + x^2 + 1$
    d0   $x^4 + x^3 + x^2 + x$
    d1   $x^4 + x^3 + x^2 + x + 1$

A pink square means that the polynomials are relatively prime; a white square
means that they are not.
Another version of this table appeared on the cover of
Mathematics Magazine. It's shown at right.

The thin black lines in the diagram above divide the polynomials of
different degrees. Suppose you pick two degrees, say 2 and 2, and look at
the corresponding black box in the diagram:

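[Table detail: the box for degrees 2 and 2. Rows and columns are both
indexed by the following polynomials.]

    a4   $x^2$
    a5   $x^2 + 1$
    a6   $x^2 + x$
    a7   $x^2 + x + 1$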

You will see that each box contains exactly half pink and half white
squares. (8 pink and 8 white in that case.) That is, exactly half
the possible pairs of degree-2 polynomials are relatively prime. And in general,
if you pick a random degree-a polynomial and a random
degree-b polynomial, where a and b are not both
zero, the polynomials will be relatively prime exactly half the time.
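
The half-and-half claim is easy to test by brute force. In this sketch a polynomial over Z2 is stored as a bitmask, with bit i holding the coefficient of x^i, so addition and subtraction are both XOR. The representation and every name here are my own assumptions, and the GCD uses polynomial remainders rather than the one-step-at-a-time subtraction described earlier. The counting function only handles positive degrees.

```c
#include <assert.h>

/* A polynomial over Z2 as a bitmask: bit i is the coefficient of x^i. */
typedef unsigned poly;

/* Degree of p, with the zero polynomial reported as -1. */
int degree(poly p) {
    int d = -1;
    while (p != 0) {
        p >>= 1;
        d++;
    }
    return d;
}

/* Remainder of a divided by b; b must be nonzero. */
poly poly_mod(poly a, poly b) {
    int db;
    assert(b != 0);
    db = degree(b);
    while (degree(a) >= db)
        a ^= b << (degree(a) - db);   /* cancel the leading term of a */
    return a;
}

/* Euclid's algorithm; the result is 1 exactly when a and b are coprime. */
poly poly_gcd(poly a, poly b) {
    while (b != 0) {
        poly r = poly_mod(a, b);
        a = b;
        b = r;
    }
    return a;
}

/* Count relatively prime pairs (p, q) with deg p = da and deg q = db. */
int count_coprime(int da, int db) {
    int count = 0;
    assert(da >= 1 && db >= 1 && da < 16 && db < 16);
    for (poly p = 1u << da; p < (2u << da); p++)
        for (poly q = 1u << db; q < (2u << db); q++)
            if (poly_gcd(p, q) == 1)
                count++;
    return count;
}
```

For degrees 2 and 2 there are 16 pairs and count_coprime(2, 2) is 8, which agrees with the box described above.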

The proof of this is delightful. If you run Euclid's algorithm
on two relatively prime polynomials over Z2, you get a series of intermediate results,
terminating in the constant 1. Given the intermediate results and the
number of steps, you can run the algorithm backward and find the
original polynomials. If you run the algorithm backward starting from
0 instead of from 1, for the same number of steps, you get two
non-relatively-prime polynomials of the same degrees instead. This
establishes a one-to-one correspondence between pairs of
relatively prime polynomials and pairs of non-relatively-prime polynomials of the
same degrees. End of proof. (See the
paper for complete details.)

You can use basically the same proof to show that the probability that
two randomly-selected polynomials over Zp are relatively prime is
1-1/p. The argument is the same: Euclid's algorithm could
produce a series of intermediate results terminating in 0, in which
case the polynomials are not relatively prime, or it could produce the same series
of intermediate results terminating in something else, in which case
they are relatively prime. The paper comes to an analogous conclusion about monic
polynomials over Z.

Some folks I showed the diagram to observed that it
looks like a quilt pattern. My wife did actually make a quilt that
tabulates the GCD function for integers, which I mentioned in the
Wikipedia discussion of the notability of the Mathematical Quilting
article. That seems to have brought us back to where the article
started, so I'll end here.

[ Puzzle: The (11,12) white squares in the picture are connected to
the others via row and column 13, which doesn't appear. Suppose the
quilt were extended to cover the entire quarter-infinite plane. Would
the white area be connected? ]

Van der Waerden's problem: program 2
In this series of articles I'm going to analyze four versions of a
program that I wrote around 1988, and then another program that does
the same thing that I wrote last month without referring to the 1988
code.

Program 2

I can't remember whether I expected this to be practical at the time.
Did I really think it would work? Well, there was some sense to it.
It does work just fine for the 29 case. I think probably
my idea was to do the simplest thing that could possibly work, and get
as much information out of it as I could. On my current machine, this
method proves that V(3,3) > 19 by finding a witness
(RRBRRBBYYRRBRRBBYYB) in under 10 seconds. If we
estimate that the computer I had then was 10,000 times slower, then I
could have produced the same result in about 28 hours. I was at
college, and there was plenty of free computing power available, so
running a program for 28 hours was easily done. While I was waiting
for it to finish, I could work on a better program.

The idea behind this program is that the strings of length less than
V form a tree, with the empty string as the root, and the
children of string s are obtained from s by appending a
single character to the end of s. If the string at a node is
bad, so will be all the strings under it, and we can prune the entire
branch at that node. This leaves us with a tree of all the good
strings. The ones farthest from the root will be the witnesses we
seek for the values of V(n, C), and we can find
these by doing depth-first search on the tree.

There is nothing wrong with this idea in principle; that's the way my
current program works too. The problem is all in the implementation.
You see, this program actually constructs the entire tree in
memory:

struct tree is a tree node structure. It represents a
string s, and has a flag to record whether s is bad. It
also has pointers to its subnodes, which will represent the strings
sA,
sB,
and so on.

MAXCOLORS is a compiled-in limit on the number of different
symbols the strings can contain, an upper bound on C.
Apparently I didn't know the standard technique for avoiding this
inflexibility. You declare the array as having length 1, but then
when you allocate the structure, you allocate enough space for the
array you are actually planning to use. Even though the
declared size of the array is 1, you are allowed to refer to
node->away[37] as long as there is actually enough
space in the allocated chunk. The implementation would look like this:
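Here is a minimal sketch of the technique just described, not the code from the 1988 program. Only the field name away comes from the text; the bad field, the new_node function and its ncolors parameter are hypothetical names chosen for illustration. (Since C99 the same idea can also be spelled with a flexible array member, struct tree *away[];, in which case the size calculation adds ncolors elements instead of ncolors - 1.)

```c
#include <stdlib.h>

/* Hypothetical node layout; only the field name `away` is from the text. */
struct tree {
    int bad;                 /* nonzero if this node's string is bad */
    struct tree *away[1];    /* declared with length 1, allocated longer */
};

/* Allocate a node with room for `ncolors` child pointers (ncolors >= 1). */
struct tree *new_node(int ncolors)
{
    size_t size = sizeof(struct tree)
                + (size_t)(ncolors - 1) * sizeof(struct tree *);
    struct tree *node = malloc(size);
    int i;

    if (node == NULL)
        return NULL;
    node->bad = 0;
    for (i = 0; i < ncolors; i++)
        node->away[i] = NULL;    /* indexes past 0 use the extra space */
    return node;
}
```

With this arrangement the number of colors becomes a run-time value, and no MAXCOLORS limit has to be compiled in.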

(Note for those who are not advanced C programmers: I give you my
solemn word of honor that I am not doing anything dodgy or bizarre
here; it is a standard, widely-used, supported technique, guaranteed
to work everywhere.)

(As before, this code is in a pink box to indicate that it is not
actually part
of the program I am discussing.)

Another thing I notice is that the NEWN macro is very weird.
Note that it may not work as expected in a context like this:

for(i=0; i<10; i++)
    s[i] = NEWN;

This allocates ten nodes but prints only one star, because it expands
to:

and the for loop does not control the printf. The
usual fix for multiline macros like this is to wrap them in
do...while(0), but that is not appropriate here.
Had I been writing this today, I would have made NEWN a
function, not a macro. Clevermacroitis is a common disorder of
beginning C programmers, and I was no exception.
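A function version might look something like the following sketch. The macro's definition is not reproduced here, so this assumes only what the text says about it: that it allocates a node and prints a star. The names new_node_logged and stars_printed, and the one-field stand-in for struct tree, are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

struct tree { int bad; };    /* stand-in node type for this sketch */

int stars_printed = 0;       /* counts the stars, to make the effect visible */

/* Allocate one node and print one star.  Because both steps live inside
   a single function call, a braceless loop body cannot separate them. */
struct tree *new_node_logged(void)
{
    struct tree *node = malloc(sizeof *node);

    if (node != NULL) {
        node->bad = 0;
        putchar('*');
        stars_printed++;
    }
    return node;
}
```

Written this way, a loop such as for(i=0; i<10; i++) s[i] = new_node_logged(); prints one star per node allocated.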

The main business of the program is in the makenodes
function; the main routine does some argument processing and then
calls makenodes. The arguments to the makenodes
function are the current tree node, the current string that that node
represents, and an integer howfar that says how deep a tree
to construct under the current node.

Recall that apchk checks a string for an arithmetic
progression of equal characters. That is, it checks to see if a
string is good or bad. If the string is bad, the function prunes the
tree at the current node, and doesn't recurse further.

Unlike the one in the previous program, this apchk doesn't
bother checking all the possible arithmetic progressions. It only
checks the new ones: that is, the ones involving the last character.
That's why it has two arguments. One is the old string s and
the other is the new symbol that we want to append to s.

If s would still be good with symbol 'A'+i appended
to the end, the function recurses:

The entire string is copied here into a new buffer. A better
technique would have been to allocate a single buffer back up in
main, and to reuse that buffer over again on each call to
makenodes. It would have looked something like this:

This would have saved a lot of consing, ahem, I mean a lot of
mallocing. Also a lot of string copying. We could avoid the
end pointer by using start+maxlen-howfar instead,
but this way is easier to understand.
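As a rough sketch of the single-buffer idea (not the original makenodes, which also takes a tree node): the search below keeps one buffer, passes start and end pointers down the recursion, and overwrites the character at end instead of copying the string. The names bad_if_appended, search and count_good are hypothetical, and the sketch counts good strings rather than building a tree.

```c
#include <stddef.h>

/* Would the string start[0..len) become bad if `c` were appended?
   Only progressions that end at the new character are checked. */
static int bad_if_appended(const char *start, ptrdiff_t len, char c)
{
    ptrdiff_t d;

    for (d = 1; len - 2 * d >= 0; d++)
        if (start[len - d] == c && start[len - 2 * d] == c)
            return 1;
    return 0;
}

/* Count the good strings that extend start[0..end) by `howfar` more
   characters.  Every level of the recursion writes into the same buffer. */
static long search(char *start, char *end, int howfar, int ncolors)
{
    long found = 0;
    int i;

    if (howfar == 0)
        return 1;
    for (i = 0; i < ncolors; i++) {
        char c = (char)('A' + i);
        if (!bad_if_appended(start, end - start, c)) {
            *end = c;                    /* overwrite in place, no copy */
            found += search(start, end + 1, howfar - 1, ncolors);
        }
    }
    return found;
}

/* Set up the one buffer, as main would, and run the search. */
long count_good(int maxlen, int ncolors)
{
    char buf[64];

    if (maxlen < 0 || maxlen >= (int)sizeof buf)
        return -1;
    return search(buf, buf, maxlen, ncolors);
}
```

For example, count_good(9, 2) returns 0, which is the statement that V(3, 2) is at most 9.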

I was thinking this afternoon how it's interesting the way I wrote
this. It's written the way it would have been done, had I been using
a functional programming language. In a functional language, you
would never mutate the same string for each function call; you always
copy the old structure and construct a new one, just as I did in this
program. This is why C programmers abominate functional languages.

Had I been writing makenodes today, I would probably have
eliminated the other argument. Instead of passing it a node and
having it fill in the children, I would have had it construct and
return a complete node. The recursive call would then have looked
like this:

The interesting thing here is the
TABS+12-maxlen+howfar
argument, which indents the display depending on how far the recursion
has progressed. In Perl, which has nonaddressable strings, I usually
do something like this:

my $TABS = " " x ($maxlen - $howfar);
print $TABS, "....";

The TABS trick here is pretty clever, and I'm a bit surprised
that I thought of it in 1988, when I had been programming in C for
only about a year. It makes an interesting contrast to my failure to
reuse the string buffer in makenodes earlier.
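The trick can be sketched in isolation like this. It is an illustration of the pointer arithmetic only, not the program's actual printing code; the indent and print_indented helpers are hypothetical, and the offset is computed from sizeof rather than from the constant 12 used in the program.

```c
#include <stdio.h>

/* Forty spaces, written as four runs of ten. */
static const char TABS[] =
    "          " "          " "          " "          ";

/* Return a pointer to a run of exactly `depth` spaces inside TABS.
   Pointing further into the string leaves fewer spaces before the NUL. */
const char *indent(int depth)
{
    int width = (int)sizeof TABS - 1;    /* exclude the terminating NUL */

    if (depth < 0)
        depth = 0;
    if (depth > width)
        depth = width;
    return TABS + (width - depth);
}

/* Print `s` indented by `depth` spaces, with no allocation or copying. */
void print_indented(int depth, const char *s)
{
    printf("%s%s\n", indent(depth), s);
}
```

A call such as print_indented(maxlen - howfar, s) then indents each string according to how deep the recursion has gone.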

(Peeking ahead, I see that in the next version of the program, I
did reuse the string buffer in this way.)

TABS is actually forty spaces, not tabs. I suspect I used
tabs when I tested it with V(3, 2), where maxlen was
only 9, and then changed it to spaces for calculating V(3, 3),
where maxlen was 27.

The apchk function checks to see if a string is good.
Actually it gets a string, qq, and a character, q,
and checks to see if the concatenation of qq and q
would be good. This reduces its running time to
O(|qq|) rather than
O(|qq|^2).
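To make the linear-time claim concrete, here is a sketch of such a check. It is not the original apchk: the name is_bad_with is hypothetical, and it uses integer indexes f, s and t (first, second and third terms of the progression) rather than pointers, so that nothing ever points before the start of the string.

```c
#include <string.h>

/* Return 1 if appending q to qq would complete three evenly spaced
   equal characters, and 0 otherwise.  Only progressions whose third
   term is the new character are examined, so the loop runs about
   |qq|/2 times. */
int is_bad_with(const char *qq, char q)
{
    int t = (int)strlen(qq);    /* index the new character would occupy */
    int s = t - 1;              /* second term of the progression */
    int f = t - 2;              /* first term of the progression */

    while (f >= 0) {
        if (qq[f] == q && qq[s] == q)
            return 1;
        f -= 2;                 /* widen the spacing by one ... */
        s -= 1;                 /* ... keeping s midway between f and t */
    }
    return 0;
}
```

For instance, is_bad_with("RRBBRBB", 'R') returns 1, because the first R, the third R and the new R would be evenly spaced.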

It's funny that it didn't occur to me to include an extra parameter to
avoid the strlen, or to use q instead of
qq[s] in the first == test. Also, as in the previous
program, I seem unaware of the relative precedences of
&& and ==. This is probably a hangover from
my experience with Pascal, where the parentheses are required.

It seems I hadn't learned yet that predicate functions like
apchk should be named something like is_bad, so that
you can understand code like if (is_bad(s)) { ... } without
having to study the code of is_bad to figure out what it
returns.

I was going to write that I hated this function, and that I could do
it a lot better now. But then I tried to replace it, and wasn't as
successful as I expected I would be. My replacement was:

I could simplify the initializations of f and s,
which are the parts I dislike most here, by making the pointers move
backward instead of forward, but then the termination test becomes
more complicated:

Anyway, I thought I could improve it, but I'm not sure I did. On the
one hand, I like the
f -= 2; s -= 1;, which I think is pretty clear. On the other
hand, s = (f + t) / 2 is pretty clear too; s is
midway between f and t. I'm willing to give
teenage Dominus a passing grade on this one.

Someone probably wants to replace the while loop here with a
for loop. That person is not me.

The Malloc and Free functions track memory usage and
were presumably introduced when I discovered that my program used up
way too much memory and crashed—I think I remember that the
original version omitted the calls to free. They aren't
particularly noteworthy, except perhaps for this bit, in Malloc:

It strikes me as odd that I was using void in 1988 (this is
before the C90 standard) but still K&R-style function
declarations. I don't know what to make of that.

Behavior

This program works, almost. On my current machine, it can find the
length-26 witnesses for V(3, 3) in no time. (In 1988, it took
several days to run on a Sequent
Balance 21000.) The major problem is that it gobbles memory: the
if (!howfar) base case in makenodes forgets to
release the memory that was allocated for the new node. I wonder if
the Malloc and Free functions were written in an
unsuccessful attempt to track this down.

Sometime after I wrote this program, while I was waiting for it to
complete, it occurred to me that it never actually used the tree for
anything, and I could take it out.

I have this idea that one of the principal symptoms of novice
programmers is that they take the data structures too literally, and
always want to represent data the way it will appear when it's printed
out. I haven't developed the idea well enough to write an article
about it, but I hope it will show up here sometime in the next three
years. This program, which constructs an entirely unnecessary tree
structure, may be one of the examples of this idea.

Well, clearly you can do four: ••••. And then you can add
another red one on the end: •••••. And then another that
could be either red or blue: ••••••. And then the next
can be either color, say blue: •••••••.

This is the sort of thing that you can get away with on a one-time
basis, but which you wouldn't want to make a convention of. Since the
purpose of the macro processor is to enable such hacks for the
duration of a single article, it's all good.

I did run into at least one problem: I was writing an article in which
I had defined ^i to abbreviate
<sup><i>i</i></sup>. And then
several paragraphs later I had a TeX formula that contained the
^i sequence in its TeX meaning. This was being replaced with
a bunch of HTML, which was then passed to TeX, which then produced the
wrong output.

One can solve this by reordering the plugins. If I had put the
TeX plugin before the macro plugin, the problem would have gone away,
because the TeX plugin would have replaced the TeX formula with an
image element before the macro plugin ever saw the ^i.

This approach has many drawbacks.
One is that it would no longer have been possible to use Blosxom
macros in a TeX formula. I wasn't willing to foreclose this
possibility, and I also wasn't sure that I hadn't done it somewhere.
If I had, the TeX formula that depended on the macro expansion would
have broken. And this is a risk whenever you move the macro plugin:
if you move it from before plugin X to after plugin X,
you have to worry that maybe something in some article depended on the
text passed to X having been macro-processed.

When I installed
the macro processor, I placed it first in plugin order for precisely
this reason. Moving the macro substitution later would have required
me to remember which plugins would be affected by the macro
substitutions and which not. With the macro processing first, the
question has a simple answer: all of them are affected.

Also, I didn't ever want to have to worry that some macro definition
might mangle the output of some plugin. What if you are hacking on
some plugin, and you change it to return <span
style="Foo"> instead of <span style="foo">, and
then discover that three articles you wrote back in 1997 are now
totally garbled because they contained #define Foo
>WUGGA<? It's just too unpredictable. Having the macro
processing occur first means that you can always see in the original
article file just what might be macro-replaced.

So I didn't reorder the plugins.

Another way to solve the TeX ^i problem would have been to do
something like this:

#define ^i <sup><i>i</i></sup>
#define ^*i ^i

with the idea that I could write ^*i in the TeX formula, and
the macro processor would replace it with ^i after it
was done replacing all the ^i's.

At present the macro processor does not define any order to
macro replacements, but it does guarantee to replace each string only
once. That is, the results of macro replacement are not themselves
searched for macro replacement. This limits the power of the macro
system, but I think that is a good thing. One of the powers that is
thus proscribed is the power to get stuck in an infinite loop.
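The replace-only-once rule can be illustrated with a small sketch. This is not the plugin's code (the plugin runs under Blosxom, and this is C); the struct macro type and the expand_once function are hypothetical, and trying the macros in table order is an arbitrary choice made here, since the real processor does not define an order.

```c
#include <string.h>

struct macro { const char *name; const char *body; };

/* Expand macros in `in`, writing a NUL-terminated result to `out`.
   After a match, scanning resumes just past the matched input, so the
   replacement text is never itself searched for macros.
   Returns 0 on success, -1 if `out` is too small. */
int expand_once(const char *in, const struct macro *macros, size_t nmacros,
                char *out, size_t outsize)
{
    size_t used = 0;

    while (*in != '\0') {
        const char *piece = in;          /* default: copy one character */
        size_t piece_len = 1, advance = 1, i;

        for (i = 0; i < nmacros; i++) {
            size_t n = strlen(macros[i].name);
            if (n > 0 && strncmp(in, macros[i].name, n) == 0) {
                piece = macros[i].body;
                piece_len = strlen(piece);
                advance = n;
                break;                   /* first table entry that matches */
            }
        }
        if (used + piece_len + 1 > outsize)
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        in += advance;                   /* skip the input, not the output */
    }
    if (outsize == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With the two definitions above in the table, the input x^i, $y^*i$ comes out as x<sup><i>i</i></sup>, $y^i$: the ^i produced by expanding ^*i is left alone, and no sequence of definitions can make the loop run forever.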

It occurs to me now that although I call it the world's worst macro
system, perhaps that doesn't give me enough credit for doing good
design that might not have been obvious. I had forgotten about my
choice of single-substitution behavior, but looking back on it a year
later, I feel pleased with myself for it, and imagine that a lot of
people would have made the wrong choice instead.

(A brief digression: unlimited, repeated substitution is a bad move here
because it is complex—much more complex than it appears. A macro
system with single substitution is nothing much, but a macro system with
repeated substitution is a programming language. The semantics of the
λ-calculus is nothing more than simple substitution, repeated as
necessary, and the λ-calculus is a maximally complex computational
engine. Term-rewriting systems are a more obvious theoretical example,
and TeX is a better-known practical example of this phenomenon. I was
sure I did not want my macro system to be a programming language, so I
avoided repeated substitution.)

Because each input text is substituted at most once, the processor's
refusal to define the order of the replacements is not something you
have to think about, as long as your macros are prefix-unique. (That
is, as long as none is a prefix of another.) So you shouldn't
define:

#define foo bar
#define fool idiot

because then you don't know if foolish turns into
barlish or idiotish. This is not a big deal in
practice.

Well, anyway, I did not solve the problem with #define ^*i
^i. I took a much worse solution, which was to hack a
#undefall directive into the macro processor. In my original
article, I boasted that the macro processor "has exactly one feature".
Now it has two, and it's not an improvement. I disliked the new
feature at the time, and now that I'm reviewing the decision, I think
I'm going to take it out.

But this sequence also occurred in the URLs in the link elements, and
the substitution broke the links. I should probably have changed this
to:

#define Go:del G&ouml;del

But instead I added:

#define GODEL Godel

and then used GODEL in the URLs. Oh well, whatever works, I
guess.

Perhaps my favorite use so far is in an (unfinished) article about
prosopagnosia. I got tired of writing about prosopagnosia and
prosopagnosiacs, so

#define PAa prosopagnosia
#define PAic prosopagnosiac

Note that with these definitions, I get PAa's,
and PAics for free. I could use PAac instead of defining
PAic, but that would prevent me from deciding later that
prosopagnosiac should be spelled "prosopagnosic".

Van der Waerden's problem: program 1
In this series of articles I'm going to analyze four versions of a
program that I wrote around 1988, and then another program that does
the same thing that I wrote last month without referring to the 1988
code.

Program 1

This program does an unpruned exhaustive search of the string space.
Since for V(3, 3) the string space contains 3^27 =
7,625,597,484,987 strings, it takes a pretty long time to finish. I
quickly realized that I was wasting my time with this program.

The program is invoked with a length argument and an optional
colors argument, which defaults to 2. It then looks for good
strings of the specified length, printing those it finds. If there
are none, one then knows that V(3, colors) >
length. Otherwise, one knows that V(3, colors)
≤ length, and has witness strings to prove it.

I don't want to spend a lot of time on it because there are plenty of C
programming style guides you can read if you care for that.
But already on lines 4–5 we have something I wouldn't write today:

#define NO 0
#define YES !NO

Oh well.

The program wants to iterate through all C^n
strings. How does it know when it's done? It's not easy
to make a program as slow as this one even slower, but I found a way
to do it.

It manufactures the string ABCDDDDDDDDD....D and compares the
current string to that one every time through the loop. A much simpler
method is to detect completion while incrementing the target string.
The function that does the increment looks like this:

(This code is in a pink box to show that it is not actually part of
the program I am discussing in this article.)

The function returns true on success and false on failure. A false
return can be taken by the caller as the signal to terminate the
program.

This replacement function invokes undefined behavior, because
there is no guarantee that p is allowed to run off the
beginning of the string in the way that it does. But there is no need to check the
strings in lexicographic order. Instead of scanning the strings in
the order
AAA,
AAB,
ABA,
ABB,
BAA, etc., one can scan them in reverse lexicographic order:
AAA,
BAA,
ABA,
BBA,
AAB, etc. Then instead of running off the beginning of the
string, p runs off the end, which is allowed.
This fixes the undefined behavior problem and also
eliminates the call to strchr that finds the end of the
string. This is likely to produce a significant speedup:
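A sketch of an increment function along these lines, under the assumption that the strings use the letters 'A', 'B', and so on; the name increment and the ncolors parameter are hypothetical. The first character is treated as the least significant digit, so the scan starts at the beginning of the string and stops at the terminating NUL without ever looking for the end first.

```c
/* Advance `s` to the next string over the alphabet 'A' .. 'A'+ncolors-1,
   treating the FIRST character as the least significant digit.
   Returns 1 on success, or 0 when every string has been visited
   (in which case `s` has wrapped around to all 'A's). */
int increment(char *s, int ncolors)
{
    char last = (char)('A' + ncolors - 1);
    char *p;

    for (p = s; *p != '\0'; p++) {
        if (*p != last) {
            (*p)++;              /* no carry needed, so we are done */
            return 1;
        }
        *p = 'A';                /* carry into the next position */
    }
    return 0;                    /* p reached the terminating NUL */
}
```

Starting from AAA with two colors, successive calls produce BAA, ABA, BBA, AAB, and so on, and the call made on BBB returns 0 to signal that the search is over.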

I hesitate to say that this is the biggest waste of time in the whole
program, since after all it is a program whose job is to examine
7,625,597,484,987 strings. But look. 2/3 of the
calls to this function are asking it to check a string that differs
from the previous string in the final character only. Nevertheless,
it still checks all 49 possible arithmetic progressions, even the ones
that didn't change.

The t ≥ length test is superfluous, or if it isn't,
it should be.

Also notice that I wasn't sure of the precedence in the final test.

It didn't take me long to figure out that this program was not going
to finish in time. I wrote a series of others, which I hope to post
here in coming days. The next one sucks too, but in a completely
different way.

Van der Waerden's problem
In this series of articles I'm going to analyze four versions of a
program that I wrote around 1988, and then another program that does
the same thing that I wrote last month without referring to the 1988
code.

First I'll explain what the programs are about.

Van der Waerden's problem

Color each of a row of dots red or blue, so that no
three evenly-spaced dots are the same color. (That is, if dots
n and n+i are the same color, dot
n+2i must be a different color.) How many dots can you
do?

Well, clearly you can do four: RRBB. And then you can add
another red one on the end: RRBBR. And then another that
could be either red or blue: RRBBRB. And then the next
can be either color, say blue: RRBBRBB.

But now you are
at the end, because if you make the next dot red, then dots 2, 5, and
8 will all be red (RRBBRBBR), and if you make the next dot
blue then dots 6, 7, and 8 will be blue (RRBBRBBB).

But maybe we made a mistake somewhere earlier, and if the first seven
dots were colored differently, we could have made a row of more than 7 that
obeyed the no-three-evenly-spaced-dots requirement. In fact, this is
so: RRBBRRBB is an example.

But this is the end of the line. Any coloring of a row of 9
dots contains three evenly-spaced dots of the same color. (I don't
know a good way to prove this, short of an enumeration of all 512
possible arrangements of dots. Well, of course it is sufficient to
enumerate the 256 that begin with R, but that is pretty much the same
thing.)
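The enumeration is small enough to sketch directly. In the code below, which is an illustration and not one of the programs discussed in this series, each coloring of the row is encoded as the bits of an unsigned integer (bit i gives the color of dot i); the function names are hypothetical, and the row is assumed to have fewer than 32 dots.

```c
/* Does the coloring encoded in `bits` (bit i = color of dot i, for
   0 <= i < ndots) contain three evenly spaced dots of one color? */
static int has_mono_progression(unsigned bits, int ndots)
{
    int a, d;

    for (d = 1; 2 * d < ndots; d++)
        for (a = 0; a + 2 * d < ndots; a++) {
            unsigned x = (bits >> a) & 1u;
            unsigned y = (bits >> (a + d)) & 1u;
            unsigned z = (bits >> (a + 2 * d)) & 1u;
            if (x == y && y == z)
                return 1;
        }
    return 0;
}

/* Count the two-color rows of `ndots` dots with no such progression. */
unsigned count_good_rows(int ndots)
{
    unsigned bits, good = 0;

    for (bits = 0; bits < (1u << ndots); bits++)
        if (!has_mono_progression(bits, ndots))
            good++;
    return good;
}
```

count_good_rows(9) examines all 512 arrangements and returns 0, while count_good_rows(8) is positive because rows such as RRBBRRBB survive.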

Van der Waerden's theorem says that for any number of colors, say
C, a sufficiently-long row of colored dots will contain
n evenly-spaced same-color dots for any n. Or, put
another way, if you partition the integers into C disjoint
classes, at least one class will contain arbitrarily long arithmetic
progressions.

The proof of van der Waerden's theorem works by taking C and
n and producing a number V such that a row of V
dots, colored with C colors, is guaranteed to contain n
evenly-spaced dots of a single color. The smallest such V is
denoted V(n, C). For example V(3, 2) is
9, because any row of 9 dots of 2 colors is guaranteed to contain 3
evenly-spaced dots of the same color, but this is not true of a row
of only 8 dots.

Van der Waerden's theorem does not tell you what V(n,
C) actually is; it provides only an upper bound. And here's the funny
thing about van der Waerden's theorem: the upper bound is incredibly bad.

For V(3, 2), the theorem tells you only that V(3, 2) ≤
325. That is, it tells you that any row of 325 red and blue dots must
contain three evenly spaced dots of the same color. This is true, but
oh, so sloppy, since the same is true of any row of 9 dots.

For V(3, 3), the question is how many red, yellow, and blue
dots do you need to guarantee three evenly-spaced same-colored dots.
The theorem helpfully suggests that:

$$V(3,3) \leq
7(2\cdot3^7+1)(2\cdot3^{7(2\cdot3^7+1)}+1)$$

This is approximately 5.79·10^14613. But what is the
actual value of V(3, 3)? It's 27. Urgggh.

In fact, there is a rather large cash prize available to be won by the
first person who comes up with a general upper bound for
V(n,
C) that is smaller than a tower of 2's of height n.
(That's 2^2^2^... with n 2's.)

In the rest of this series, a string which does not contain three
evenly-spaced equal symbols will be called good, and one which
does contain three such symbols will be called bad. Then a
special case of Van der Waerden's theorem, with n=3, says that, for any
fixed number of symbols, all sufficiently long strings are bad.

In college I wanted to investigate this a little more. In particular,
I wanted to calculate V(3, 3). These days you can just look it
up on Wikipedia, but in those benighted times such information was
hard to come by. I also wanted to construct the longest possible good
strings, witnesses of length V(3, 3)-1. Although I did not
know it at the time, V(3, 3) = 27, so a witness should have
length 26. It turns out that there are exactly 48 witnesses of length
26. Here are the 1/6 of them that begin with RB or RRB:

The rest of the witnesses may be obtained by permuting the colors in
these eight.

I wrote a series of C programs around 1988 to exhaustively search for
good strings. Last month I was in a meeting and I decided to write
the program again for some reason. I wrote a much better program.
This series of articles will compare the five programs. I will post
the first one tomorrow.

[ Addendum 20071005: I made a mistake in the expression I gave for the
upper bound on V(3,3) and left out a factor of 7 in the
exponent on the last 3. I had said that the upper bound was around
10^2092, but actually it is more like the seventh power of
this. ]

Thank you very much for that bulletin
I'm about to move house, and so I'm going through a lot of old stuff
and throwing it away. I just unearthed the decorations from my office
door circa 1994. I want to record one of these here before I throw it
away and forget about it. It's a clipping from the front page of the
New York Times from 11 April, 1992. It is noteworthy for
its headline, which is only one column wide, but at the very top of
page A1, above the fold. It says:

FIGHTING IMPERILS
EFFORTS TO HALT WAR
IN YUGOSLAVIA

Sometimes good articles get bad headlines. Often the headlines are tacked on just
before press time by careless editors. Was this a good article
afflicted with a banal headline? Perhaps they meant there was
internecine squabbling among the diplomats charged with the
negotiations?

No. If you read the article it turned out that it was
about how darn hard it was to end the war when folks kept shooting at
each other, dad gum it.

I hear that the headline the following week was DOG BITES MAN, but I
don't have a clipping of that.

This article got some attention back in July, when a lot of
people were Google-searching for "john wilkins metric system", because
the UK Metric Association had put out a press
release making the same points, this time discovered by an
Australian, Pat Naughtin.

According to Pat Naughtin, the Metric System was invented in
England in 1668, one hundred and twenty years before the French
adopted the system. He discovered this in an ancient and rare
book...

Actually, though, he did not discover it in Wilkins' ancient and rare
book. He discovered it by reading The Universe of
Discourse, and then went to the ancient and rare book I
cited, to confirm that it said what I had said it said. Remember,
folks, you heard it here first.

Anyway, that is not what I planned to write about. In the earlier
article, I discussed Wilkins' original definition of the Standard,
which was based on the length of a pendulum with a period of exactly
one second. Then:

Let d be the distance from the
point of suspension to the center of the bob, and r be the
radius of the bob, and let x be such that d/r =
r/x. Then d+(0.4)x is the standard
unit of measurement.

(This is my translation of Wilkins' Baroque language.)

But this was a big puzzle to me:

Huh? Why 0.4? Why does r come into it? Why not just use
d? Huh?

Soon after the press release came out, I got email from a gentleman
named Bill Hooper, a retired professor of physics of the University of
Virginia's College at Wise, in which he explained this puzzle completely, and in some
detail.

According to Professor Hooper, you cannot just use d here,
because if you do, the length will depend on the size, shape, and
orientation of the bob. I did not know this; I would have supposed
that you can assume that the mass of the bob is concentrated at its
center of mass, but apparently you cannot.

The usual Physics I calculation that derives the period of a pendulum
in terms of the distance from the fulcrum to the center of the bob
assumes that the bob is infinitesimal. But in real life the bob is
not infinitesimal, and this makes a difference. (And Wilkins
specified that one should use the most massive possible bob, for
reasons that should be clear.)

No, instead you have to adjust the distance d in the formula by
adding I/md, where m is the mass of the bob and I
is the moment of inertia of the bob, a property which depends on the
shape, size, and mass of the bob. Wilkins specified a spherical bob, so we
need only calculate (or look
up) the formula for the moment of inertia of a sphere. It turns
out that for a solid sphere, I = 2mr^2/5.
That is, the distance needed is not d, but d +
2r^2/(5d). Or, as I put it above, d +
(0.4)x, where d/r = r/x.
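The adjustment can be written out as a short derivation. This is a sketch under the usual small-swing assumptions; g (the gravitational acceleration), T (the period) and L (the length of the ideal pendulum with the same period) are symbols introduced here for illustration, and I is taken about the center of the bob.

$$T = 2\pi\sqrt{\frac{I + md^2}{mgd}} = 2\pi\sqrt{\frac{L}{g}}, \qquad L = d + \frac{I}{md}$$

$$I = \frac{2}{5}mr^2 \quad\Longrightarrow\quad L = d + \frac{2r^2}{5d} = d + (0.4)x, \qquad x = \frac{r^2}{d}$$

The last equality is just the condition d/r = r/x solved for x.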

Well, that answers that question. My very grateful thanks to
Professor Hooper for the explanation. I think I might have figured
it out myself eventually, but I am not willing to put a bound of less
than two hundred years on how long it would have taken me to do so.

One lesson to learn from all this is that those early Royal Society
guys were very smart, and when they say something has a mysterious
(0.4)x in it, you should assume they know what they are doing.
Another lesson is that mechanics was pretty well-understood by
1668.

Why spiders hang with their heads down
Iris asked me last week why spiders hang in their webs with their
heads downwards, and I said I would try to find out. After a cursory
Google search, I was none the wiser, so I tried asking the Wikipedia
"reference desk" page. I did not learn anything useful about the
spiders, but I did learn that the reference desk page is full of
people who know even less about spiders than I do who are nevertheless
willing to post idle speculations.

Fortunately, I was at a meeting this week in Durham that was also
attended by three of the world's foremost spider experts. I put the
question to Jonathan
A. Coddington, curator of arachnids for the Smithsonian
Institution.

Professor Coddington told me that it was because the spider prefers
(for obvious mechanical and dynamic reasons) to attack its prey from
above, and so it waits in
the upper part of the web and constructs the web so that the principal
prey-catching portion is below. When prey is caught in the web, the
spider charges down and attacks it.

I had mistakenly thought that spiders in orb webs (which are the
circular webs you imagine when you try to think of the canonical
spiderweb) perched in the center. But it is only the topological
center, and geometrically it is above the midline, as the adjacent
picture should make clear. Note that more of the radial threads are
below the center than are above it.

Girls of the SEC
I'm in the Raleigh-Durham airport, and I just got back from the
newsstand, where I learned that the pictorial in this month's
Playboy magazine is "Girls of the SEC". On
seeing this, I found myself shaking my head in sad puzzlement.

This isn't the first time I've had this reaction on learning about a
Playboy pictorial; last time was probably in August 2002 when I
saw the "Women of Enron" cover. (I am not making this up.) I wasn't
aware of the December 2002 feature, "Women of Worldcom" (I
swear I'm not making this up), but I would have had the same
reaction if I had been.

I know that in recent years the Playboy franchise has fallen from its
former heights of glory: circulation is way down, the Playboy Clubs
have all closed, few people still carry Playboy keychains. But I
didn't remember that they had fallen quite so far. They seem to have
exhausted all the plausible topics for pictorial features, and are now
well into the scraping-the-bottom-of-the-barrel stage. The June 1968
feature was "Girls of Scandinavia". July 1999, "Girls of Hawaiian
Tropic". Then "Women of Enron" and now "Girls of the SEC".

How many men have ever had a fantasy about sexy SEC employees, anyway?
How can you even tell? Sexy flight attendants, sure; they wear
recognizable uniforms. But what characterizes an SEC employee?
A rumpled flannel suit? An interest in cost
accounting? A tendency to talk about the new Basel II banking
regulations?
I tried to think of a category that would be less sexually
inspiring than "SEC employees". It's difficult. My first thought
was "Girls of Wal-Mart." But no, Wal-Mart employees wear uniforms.

If you go too far in that direction you end up in the realm of fetish.
For example, Playboy is unlikely to do a feature on "girls of the
infectious disease wards". But if they did, there is someone
(probably on /b/),
who would be extremely interested. It is hard to imagine
anyone with a similarly intense interest in SEC employees.

So what's next for Playboy? Girls of the hospital
gift shops? Girls of State Farm Insurance telephone customer service
division? Girls of the beet canneries? Girls of Acadia University
Grounds and Facilities Services? Girls of the DMV?

[ Pre-publication addendum: After a little more research, I figured
out that SEC refers here to "Southeastern Conference" and that Playboy
has done at least two other features with the same title, most
recently in October 2001. I decided to run the article anyway, since
I think I wouldn't have made the mistake if I hadn't been prepared
ahead of time by "Women of Enron". ]

Apparently, the "inconsistency" noted by Gödel is simply that the
Constitution provides for its own amendment. Suber says: "He noticed
that the AC had procedural limitations but no substantive limitations;
hence it could be used to overturn the democratic institutions
described in the rest of the constitution." I am gravely
disappointed. I had been hoping for something brilliant and subtle
that only Gödel would have noticed.

M. Padgett also pointed out that the scheme I proposed for
amending the constitution, which I claimed would require only the
cooperation of a majority of both houses of Congress, 218 + 51 = 269
people in all, would actually require a filibuster-proof majority in
the Senate. He says that to be safe you would want all 100 senators
to conspire; I'm not sure why 60 would not be sufficient. (Under
current Senate rules, 60 senators can halt a filibuster.) This
would bring the total required to 218 + 60 = 278 conspirators.

He also pointed out that the complaisance of five Supreme Court
justices would give the President essentially dictatorial powers,
since any legal challenge to Presidential authority could be rejected
by the court. But this train of thought seems to have led both of us
down the same path, ending in the idea that this situation is not
really within the scope of the original question.

As a final note, I will point out what I think is a much more serious
loophole in the Constitution: if the Vice President is impeached and
tried by the Senate, then, as President of the Senate, he presides
over his own trial. Article I, section 3 contains an exception for
the trial of the President, where the Chief Justice presides instead.
But the framers inexplicably forgot to extend this exception to the
trial of the Vice President.

Gödel took the matter of citizenship with great solemnity,
preparing for the exam by making a close study of the United States
Constitution. On the eve of the hearing, he called [Oskar] Morgenstern
in an agitated state, saying he had found an "inconsistency" in the
Constitution, one that could allow a dictatorship to arise.

I suppose the Attorney General could bring some sort of suit in the
Supreme Court that resulted in the Court "interpreting" the
Constitution to find that the President had the power to, say,
arbitrarily replace congresspersons with his own stooges. This would
require only six conspirators: five justices and the
President. (The A.G. is a mere appendage of the President and is not
required for the scheme anyway.)

But this seems outside the rules. I'm not sure what the rules are,
but having the Supreme Court radically and arbitrarily "re-interpret"
the Constitution isn't an "inconsistency in the Constitution". The
solution above is more like a coup d'etat. The Joint Chiefs of Staff
could stage a military takeover and institute a dictatorship, but that
isn't an "inconsistency in the Constitution" either. To qualify, the
Supreme Court would have to find a plausible interpretation of
the Constitution that resulted in a dictatorship.

The best solution I have found so far is this: Under Article IV,
Congress has the power to admit new states. A congressional majority
could agree to admit 150 trivial new states, and then propose
arbitrary constitutional amendments, to be ratified by the trivial
legislatures of the new states.

This would require a congressional majority in both houses. So
Gödel's constant, the smallest number of conspirators required to
legally transform the United States into a dictatorship, is at most
269. (This upper bound would have been 267 in 1948 when Gödel became
a citizen.) I would like to reduce this number, because I can't see
Gödel getting excited over a "loophole" that required so many
conspirators.

The names are rather horrible, so I think that from now on I'll just
refer to them as D4, D6, D8, D10, D12, D14, D16, and D20.

The number of edges that meet at a vertex is its valence.
Vertices in convex deltahedra have valences of 3, 4, or 5. The
valence can't be larger than 5 because only six equilateral triangles
will fit, and if you fit 6 then they lie flat and the polyhedron is
not properly convex.

Let V3, V4, and V5 be the number of vertices of valences 3, 4, and
5, respectively. Then:

What    V3    V4    V5
D4       4     0     0
D6       2     3     0
D8       0     6     0
D10      0     5     2
D12      0     4     4
D14      0     3     6
D16      0     2     8
D20      0     0    12

There's a clear pattern here, with V3s turning into V4s two at a time
until you reach the octahedron (D8) and then V4s turning into V5s one
at a time until you reach the icosahedron (D20). But where is V4=1,
V5=10? There's a missing deltahedron. I don't mean it's missing from
the table; I mean it's missing from the universe.

Well, this is all oversubtle, I realized later, because you don't need
to do the V3–V4–V5 analysis to see that something is
missing. There are convex deltahedra with 4, 6, 8, 10, 12, 14, 16, and 20
faces; what happened to 18?

Still, I did a little work on a more careful analysis that might shed
some light on the 18-hedron situation. I'm still in the middle of it,
but I'm trying to continue my policy of posting more frequent, partial
articles.

Let V be the number of vertices in a convex deltahedron,
E be the number of edges, and F be the number of
faces.

We then have V = V3 + V4 + V5. We also have
E = ½(3V3 + 4V4 + 5V5).
And since each face has
exactly 3 edges, we have 3F = 2E. Substituting these into Euler's
formula, V − E + F = 2, and simplifying gives
3V3 + 2V4 + V5 = 12.

It is very easy to enumerate all possible solutions of this
equation. There are 19:

V3    V4    V5    What
 4     0     0    D4
 3     1     1
 3     0     3
 2     3     0    D6
 2     2     2
 2     1     4
 2     0     6
 1     4     1
 1     3     3
 1     2     5
 1     1     7
 1     0     9
 0     6     0    D8
 0     5     2    D10
 0     4     4    D12
 0     3     6    D14
 0     2     8    D16
 0     1    10
 0     0    12    D20

Solutions with an entry in the What column correspond to convex
deltahedra. What goes wrong with the
other 11 items?
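
The enumeration is easy to mechanize. Here is a minimal sketch in Python, assuming that the equation being solved is 3V3 + 2V4 + V5 = 12 (which is what the three relations above give when combined with Euler's formula V − E + F = 2). The function name valence_triples is made up for illustration.

```python
def valence_triples(total=12):
    """Return all non-negative (V3, V4, V5) with 3*V3 + 2*V4 + V5 == total."""
    triples = []
    for v3 in range(total // 3, -1, -1):
        rest = total - 3 * v3
        for v4 in range(rest // 2, -1, -1):
            # V5 is forced once V3 and V4 are chosen.
            triples.append((v3, v4, rest - 2 * v4))
    return triples

triples = valence_triples()
for v3, v4, v5 in triples:
    # 2E = 3*V3 + 4*V4 + 5*V5 and 3F = 2E, so F = (3*V3 + 4*V4 + 5*V5) / 3.
    faces = (3 * v3 + 4 * v4 + 5 * v5) // 3
    print(v3, v4, v5, "faces:", faces)
```

The triples come out in the same order as the rows of the table, and the face count for each one shows which row would have to be the missing 18-hedron.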

(3,1,1) fails completely because to have V5 > 0 you need V
≥ 6. There isn't even a graph with (V3, V4, V5) = (3,1,1), much
less a polyhedron.

There is a graph with (3,0,3), but it is decidedly nonplanar: it
contains K3,3,
plus an additional triangle. But the graph of any polyhedron must be
planar, because you can make a little hole in one of the faces of the
polyhedron and flatten it out without the edges crossing.

Another way to think about (3,0,3) is to consider it as a sort of
triangular tripyramid. Each of the V5s shares an edge with each of
the other five vertices, so the three V5s are all
pairwise connected by edges and form a
triangle. Each of the three V3s must be connected to each of the
three vertices of this triangle. You can add two of the required V3s,
by erecting a triangular pyramid on the top and the bottom of the
triangle. But then you have nowhere to put the third pyramid.

On Thursday I didn't know what went wrong with (2,2,2); it seemed
fine. (I found it a little challenging to embed it in the plane, but
I'm not sure if it would still be challenging if it hadn't been the
middle of the night.) I decided that when I got into the office on
Friday I would try making a model of it with my magnet toy and see
what happened.

It turned out that nothing goes wrong with (2,2,2). It makes a
perfectly good non-convex deltahedron. It's what you get when
you glue together three tetrahedra, face-to-face-to-face. The
concavity is on the underside in the picture.

(2,0,6) was a planar graph too, and so the problem had to be
geometric, not topological. When I got to the office, I put it
together. It also worked fine, but the result is not a polyhedron.
The thing you get could be described as a gyroelongated triangular
dipyramid. That is, you take an octahedron and glue tetrahedra to two
of its opposite faces. But then the faces of the tetrahedra are
coplanar with the faces of the octahedron to which they abut, and this
is forbidden in polyhedra. When that happens you're supposed to
eliminate the intervening edge and consider the two faces to be a
single face, a rhombus in this case.
The resulting thing is not a
polyhedron with 12 triangular faces, but one with six rhombic faces (a
rhombohedron), essentially a squashed cube. In fact, it's exactly
what you get if you make a cube from the magnet toy and then try to
insert another unit-length rod into the diagonal of each of the six
faces. You have to squash the cube to do this, of course, since the
diagonals had length √2 before and length 1 after.

So there are several ways in which the triples (V3,V4,V5) can fail to
determine a convex deltahedron: There is an utter topological
failure, as with
(3,1,1).

There is a planarity failure, which is also topological, but less
severe, as with (3,0,3). (3,0,3) also fails because you can't embed
it into R³. (I mean that you cannot embed its 3-skeleton.
Of course you can embed its 1-skeleton in R³, but that is
not sufficient for the thing to be a polyhedron.) I'm not sure if
this is really different from the previous failure; I need to consider
more examples. And (3,0,3) fails in yet another way: you can't even
embed its 1-skeleton in R³ without violating the constraint
that says that the edges must all have unit length. The V5s must lie
at the vertices of an equilateral triangle, and then the three unit
spheres centered at the V5s intersect at exactly two points of
R³. You can put two of the V3s at these points, but this
leaves nowhere for the third V3. Again, I'm not sure that this is a
fundamentally different failure mode than the other two.

Another failure mode is that the graph might be embeddable into
R³, and might satisfy the unit-edge constraint, but in
doing so it might determine a concave polyhedron, like
(2,2,2) does, or a non-polyhedron, like (2,0,6) does.

I still have six (V3,V4,V5) triples to look into. I wonder if there
are any other failure modes?

I should probably think about
(0,1,10) first, since the whole point of all this was to figure out
what happened to D18. But I'm trying to work up from the simple cases
to the harder ones.

I suppose the next step is to look up the proof that there are only
eight convex deltahedra and see how it goes.

I suspect that (2,1,4) turns out to be nonplanar, but I haven't looked
at it carefully enough to actually find a forbidden minor.

One thing that did occur to me today was that a triple (V3, V4, V5)
doesn't necessarily determine a unique graph, and I need to look into
that in more detail. I'll be taking a plane trip on Sunday and I plan
to take the magnet toy with me and continue my investigations on the
plane.

In other news, Iris and I went to my office this evening to drop off
some books and pick up some stuff for the trip, including the magnet
toy. Iris was very excited when she saw the collection of convex
deltahedron models on my desk, each in a different color, and wanted
to build models just like them. We got through all of them, except
D10, because we ran out of ball bearings. By the end Iris was getting
pretty good at building the models, although I think she probably
wouldn't be able to do it without directions yet. I thought it was
good work, especially for someone who always skips from 14 to 16 when
she counts.

On the way home in the car, we were talking about how she was getting
older and I rhapsodized about how she was learning to do more things,
learning to do the old things better, learning to count higher, and so
on. Iris then suggested that when she is older she might remember to
include 15.

(I did not make this up; I got it from here.)
The flag here is the family
$checked01,
$checked02, etc. Such code is almost always improved by
replacing the family with an array, and the repeated code with a
loop:

$checked[$_] = $FORM{"h$_"} for "01" .. "06";

Actually in this particular case a better solution was to eliminate
the checked variables entirely, but that is not what I
was planning to discuss. Rather, I planned to discuss a recent
instance in which I wrote some code with a family of variables myself,
and the fix was somewhat different.

The program I was working on was a digester for the qmail logs,
translating them into a semblance of human-readable format. (This is
not a criticism; log files need not be human-readable; they need to be
easy to translate, scan, and digest.) The program scans the log,
gathering information about each message and all the attempts to
deliver it to each of its recipient addresses. Each delivery can be
local or remote.

Normally the program prints information about each
message and all its deliveries. I was
adding options to the program to allow the user to specify that only
local deliveries or only remote deliveries were of interest.

As you see, this is where I made my mistake, and introduced a
(two-member) family of variables. The conventional fix says that this
should have been something like $do_only{local} and
$do_only{remote}. But I didn't notice my mistake right away.

Later on, when processing a message, I wanted the program to scan
its deliveries, and skip all processing and display of the message unless
some of its deliveries were of the interesting type:

if ($local_only || $remote_only) {
...
}

I had vague misgivings at this point about the test, which seemed
redundant, but I pressed on anyway, and found myself in minor trouble.
Counting the number of local or remote deliveries was complicated:

There is a duplication of code here. Also, there is a waste of CPU
time, since the program never needs to have both numbers available.
This latter waste could be avoided at the expense of complicating the
code, by using something like $n_remote_deliveries = keys(%{$msg->{del}}) -
$n_local_deliveries, but that is not a good solution.

Also, the complete logic for skipping the report was excessively
complicated:

but that is not addressing the real problem, which was the family of
variables, $local_only and $remote_only, which
inevitably lead to duplicated code, as they did here.

Such variables are related by a convention in the programmer's mind,
and nowhere else. The language itself is as unaware of the
relationship as if the variables had been named
$number_of_nosehairs_on_typical_goat and
$fusion_point_of_platinum. A cardinal rule of programming is
to make such conventional relationships explicit, because then the
programming system can give you some assistance in dealing with them.
(Also because then they are apparent to the maintenance programmer,
who does not have to understand the convention.) Here, the program
was unable to associate $local_only with the string
"local" and $remote_only with "remote", and
I had to make up the lack by writing additional code.

For families of variables, the remedy is often to make the
relationship explicit by using an aggregate variable, such as an array
or a hash, something like this:

Here the relationship is explicit because $use_only{"remote"}
indicates an interest in remote deliveries and
$use_only{"local"} indicates an interest in local deliveries,
and the program can examine the key in the hash to determine what to
look for in the {lr} data.

But in this case the alternatives are disjoint, so the
%use_only hash will never contain more than one element. The
tipoff is the bizarre ($only_these) = keys ... line. Since
the hash is really storing a single scalar, it can be replaced with a
scalar variable:

A long time ago I started to suspect that flag variables themselves
are a generally bad practice, and are best avoided, and I think this
example is evidence in favor of that theory. I had a conversation
about this yesterday with Aristotle Pagaltzis, who is very thoughtful
about this sort of thing. One of our conclusions was that although
the flag variable can be useful to avoid computing the same boolean
value more than once, if it is worth having, it is because your
program uses it repeatedly, and so it is probably testing the
same boolean value more than once, and so it is likely that the
program logic would be simplified if one could merge the blocks that
would have been controlled by those multiple tests into one place,
thus keeping related code together, and eliminating the repeated
tests.

Followup notes about dice and polyhedra
I got a lot of commentary about these geometric articles, and started
writing up some followup notes. But halfway through I got stuck in
the middle of making certain illustrations, and then I got sick, and
then I went to a conference in Vienna. So I decided I'd better
publish what I have, and maybe I'll get to the other fascinating
points later.

Several people wrote in to cast doubt on my assertion that the
probability of an irregular die showing a certain face is
proportional to the solid angle subtended by that face from the
die's center of gravity. But nobody made the point more clearly
than Robert Young, who pointed out that if I were right, a coin
would have a 7% chance of landing on its edge. I hereby recant
this claim.

John Berthels suggested that my analysis might be correct if the
die was dropped into an inelastic medium like mud that would
prevent it from bouncing.

Jack Vickeridge referred me to this web
site, which has a fairly extensive discussion of seven-sided
dice. The conclusion: if you want a fair die, you have no choice
but to use something barrel-shaped.

Michael Lugo wrote a
detailed followup in which he discusses this and related
problems. He says "What makes Mark's problem difficult is the
lack of symmetry; each face has to be different." Quite so.

Aaron Crane says that these dice (with faces {1,2,2,3,3,4} and
{1,3,4,5,6,8}) are sometimes known as "Sicherman dice", after the
person who first brought them to the attention of Martin Gardner.
Can anyone confirm that this was Col. G.L. Sicherman? I have no
reason to believe that it was, except that it would be so very
unsurprising if it were true.

Addendum 20070905: I now see that the Wikipedia article
attributes the dice to "Colonel George Sicherman," which is
sufficiently clear that I would feel embarrassed to write to the
Colonel to ask if it is indeed he. I also discovered that the
Colonel has a
Perl program on his web site that will calculate "all pairs of
n-sided dice that give the same sums as standard
n-sided dice".

M. Crane also says that it is an interesting question
which set of dice is better for backgammon. Both sets have
advantages: the standard set rolls doubles 1/6 of the time,
whereas the Sicherman dice only roll doubles 1/9 of the time. (In
backgammon, doubles count double, so that whereas a player who
rolls a–b can move the pieces a total of
a+b points, a player who rolls a–a
can move pieces a total of 4a points.) The standard dice
permit movement of 294/36 points per roll, and the Sicherman dice
only 274/36 points per roll.

Offsetting this disadvantage is the advantage that the Sicherman
dice can roll an 8. In backgammon, one's own pieces may not land on a point
occupied by more than one opposing piece. If your opponent
occupies six consecutive points with two pieces each, they form an
impassable barrier. Such a barrier is passable to a
player using the Sicherman dice, because of the 8.

Doug Orleans points out that in some
contexts one might prefer to use the variant Sicherman dice {2,3,3,4,4,5} and
{0,2,3,4,5,7}, which retain the property that opposite faces sum
to 7, so that each die shows 3.5 pips on average. Such dice
roll doubles as frequently as do standard dice.
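
The doubles frequencies and movement totals for these dice can be checked by brute force over the 36 equally likely rolls. This is a rough sketch in Python; the function name doubles_and_pips is invented here, and the only backgammon rule it assumes is the one stated above, that a double a–a moves 4a points and any other roll a–b moves a+b points.

```python
from fractions import Fraction
from itertools import product

def doubles_and_pips(die_a, die_b):
    """Return (probability of doubles, total points moved over all rolls)."""
    rolls = list(product(die_a, die_b))
    doubles = sum(1 for a, b in rolls if a == b)
    # A double a-a moves 4a points; any other roll a-b moves a+b points.
    pips = sum(4 * a if a == b else a + b for a, b in rolls)
    return Fraction(doubles, len(rolls)), pips

standard = doubles_and_pips([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
sicherman = doubles_and_pips([1, 2, 2, 3, 3, 4], [1, 3, 4, 5, 6, 8])
variant = doubles_and_pips([2, 3, 3, 4, 4, 5], [0, 2, 3, 4, 5, 7])

for name, (p_doubles, pips) in [("standard", standard),
                                ("Sicherman", sicherman),
                                ("variant", variant)]:
    print(name, "doubles:", p_doubles, "points per roll:", Fraction(pips, 36))
```

Dividing the second component by 36 gives the average movement per roll.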

The Wikipedia article on dice asserts that the {2, 3, 3,
4, 4, 5} die is used in some wargames to express the strength of
"regular" troops, and the standard {1, 2, 3, 4, 5, 6} die to
express the strength of "irregular" troops. This makes the
outcome of battles involving regular forces more predictable than
those involving irregular forces.
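
One way to put a number on "more predictable" is to compare the variances of the two dice. The sketch below is only an illustration; the helper name mean_and_variance is made up, and variance is my choice of measure, not something the Wikipedia article specifies.

```python
from fractions import Fraction

def mean_and_variance(faces):
    """Exact mean and (population) variance of one roll of a fair die."""
    n = len(faces)
    mean = Fraction(sum(faces), n)
    variance = sum((f - mean) ** 2 for f in faces) / n
    return mean, variance

regular = mean_and_variance([2, 3, 3, 4, 4, 5])
irregular = mean_and_variance([1, 2, 3, 4, 5, 6])
print("regular:  ", regular)
print("irregular:", irregular)
```

Both dice have the same mean, but the "regular" die has the smaller variance, so its rolls cluster more tightly around 3.5.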

Several people proposed alternative constructions for the snub
disphenoid.

Brooks Moses suggested the following construction: Take a square
antiprism, squash the top square into a rhombus, and insert a
strut along the short diagonal of the rhombus. Then squash and
strut the bottom square similarly.

It seems, when you think about this, that there are two ways to do
the squashing. Suppose you squash the bottom square horizontally
in all cases. The top square is turned 45° relative to the
bottom (because it's an antiprism) and so you can squash it along
the -45° diagonal or along the +45° diagonal, obtaining a
left- and a right-handed version of the final solid. But if you
do this, you find that the two solids are the same, under a
90° rotation.

This construction, incidentally, is equivalent to the one I
described in the previous article: I said you should take two
rhombuses and connect corresponding vertices. I had a paragraph
that read:

But this is where I started to get it wrong. The two
wings have between them eight edges, and I had imagined that you
could glue a rhombic antiprism in between
them. . . .

But no, I was right; you can do exactly this, and you get a snub
disphenoid. What fooled me was that when you are looking at the
snub disphenoid, it is very difficult to see where the belt of
eight triangles from the antiprism got to. It winds around the
polyhedron in a strange way. There is a much more obvious belt of
triangles around the middle, which is not suitable for an
antiprism, being shaped not like a straight line but more like the
letter W, if the letter W were written on a cylinder and had its
two ends identified. I was focusing on this belt, but the other
one is there, if you know how to see it.

The snub disphenoid has four vertices with valence 4 and four with
valence 5. Of its 12 triangular faces, four have two valence-4
vertices and one valence-5 vertex, and eight have one valence-4
vertex and two valence-5 vertices. These latter eight form the belt
of the antiprism.

M. Moses also suggested taking a triaugmented triangular
prism, which you will recall is a triangular prism with a square
pyramid erected on each of its three square faces, removing one of
the three pyramids, and then squashing the exposed square face
into a rhombus shape, adding a new strut on the diagonal. This
one gives me even less intuition about what is going on, and it
seems even more strongly that it should matter whether you put in
the extra strut from upper-left to lower-right, or from
upper-right to lower-left. But it doesn't matter; you get the
same thing either way.

Jacob Fugal pointed out that you can make a
snub disphenoid as follows: take a pentagonal dipyramid, and
replace one of the equatorial *----*----* figures with a rhombus.
This is simple, but unfortunately gives very little intuition for
what the disphenoid is like. It is obvious from the construction
that there must be pentagons on the front and back, left over from
the dipyramid. But it is not at all clear that there are now two
new upside-down pentagons on the left and right sides, or that the
disphenoid has a vertical symmetry.

A few people asked me where John Batzel got the magnet toy
that I was using to construct the models. It costs only
$5! John gave me his set, and I bought three more, and I now
have a beautiful set of convex deltahedra and a stellated
dodecahedron on my desk. (Actually, it is not precisely a
stellated dodecahedron, since the star faces are not quite planar,
but it is very close. If anyone knows the name of this thing,
which has 32 vertices, 90 edges, and 60 equilateral triangular
faces, I would be pleased to hear about it.) Also I brought my
daughter Iris into my office a few weekends ago to show
her the stella
octangula ("I wanna see the stella octangula, Daddy! Show me
the stella octangula!") which she enjoyed; she then stomped on
it, and then we built another one together.

Different arrangements for standard dice
Gaal Yahas wrote to refer me to an article
about a pair of dice that never roll seven. It sounded cool,
but it was too late at night for me to read it, so I put it on the
to-do list. But it reminded me of a really nice puzzle, which is to
find a nontrivial relabeling of a pair of standard dice that gives the
same probability of throwing any sum from 2 to 12. It's a happy (and
hardly inevitable) fact that there is a solution.

To understand just what is being asked for here, first observe that a
standard pair of dice throws a 2 exactly 1/36 of the time, a 3 exactly
2/36 of the time, and so forth:

Sum    Probability
  2    1/36
  3    2/36
  4    3/36
  5    4/36
  6    5/36
  7    6/36
  8    5/36
  9    4/36
 10    3/36
 11    2/36
 12    1/36

The standard dice have faces numbered 1, 2, 3, 4, 5, and 6. It should
be clear that if one die had {0,1,2,3,4,5} instead, and the other had
{2,3,4,5,6,7}, then the probabilities would be exactly the same.
Similarly you could subtract 3.7 from every face of one die, giving it
labels {-2.7, -1.7, -0.7, 0.3, 1.3, 2.3}, and if you added the 3.7 to
every face of the other die, giving labels
{4.7, 5.7, 6.7, 7.7, 8.7, 9.7}, you'd still have the same chance of
getting any particular total. For example, there are still exactly 2
ways out of 36 possible rolls to get the total 3: you can roll -2.7 +
5.7, or you can roll -1.7 + 4.7. But the question is to find a
nontrivial relabeling.
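
The trivial relabelings are easy to confirm by counting. Here is a small sketch in Python; sum_counts is a made-up helper name, and the shift 3.7 is represented exactly as the fraction 37/10 so that floating-point rounding does not split equal totals apart.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_counts(die_a, die_b):
    """Count how many of the equally likely rolls produce each total."""
    return Counter(a + b for a, b in product(die_a, die_b))

standard = range(1, 7)
baseline = sum_counts(standard, standard)

# Subtract a shift from every face of one die and add it to the other.
shifted_ok = all(
    sum_counts([f - s for f in standard], [f + s for f in standard]) == baseline
    for s in (1, Fraction(37, 10))
)
print(shifted_ok)
```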

Like many combinatorial problems, this one is best solved with
generating functions. Suppose we represent a die as a polynomial. If
the polynomial is Σ a_i x^i, it represents a die that has
a_i chances to produce the value i. A standard die is
x^6 + x^5 + x^4 + x^3 + x^2 + x, with one chance to produce each
integer from 1 to 6. (We can deal with probabilities instead of
"chances" by requiring that Σ a_i = 1, but it comes to pretty much
the same thing.)

The reason it's useful to adopt this representation is that rolling
the dice together corresponds to multiplication of the polynomials.
Rolling two dice together, we multiply
(x^6 + x^5 + x^4 + x^3 + x^2 + x) by itself and get

P(x) = x^12 + 2x^11 + 3x^10 + 4x^9 + 5x^8 + 6x^7 + 5x^6 + 4x^5 + 3x^4 + 2x^3 + x^2,

which gives the chances of getting any particular sum; the coefficient
of the x^9 term is 4, so there are 4 ways to roll a 9 on two dice.
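
To make the correspondence concrete, here is a minimal sketch in Python that represents a die as a list of coefficients (index = exponent) and multiplies two such lists. The names poly_mul, standard_die, and two_dice are chosen just for this illustration.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = exponent)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

# Standard die: x + x^2 + ... + x^6, so coefficient 1 at exponents 1..6.
standard_die = [0, 1, 1, 1, 1, 1, 1]
two_dice = poly_mul(standard_die, standard_die)

for total, ways in enumerate(two_dice):
    if ways:
        print(total, ways)
```

Entry two_dice[s] is the number of ways to roll the sum s.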

What we want is a factorization of this 12th-degree polynomial into
two polynomials Q(x) and R(x) with non-negative coefficients. We also
want Q(1) = R(1) = 6, which forces the corresponding dice to
have 6 faces each. Since we already know that
P(x) = (x^6 + x^5 + x^4 + x^3 + x^2 + x)^2, it's not hard; we really
only have to factor x^6 + x^5 + x^4 + x^3 + x^2 + x and then see if
there's any suitable way of rearranging the factors.

The factorization is x^6 + x^5 + x^4 + x^3 + x^2 + x =
x(x^2 + x + 1)(x^2 - x + 1)(x + 1), so P(x) contains two rows of these
four factors, one row for each standard die.

We want to combine these into two products Q(x)
and R(x) such that Q(1) = R(1) = 6.
If we calculate f(1) for each of the four factors, we get 1, 3, 1,
and 2 respectively.
So each of Q and R will require one of the factors that
has f(1) = 3 and one that has f(1) = 2; we can
distribute the f(1) = 1 factors as needed. For normal dice the way we
do this is to assign all the factors in each row to one die. If we
want alternative dice, our only real choice is what to do with the
x^2 - x + 1 and x factors.

Redistributing the lone x factors just corresponds to
subtracting 1 from all the faces of one die and adding it back to all
the faces of the other, so we can ignore them. The only interesting
question is what to do with the
x^2 - x + 1 factors. The normal arrangement
assigns one to each die, and the only alternative is to assign both of
them to a single die. This gives us the two polynomials:

x(x^2 + x + 1)(x + 1) = x^4 + 2x^3 + 2x^2 + x

x(x^2 + x + 1)(x + 1)(x^2 - x + 1)^2 = x^8 + x^6 + x^5 + x^4 + x^3 + x

And so the solution is that one die has faces {1,2,2,3,3,4} and the
other has faces {1,3,4,5,6,8}:

     |   1    2    2    3    3    4
  ---+------------------------------
   1 |   2    3    3    4    4    5
   3 |   4    5    5    6    6    7
   4 |   5    6    6    7    7    8
   5 |   6    7    7    8    8    9
   6 |   7    8    8    9    9   10
   8 |   9   10   10   11   11   12

Counting up entries in the table, we see that there are indeed 6 ways
to throw a 7, 4 ways to throw a 9, and so forth.
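
The search over ways to split the factors can also be done mechanically. The following Python sketch assumes the factorization x^6 + ... + x = x(x + 1)(x^2 + x + 1)(x^2 - x + 1), keeps exactly one factor of x on each die to rule out the trivial shifts, and tries every way of dealing the remaining factors to the two dice. All the names in it are invented for this example.

```python
from itertools import product

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = exponent)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def product_of(factors, counts):
    """Multiply together counts[k] copies of factors[k]."""
    result = [1]
    for factor, count in zip(factors, counts):
        for _ in range(count):
            result = poly_mul(result, factor)
    return result

def faces(poly):
    """Face list of the die encoded by a polynomial with non-negative coefficients."""
    return tuple(exp for exp, coeff in enumerate(poly) for _ in range(coeff))

# x, x + 1, x^2 + x + 1, x^2 - x + 1; P(x) contains each of them twice.
FACTORS = [[0, 1], [1, 1], [1, 1, 1], [1, -1, 1]]

pairs = set()
for counts in product(range(3), repeat=4):
    if counts[0] != 1:          # one factor of x per die: skip trivial shifts
        continue
    q = product_of(FACTORS, counts)
    r = product_of(FACTORS, [2 - c for c in counts])
    # Need non-negative coefficients and Q(1) = R(1) = 6 (six faces each).
    if min(q) < 0 or min(r) < 0 or sum(q) != 6 or sum(r) != 6:
        continue
    pairs.add(tuple(sorted([faces(q), faces(r)])))

for pair in sorted(pairs):
    print(pair)
```

Under these assumptions the search turns up only the standard pair and the pair above.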

One could apply similar methods to the problem of making a pair of
dice that can't roll 7. Since there are six chances in 36 of rolling
7, we need to say what will happen instead in these 6 cases. We might
distribute them equally among some of the other possibilities, say 2,
4, 6, 8, 10, and 12, so that we want the final distribution of results
to correspond to the polynomial
2x^12 + 2x^11 + 4x^10 + 4x^9 + 6x^8 + 6x^6 + 4x^5 + 4x^4 + 2x^3 + 2x^2.
The important thing to notice here is that the coefficient of the
x^7 term is 0.

Now we want to factor this polynomial and proceed as before.
Unfortunately, it is irreducible. (Except for the trivial factor of
x^2.) Several other possibilities are similarly
irreducible. It's tempting to reason from the dice to the algebra,
and conjecture that any reducible polynomial that has a zero
x^7 term must be rather exceptional in other ways,
such as by having only even exponents. But I'm not sure it will work,
because the polynomials are more general than the dice: the
polynomials can have negative coefficients, which are meaningless for
the dice. Still, I can fantasize that there might be some result of
this type available, and I can even imagine a couple of ways of
getting to this result, one combinatorial, another based on Fourier
transforms. But I've noticed that I have a tendency to want to leave
articles unpublished until I finish exploring all possible aspects of
them, and I'd like to change that habit, so I'll stop here, for
now.

The object of "Wumpus" is to kill the Wumpus, which hides in a
network of twenty caves arranged in a dodecahedron. Each cave is thus
connected to three others. On your turn, you may move to an adjacent
cave or shoot a crooked arrow. The arrow can pass through up to five
connected caves, and if it enters the room where the Wumpus is, it
kills him and you win. Two of the caves contain bottomless pits; to
enter these is death. Two of the caves contain giant bats, which
will drop you into another cave at random; if it contains a pit, too
bad. If you are in a cave adjacent to a pit, you can feel a draft; if
you are adjacent to bats, you can hear them. If you are adjacent to
the Wumpus, you can smell him. If you enter the Wumpus's cave, he
eats you. If you shoot an arrow that fails to kill him, he wakes up
and moves to an adjacent cave; if he enters your cave, he eats you.
You have five arrows.

I did not learn until much later that the caves are connected in a
dodecahedron; indeed, at the time I probably didn't know what a
dodecahedron was. The twenty caves were numbered, so that cave 1 was
connected to 2, 5, and 8. This necessitated a map, because otherwise
it was too hard to remember which room was connected to which.

Or did it? If the map had been a cube, the eight rooms could have
been named 000, 001, 010, etc., and then it would have been trivial to
remember: 011 is connected to 111, 001, and 010, obviously, and you
can see it at a glance. It's even easy to compute all the paths
between two vertices: the paths from 011 to 000 are 011–010–000 and
011–001–000; if you want to allow longer paths you can easily come
up with 011–111–110–100–000 for example.

And similarly, the Wumpus source code contains a table that records
which caves are connected to which, and consults this table in many
places. If the caves had been arranged in a cube, no table would have
been required. Or if one was wanted, it could have been generated algorithmically.
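
For instance, here is a minimal Python sketch of how such a table could be generated for a hypothetical cube-shaped cave system. The names cube_neighbors and cube_table are made up; nothing like this appears in the actual Wumpus source.

```python
from itertools import product

def cube_neighbors(label):
    """Neighbors of a cube vertex named by a 3-bit string: flip one bit at a time."""
    flip = {"0": "1", "1": "0"}
    return [label[:i] + flip[label[i]] + label[i + 1:] for i in range(len(label))]

# The whole connection table, generated rather than stored.
cube_table = {"".join(bits): cube_neighbors("".join(bits))
              for bits in product("01", repeat=3)}

for room, exits in cube_table.items():
    print(room, "->", ", ".join(exits))
```

Two rooms are connected exactly when their names differ in one position, which is why the table is redundant.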

So I got to wondering last week if there was an analogous nomenclature
for the vertices of a dodecahedron that would have obviated the Wumpus
map and the table in the source code.

I came up with a very clever proof that there was none, which would
have been great, except that the proof also worked for the
tetrahedron, and the tetrahedron does have such a convenient
notation: you can name the vertices (0,0,0), (0,1,1), (1,0,1), and
(1,1,0), where there must be an even number of 1 components. (I mentioned this yesterday in
connection with something else and promised to come back to it. Here
it is.) So the proof was wrong, which was good, and I kept thinking
about it.

The next-simplest case is the octahedron, and I racked my brains
trying to come up with a convenient notation for the vertices that
would allow one to see at a glance which were connected. When I
finally found it, I felt like a complete dunce. The octahedron has
six vertices, which are above, below, to the left of, to the right of,
in front of, and behind the center. Their coordinates are therefore
(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1) and (0,0,-1). Two
vertices are opposite when they have two components the same
(necessarily both 0) and one different (necessarily negatives).
Otherwise, they are connected by an edge. This is really simple
stuff.
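
The rule is short enough to write down directly. This is a sketch in Python using the six coordinate triples above; the function names opposite and adjacent are my own.

```python
from itertools import combinations

OCTAHEDRON = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def opposite(u, v):
    """True when v is the negation of u, so the two vertices face each other."""
    return all(a == -b for a, b in zip(u, v))

def adjacent(u, v):
    """Distinct octahedron vertices share an edge unless they are opposite."""
    return u != v and not opposite(u, v)

edges = [(u, v) for u, v in combinations(OCTAHEDRON, 2) if adjacent(u, v)]
print(len(edges), "edges")
```

Each vertex ends up with four neighbors, and the edge count agrees with the 12 edges of an octahedron.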

Still no luck with the dodecahedron. There are nice canonical
representations of the coordinates of the vertices—see
the Wikipedia article, for example—but I still haven't
looked at it closely enough to decide if there is a simple procedure
for taking two vertices and determining their geometric relation at a
glance. Obviously, you can check for adjacent vertices by calculating
the distance between them and seeing if it's the correct value, but
that's not "at a glance"; arithmetic is forbidden.
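
For comparison, here is what the arithmetic version looks like, as a rough Python sketch. It assumes the usual coordinates for a dodecahedron, namely (±1, ±1, ±1) together with the cyclic permutations of (0, ±φ, ±1/φ), where φ is the golden ratio, and it treats two vertices as adjacent when they are at the minimum distance found among all pairs. The names are invented for the example.

```python
from itertools import combinations, product
from math import dist, isclose, sqrt

PHI = (1 + sqrt(5)) / 2

def dodecahedron_vertices():
    """The 20 vertices: a cube plus cyclic permutations of (0, ±phi, ±1/phi)."""
    verts = [tuple(map(float, v)) for v in product((-1, 1), repeat=3)]
    for a, b in product((-1, 1), repeat=2):
        y, z = a * PHI, b / PHI
        verts += [(0.0, y, z), (z, 0.0, y), (y, z, 0.0)]
    return verts

verts = dodecahedron_vertices()
# The shortest distance between two distinct vertices is the edge length.
edge = min(dist(u, v) for u, v in combinations(verts, 2))
neighbors = {i: [j for j, v in enumerate(verts)
                 if j != i and isclose(dist(u, v), edge)]
             for i, u in enumerate(verts)}

for i, adj in neighbors.items():
    print(i, verts[i], "->", adj)
```

This reproduces a cave table like the one in the Wumpus source, but only by computing distances, which is exactly the kind of arithmetic that is supposed to be forbidden.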

It's easy to number the vertices in layers, say by calling the top five
vertices A1 ... A5, then the five below that
B1 ... B5, and so on. Then it's easy to see that
A3 will be adjacent to A2, A4, and
B3, for example.

But this nomenclature, unlike the good ones
above, is not isometric: it
has a preferred orientation of the dodecahedron. It's obvious
that A1, A2, A3, A4, and A5 form a
pentagonal face, but rather harder to see that
A2, A3,
B2, B3, and C5 do. With the cube, it's easy to see
what a rotation or a reflection looks like. For example, rotation of
120° around an axis through a pair of vertices of the cube takes
vertex (a, b, c) to
(c, a, b); rotation of 90° around an axis
through a face takes it to
(1-b, a, c). Similarly, rotations and reflections
of the tetrahedron correspond to simple permutations of the components
of the vertices.
Nothing like this exists for the
A-B-C-D nomenclature for the
dodecahedron.
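
The claim about the cube can be checked mechanically. The Python sketch below only verifies that each of the two maps permutes the eight vertices and preserves adjacency, which is a necessary condition for being a symmetry of the cube; it does not by itself prove that the map is a rotation. The function names are made up.

```python
from itertools import product

VERTICES = list(product((0, 1), repeat=3))

def adjacent(u, v):
    """Cube vertices are adjacent when they differ in exactly one component."""
    return sum(a != b for a, b in zip(u, v)) == 1

def rot120(v):
    a, b, c = v
    return (c, a, b)        # fixes (0,0,0) and (1,1,1)

def rot90(v):
    a, b, c = v
    return (1 - b, a, c)    # leaves the third component alone

def is_symmetry(f):
    """True when f permutes the vertices and preserves adjacency."""
    images = [f(v) for v in VERTICES]
    return (sorted(images) == sorted(VERTICES)
            and all(adjacent(u, v) == adjacent(f(u), f(v))
                    for u in VERTICES for v in VERTICES))

print(is_symmetry(rot120), is_symmetry(rot90))
```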

Sub-blogs and the math sub-blog
I notice that a number of people have my blog included in lists of
"math blogs", which is fine with me, but I got a bit worried when I
saw someone's web site that actually includes a lot of "math
blog" articles, including mine, which is only ever about one-fourth
math, the rest being given over to random other stuff. So the "math
blog" section of this guy's web site is carrying my ill-informed
articles about evolutionary biology and notes about the Frances the
Badger books.

That got me thinking about the problem in general. For some reason
I've been trying to construct a die whose faces come up with
probabilities 1/21, 2/21, 3/21, 4/21, 5/21, and 6/21 respectively.

Unless there is a clever insight I haven't had, I think this will be
rather difficult to do explicitly. (Approximation methods will
probably work fairly easily though, I think.) I started by trying to
make a hexahedron with faces that had areas 1, 2, 3, 4, 5, 6, and even
this has so far evaded me. This will not be sufficient to solve
the problem, because the probability that the hexahedron will land on
face F is not proportional to the area of F, but rather
to the solid angle subtended by F from the hexahedron's center
of gravity.

Anyway, I got interested in the idea of making a hexahedron whose
faces had areas 1..6. First I tried just taking a bunch of simple
shapes (right triangles and the like) of the appropriate sizes and
fitting them together geometrically; so far that hasn't worked. So
then I thought maybe I could get what I wanted by taking a tetrahedron
or a disphenoid or some such and truncating a couple of the corners.

As Polya says, if you can't solve the problem, you should try solving
a simpler problem of the same sort, so I decided to see if it was
possible to take a regular tetrahedron and chop off one vertex so that
the resulting pentahedron had faces with areas 1, 2, 3, 4, 5. The
regular tetrahedron is quite tractable, geometrically, because you can
put its vertices at (0,0,0), (0,1,1), (1,0,1), and (1,1,0), and then a
plane that chops off the (0,0,0) vertex cuts the three apical edges at
points (0,a,a), (b,0,b), and
(c,c,0), for some 0 ≤ a, b, c
≤ 1. The areas chopped off the three side faces are simply
ab√3/2, bc√3/2, ca√3/2, and the
un-chopped base has area √3/2, so if we want what remains of the three chopped faces to
have areas of 2/5, 3/5 and 4/5 times √3/2, respectively, we must have
ab = 3/5, bc = 2/5, and ca = 1/5, and we can
solve for a, b, c. (We want the new top face to have
area 1/5 · √3/2, but that will have to take care of itself,
since it is also determined by a, b, and c.)
Unfortunately, solving these equations gives b =
√6/√5, which is greater than 1 and so geometrically impossible. We might
fantasize that there might be some alternate solution, say with the
three chopped faces having areas of
1/5, 2/5 and 4/5 times √3/2, and the top face being
3/5 · √3/2 instead of
1/5 · √3/2, but none of those will work either.
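
The arithmetic is easy to check by machine. The following Python sketch measures every area as a fraction of a full face, so the constant factor drops out; `cut_points` is a hypothetical helper written for this note. It only tests whether a, b, and c stay at or below 1, which turns out to be enough to rule out every assignment without ever looking at the top face.

```python
from itertools import combinations
from math import sqrt

def cut_points(side_areas):
    """Solve for (a, b, c) given the areas left on the three truncated faces.

    Areas are fractions of a full face. The face between the edges cut at
    a and b loses a corner of relative area a*b, and likewise for the others.
    """
    ab, bc, ca = (1 - s for s in side_areas)
    a = sqrt(ab * ca / bc)
    b = sqrt(ab * bc / ca)
    c = sqrt(bc * ca / ab)
    return a, b, c

# The case worked in the text: ab = 3/5, bc = 2/5, ca = 1/5.
a, b, c = cut_points((2/5, 3/5, 4/5))
print(a, b, c)  # b is sqrt(6/5), which is bigger than 1

# Try every choice of three side-face areas from 1/5, 2/5, 3/5, 4/5.
feasible = [t for t in combinations((1/5, 2/5, 3/5, 4/5), 3)
            if max(cut_points(t)) <= 1]
print(feasible)  # empty: each choice pushes some cut past the end of its edge
```

Reordering the three areas only relabels a, b, and c, so trying unordered combinations covers all the cases.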

Oh well, it was worth a shot. I do think it's interesting that if you
know the areas of the bottom four faces of a truncated regular
tetrahedron, that completely determines the apical face. Because you
can solve for the lengths of the truncated apical edges, as above, and
then that gives you the coordinates of the three apical vertices.

I had a brief idea about truncating a square pyramid to get the
hexahedron I wanted in the first place, but that's more difficult,
because you can't just pick the lengths of the four apical edges any
way you want; their upper endpoints must be coplanar.

The (0,a,a), (b,0,b),
(c,c,0) thing has been on my mind anyway, and I hope to
write tomorrow's blog article about it. But I've decided that my
articles are too long and too intermittent, and I'm going to try to
post some shorter, more casual ones more frequently. I recently
remembered that in the early days of the blog I made an effort to post
every day, and I think I'd like to try to resume that.

The snub disphenoid
The snub disphenoid is pictured at left. I do not know why it is called that,
and I ought to know, because I am the principal author (so far) of the
Wikipedia article on the disphenoid. Also,
I never quite figured out what "snub" means in this context, despite
perusing that section of H.S.M. Coxeter's book on polytopes at some
length. It has something to do with being halfway between what you
get when you cut all the corners off, and what you get when you cut all
the corners off again.

Anyway, earlier this week I was visiting John Batzel, who works
upstairs from me, and discovered that he had obtained a really cool
toy. It was a collection of large steel ball bearings and colored
magnetic rods, which could be assembled into various polyhedra and
trusses. This is irresistible to me. The pictures at right, taken around
2002, show me modeling a dodecahedron with less suitable
materials.

The first thing I tried to make out of John's magnetic sticks and
balls was a regular dodecahedron, because it is my favorite
polyhedron. (Isn't it everyone's?) This was unsuccessful, because it
wasn't rigid enough, and kept collapsing. It's possible that if I had
gotten the whole thing together it would have been stable, but holding
the 50 separate magnetic parts in the right place long enough to get
it together was too taxing, so I tried putting together some other
things.

A pentagonal
dipyramid worked out well, however. To understand this solid,
imagine a regular pyramid, such as the kind that entombs the pharaohs
or collects mystical energy. This sort of pyramid is known as a
square pyramid, because it has a square base, and thus four
triangular sides. Imagine that the base was instead a pentagon, so that there
were five triangular sides instead of only four. Then it would be a
pentagonal pyramid. Now take two such pentagonal pyramids and glue
the pentagonal bases together. You now have a pentagonal dipyramid.

The success of the pentagonal dipyramid gave me the idea that rigid
triangular lattices were the way to go with this toy, so I built an
octahedron (square dipyramid) and an icosahedron to be sure. Even the
icosahedron (thirty sticks and twelve balls) held together and
supported its own weight. So I had John bring up the Wikipedia
article about deltahedra. A
deltahedron is just a polyhedron whose faces are all equilateral
triangles.

When I was around eight, I was given a wonderful book called
Geometric Playthings, by Jean J. Pedersen and Kent
Pedersen. The book was in three sections. One section was about
Möbius strips, with which I was already familiar; I ignored this
section. The second section was about hexaflexagons, with examples to
cut out and put together.
The third section was about deltahedra, again with cutout models of
all eight deltahedra. As an eight-year-old I had cut out and proudly
displayed the eight deltahedra, so I knew that there were some
reasonably surprising models one could make with John's toy that would
be likely to hold together well. Once again, the deltahedra did not
disappoint me.

Another is the icosahedron. Imagine making a belt of 10 triangles,
alternating up and down, and then connecting the ends of the belt. The result is a
shape called a pentagonal
antiprism, shown at left. The edges of the down-pointing triangles form a
pentagon on the top of the antiprism, and the edges of the up-pointing
triangles form one on the bottom. Attach a pentagonal pyramid to each
of these pentagons, and you have an icosahedron, with a total of 20
faces.

The other three deltahedra are less frequently seen. One is the
result of taking a triangular prism and appending a square pyramid to
each of its three square faces. (Wikipedia calls this a "triaugmented
triangular prism"; I don't know how standard that name is.) Since the
prism had two triangular faces to begin with, and we have added four
more to each of the three square faces of the original prism, the
total is 14 faces.
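
Face counts like this one also say how many parts a model needs, since with the magnetic toy each edge is a stick and each vertex is a ball. The small Python sketch below is an illustration only; it assumes a convex deltahedron and the function name is invented.

```python
def sticks_and_balls(faces):
    """Edges and vertices of a convex deltahedron with the given face count.

    Each triangular face has 3 edges and each edge borders 2 faces, so
    E = 3F/2. Euler's formula V - E + F = 2 then gives V.
    """
    if faces % 2:
        raise ValueError("a deltahedron has an even number of faces")
    edges = 3 * faces // 2
    vertices = edges - faces + 2
    return edges, vertices

for name, faces in [("octahedron", 8),
                    ("pentagonal dipyramid", 10),
                    ("triaugmented triangular prism", 14),
                    ("icosahedron", 20)]:
    print(name, sticks_and_balls(faces))
```

For the icosahedron this gives thirty sticks and twelve balls, agreeing with the count mentioned earlier.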

Another deltahedron is the "gyroelongated square dipyramid". You get
this by taking two square pyramids, as with the octahedron. But
instead of gluing their square bases together directly, you splice a
square antiprism in between. The two square faces of the antiprism
are not aligned; they are turned at an angle of 45° to each other,
so that when you are looking at the top pyramid face-on, you are
looking at the bottom pyramid edge-on, and this is the "gyro"
in "gyroelongated". (The icosahedron is a gyroelongated pentagonal
dipyramid.) I made one of these in John's office, but found it rather
straightforward.

The last deltahedron, however, was quite a puzzle. Wikipedia calls it
a "snub disphenoid", and as I mentioned before, the name did not help
me out at all. It took me several tries to build it correctly. It
contains 12 faces and 8 vertices. When I finally had the model I
still couldn't figure it out, and spent quite a long time rotating it
and examining it. It has a rather strange symmetry. It is front-back
and left-right symmetric. And it is almost top-bottom symmetric: If
you give it a vertical reflection, you get the same thing back, but
rotated 90° around the vertical axis.

When I planned this article I thought I understood it better.
Imagine sticking together two equilateral triangles. Call the common
edge the "rib". Fold the resulting rhombus along the rib so that the
edges go up, down, up, down in a zigzag. Let's call the resulting
shape a "wing"; it has a concave side and a convex side. Take two
wings. Orient them with the concave sides facing each other, and with
the ribs not parallel, but at right angles. So far, so good.

But this is where I started to get it wrong. The two wings have
between them eight edges, and I had imagined that you could glue a
rhombic antiprism in between them. I'm not convinced that there is
such a thing as a rhombic antiprism, but I'll have to do some arithmetic to be sure. Anyway,
supposing that there were such a thing, you could glue it in as I
said, but if you did the wings would flatten out and what you would get
would not be a proper polyhedron because the two triangles in each
wing would be coplanar, and polyhedra are not allowed to have abutting
coplanar faces. (The putative gyroelongated triangular dipyramid
fails for this reason, I believe.)

To make the snub disphenoid, you do stick eight triangles in between
the two wings, but the eight triangles do not form a rhombic
antiprism. Even supposing that such a thing exists.

I hope to have some nice renderings for you later. I have been doing
some fun work in rendering semiregular polyhedra, and I am looking
forward to discussing it here. Advance peek: suppose you know how
the vertices are connected by edges. How do you figure out where the
vertices are located in 3-space?

If you would like to investigate this, the snub disphenoid has 8
vertices, which we can call A, B, ... H. Then:

This vertex:    is connected to these:
A               B C E F H
B               A C D E
C               A B D H
D               B C E G H
E               F G A B D
F               E G H A
G               E F H D
H               F G A C D

The two wings here are ABCD and EFGH. We can distinguish three sorts
of edges: five inside the top wing, five inside the bottom wing, and
eight that go between the two wings.
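
As a sanity check on the table, here is a short Python sketch that verifies the adjacency list is consistent and recounts the edges. Counting every mutually adjacent triple of vertices as a face is an assumption that happens to work for this solid; it is not a general way to find the faces of a polyhedron.

```python
from itertools import combinations

# Adjacency table copied from the text.
ADJ = {
    "A": "BCEFH", "B": "ACDE", "C": "ABDH", "D": "BCEGH",
    "E": "FGABD", "F": "EGHA", "G": "EFHD", "H": "FGACD",
}

# Every edge should be listed from both of its ends.
assert all(u in ADJ[v] for u in ADJ for v in ADJ[u])

edges = {frozenset((u, v)) for u in ADJ for v in ADJ[u]}

# Assumption: each triple of mutually adjacent vertices is a triangular face.
faces = [t for t in combinations(sorted(ADJ), 3)
         if all(y in ADJ[x] for x, y in combinations(t, 2))]

def wing(v):
    return "top" if v in "ABCD" else "bottom"

between = [e for e in edges if len({wing(v) for v in e}) == 2]

print(len(ADJ), len(edges), len(faces))          # 8 18 12
print(len(edges) - len(between), len(between))   # 10 8
```

The counts satisfy Euler's formula (8 - 18 + 12 = 2) and match the five, five, and eight edges described above.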

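Here are the eight deltahedra: the tetrahedron, the triangular
dipyramid, the octahedron, the pentagonal dipyramid, the snub
disphenoid, the triaugmented triangular prism, the gyroelongated
square dipyramid, and the icosahedron.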

Lightweight Database Strategies for Perl
Several years ago I got what I thought was a great idea for a
three-hour conference tutorial: lightweight data storage techniques.
When you don't have enough data to be bothered using a
high-performance database, or when your data is simple enough that you
don't want to bother with a relational database, you stick it in a
flat file and hack up some file code to read it. This is the sort of
thing that people do all the time in Perl, and I thought it
would be a big seller. I was wrong.

I don't know why. I tried giving the class a snappier title, but that
didn't help. I'm really bad at titles. Maybe people are embarrassed
to think about all the lightweight data storage hackery they do in
Perl, and feel that they "should" be using a relational database, and
don't want to commit more resources to lightweight database
techniques. Or maybe they just don't think there is very much to know
about it.

But there is a lot to know; with a little bit of technique you can postpone
the day when you need to go to an RDB, often for quite a long time,
and often forever. Many of the techniques fall into the
why-didn't-I-think-of-that category, stuff that isn't too weird to
write or maintain, but that you might not have thought to try.

I think it's a good class, but since it never sold well, I've decided
it would do more good (for me and for everyone else) if I just gave
away the materials for free.

Table of Contents

The class is in three sections. The first section is about using
plain text files and talks about a bunch of useful techniques, such as
how to do binary search on sorted text files (this is nontrivial) and
how to replace records in-place, when they might not fit.

The second section is about the Tie::File module, which
associates a flat text file with a Perl array.

The third section is about DBM files, with a comparison of the five
major implementations. It finishes up with a discussion of some of
Berkeley DB's lesser-known useful features, such as its
DB_BTREE file type, which offers fast access like a hash but
keeps the records in sorted order.
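
To give a flavor of why the binary search mentioned in the first section is nontrivial: a seek into the middle of a text file almost always lands in the middle of a line, so the search has to work on byte offsets and then resynchronize to a line boundary. The sketch below is in Python rather than Perl and is not taken from the class materials; the function name is hypothetical, and it assumes Unix newlines and lines sorted in plain byte order.

```python
import io

def first_line_at_or_after(f, key):
    """Return the first line >= key in a sorted file, or None if there is none.

    f must be opened in binary mode, with lines sorted in plain byte order.
    """
    def line_at(pos):
        # Return the first complete line starting at byte offset >= pos,
        # or None at end of file.
        if pos > 0:
            f.seek(pos - 1)
            f.readline()          # skip the rest of the line holding byte pos-1
        else:
            f.seek(0)
        raw = f.readline()
        return raw.rstrip(b"\n") if raw else None

    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    while lo < hi:
        mid = (lo + hi) // 2
        line = line_at(mid)
        if line is not None and line < key:
            lo = mid + 1          # the answer starts somewhere after byte mid
        else:
            hi = mid
    return line_at(lo)

data = io.BytesIO(b"apple\nbanana\ncherry\ndate\n")
print(first_line_at_or_after(data, b"blueberry"))  # b'cherry'
```

The caller still has to check whether the returned line actually matches the key it wanted, since the function only finds where such a line would be.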

Conference talk brochure descriptions
I just got back from doing some
tutorials at OSCON, which were generally well-received. Sometimes
it goes better than other times; this time it went pretty well, I
thought, except that I was seven minutes late to the Tuesday morning
one, through a tremendous series of fuckups beginning with the
conference hotel not being able to find my reservation on Saturday
night, continuing with my barely missing two unrelated streetcars on
Tuesday morning, and, let's not leave out the most important part, my
forgetting that the class started at 8:30 and not at 9:00 until about
8:00.

One of the things that Nat (the program committee chair) and I have
commiserated about in the past is that no matter how hard you try to
make a clear, concise, accurate description of the class, you are
doomed, because people do not use the descriptions in a rational way.
For example, suppose I happen to be giving the same class two years in
a row. The class title is the same both years. The 250-word
description in the brochure and on the web site is word-for-word
identical both years. Nevertheless, you can be sure that someone will
hand in an evaluation the second year that complains bitterly that the
class was a waste of time, because they took the class the year before
and there was no new material. I vented about this to Nat once, and
the look of exhausted disgust on his face was something to see.
Because I only have to read my own stupid evaluations, but Nat has to
read all the stupid evaluations, and he probably sees that same
idiotic complaint ten times a year.

Here's one I was afraid I'd get this year, and, who knows. It may yet
happen. I sent the program committee seven proposals. They accepted
three. One was for the
Advanced techniques for Parsing
class; one for Higher-Order
Perl. There was significant overlap between these two
classes; the last third of the
Higher-Order
Perl class is about higher-order parser combinators, which are
the principal subject of the advanced parsing class.
This puts me
in a difficult position. The program committee has accepted two
classes that overlap. I have
to deliver the material that I promised in the brochure, which people
paid money to hear. I cannot unilaterally eliminate the overlap,
say by substituting a different topic into
Higher-Order
Perl, because then someone in that class might quite rightly complain that
they had been promised a section on parsing techniques, had paid for a
section on parsing techniques, but had not been delivered a section on
parsing techniques.
But some people will sign up for both classes, and then will
inevitably complain about the overlap, even though it should have been
clear from the brochure that the classes would overlap.

The only way out for me is to try to get the program committee to
agree beforehand to let me change around one of the classes to remove
the overlap, write one-third of a new class, and document the change
in the brochure description before it is published. That is a lot of
work to do in a short time. Some people write their class slides the
night before they give the class. I don't; I take weeks over it,
revising extensively, and then I give a practice session, and then I
revise again. So the classes overlapped, and I'm sure there were
complaints about it that I haven't seen yet.

My favorite complaint of all time was from the guy who took Tricks of the
Wizards and then complained that the material was too
advanced.

This year I had the opposite problem. I gave a class on Advanced
techniques for Parsing, and the following day I read a blog
article from someone who had been disappointed that it was
insufficiently advanced. This is a fair and legitimate criticism, and
deserves a reasonable response. The response is not, however, to
change the class content, because I think I have a pretty good idea of
how sophisticated the conference attendees are, and of what is useful,
and if I made the class a lot more advanced than it is, hardly anyone
would understand it. But I did feel bad that this blogger had
mistakenly wasted hours in my class and gotten nothing out of it.
That should have been avoidable.

The first thing I did was to check the
brochure description, to see if perhaps it was misleading, or if it
promised extra-advanced material that I then didn't deliver. This
sometimes happens. The deadline for proposals is far in advance of
the deadline for the class materials themselves. So what happens is
that you write up a proposal for a class you think you can do, that
people will like, and that will appeal to the program committee, and
you send it in. A few months later, it is accepted, and you start
work on the class. Then sometimes you discover that even though you
proposed a class about A, B, and C, there is only enough time to do A
and B properly, and to cover all three in a three-hour class would just
be a mess. So you write a class that covers A and B properly, and has
an abbreviated discussion of C. But then there will be some people
who came to the class specifically for the discussion of C, and who
are disappointed. It is a tough problem.

Anyway, I thought this time I had done a reasonably good job of
writing a class that actually matched the brochure description. So I
wrote to the blogger to ask how the description could have been
better: what would I have needed to say in it that would have tipped
him off that the class would not have had whatever it was he was
looking for?

The answer: nothing. He had not read the description. He attended
the class solely because of the title, Advanced techniques for
Parsing, and then after two hours figured out that it was not
as advanced as he wanted it to be.

Homosexuality is not hereditary
I just read a big pile of blog comments that all said that
homosexuality couldn't be hereditary, because if it were, natural
selection would have gotten rid of it by now.

But natural selection is more interesting than that. This article
will ignore the obvious notion of homosexuals who breed anyway. Here
is one way in which homosexuality could be entirely hereditary and
still be favored by natural selection.

Suppose that human sexuality is extremely complicated, which should
not be controversial. Suppose, just for concreteness, that there are
137 different genes that can affect whether an individual turns out
heterosexual or homosexual. Say that each of these can be
either in state Q or state S, and that any individual will
turn out homosexual if any 93 of the 137 genes are in state Q,
heterosexual otherwise.

The over-simplistic argument from natural selection says that the Q
states will be bred out of the population, and that S will be
increasingly predominant over time.

Now let's consider an individual, X, whose family members tend
to carry a lot of Q genes.

Suppose X's parents have a lot of Q genes, around 87 or 90.
X's parents' siblings, who resemble them, will also have a lot
of Q genes, and have a high probability of being homosexual. Having
no children of their own, they may contribute to X's welfare,
maybe by caring for X or by finding food for X.

In short, for every gay uncle X has, that is one additional set
of cousins with whom X does not have to compete for
scarce resources.

This could well turn out to be a survival advantage for X over
someone from a family of people without a lot of Q genes, someone who
is competing for food with a passel of cousins, none of whom
ever really get enough to eat, someone whose aunt might even try to kill
them in order to benefit her own children.

Perhaps X turns out to be homosexual and never breeds, but
X probably has some siblings, in which case X might be
an advantageous gay uncle or lesbian aunt to one of his or her own
nieces or nephews, who, remember, are carrying a lot of the same
genes, including the Q genes.

It might not actually work this way, of course, and in most ways it
probably doesn't. The only point here is to show that natural
selection does not necessarily rule out the idea of inherited
homosexuality; people who think it must, have not exercised enough
imagination.

(Now that I have finished writing this article, it occurs to me that
the same argument applies to bees and ants; most individuals in a bee
or ant colony are sterile. Who would be foolish enough to argue that
this trait will soon be bred out of the colony?)

Time and time again, biologists baffled by some apparently futile or
maladroit bit of bad design in nature have eventually come to see that
they have underestimated the ingenuity, the sheer brilliance, the
depth of insight to be discovered in one of Mother Nature's creations.
Francis Crick has mischievously baptized this trend in the name of his
colleague Leslie Orgel, speaking of what he calls "Orgel's Second Rule:
Evolution is cleverer than you are."

Tough questions
It's easy to recognize a good question: a good question is one that
takes a lot longer to answer than it does to ask. Chip Buchholtz's
example is "what is a byte?" To answer that you have to get into the
nitty gritty of computer architecture and how, although the
information in the computer is stored by the bit, the memory bus can
only address it by the byte.

One of the biology interns asked me a good one a couple of weeks
ago: he asked how, if Perl runs Perl scripts, and the OS is running
Perl, what is running the OS? Now that is a tough question to
answer. I explained about logic gates, and how the logic gates are
built into trivial arithmetic and memory circuits, how these are then
built up into ALUs and memories, and how these in turn are controlled
by microcode, and finally how the logical parts are assembled into a
computer. I don't know how understandable it was, but it was the best
I could do in five minutes, and I think I got some of the idea across.
But I started and finished by saying that it was basically
miraculous.

My daughter Iris asks a ton of questions, some better than others.
On any given evening she is likely to ask "Daddy, what are you doing?"
about fifteen times, and "why?" about fifteen million times. "Why"
can be a great question, but sometimes it's not so great; Iris asks
both kinds. Sometimes it's in response to "I'm eating a sandwich."
Then the inevitable "why?" is rather annoying.

Some of the "why" questions are nearly impossible to answer. For
example, we see a lady coming up the street toward us. "Is that
Susanna?" "No." "Why is it not Susanna?"

I think what's happening here is that having discovered this magic
word that often produces interesting information, Iris is employing it
whenever possible, even when it doesn't make sense, because she hasn't
yet learned when it works and when not. Why is that not Susanna?
Hey, you never know when you might get an interesting answer. But
there might be something else going on that I don't appreciate.

But the nice thing
about Iris's incessant questions is that she listens to and remembers
the answers, ponders them deeply, and then is likely to come back with
an insightful followup when you least expect it.

This weekend we went to visit my parents in New
York, and as we drove down the Henry Hudson Parkway, we passed the
North River wastewater treatment plant. Three-year-olds are fascinated
with poop, so I took the opportunity to point out the plant to Iris.
I said that although it had a park with trees on the roof, the inside
was a giant machine for turning poop into garden soil; they cleaned it
and mixed it with wood chips and it composted like the stuff in our
composter. (I later found that some of these details were not quite
accurate, but the general idea is correct. See the official
site for the official story. My wife provided the helpful analogy
with the composter.) As I expected, Iris was interested, and thought
this over; she confirmed that they turned poop into soil, and then
asked what they made pee into. I was not prepared for that one, and I
had to promise her I would find out later. It took me some Internet
research time to find out about denitrification.

Speaking of poop, last month Iris asked a puzzler: why don't birds use
toilets? I think this was motivated by our earlier discussion of
bird poop on our car.

In Make Way for Ducklings there's a picture of the
friendly policeman Michael, running back to his police box to order a
police escort to help the ducklings across Beacon Street. He's
holding his billy club. Iris asked what that was for. I thought a
moment, and then said "It's for hitting people with." Later I
wondered if I had given an inaccurate or incomplete answer, so I asked
around, and did some reading. It appears I got that one right.
Some folks I know suggested that I
should have said it was for hitting bad people, but I'd rather
stick to the plain facts, and leave out the editorializing.

Anyway, lately I've been rereading The Defeat of the Spanish
Armada, by Garrett Mattingly, which is a really good book; it
won a special Pulitzer Prize when it was published.
It's about the attempt by Spain to invade England in 1588. The
invasion was a failure, and the Spanish got clobbered.
Most interesting minor detail: Francis Drake went to St. Vincent
the year before the Armada sailed and captured a bunch of merchant
ships that were carrying seasoned barrel-staves, which he burnt. As a
result, when the mighty Armada sailed, many of the ships had to carry
casks made of green wood, and they leaked; whenever the Spanish opened
a cask that should have contained food or water, they were as likely
as not to find it full of green slime instead.

So I was reading the Mattingly book this evening, and Iris was eating
and playing with Play-Doh on the kitchen floor. After the eleventh
repetition of "Daddy, what are you doing?" "Reading." I decided to
tell Iris what I was reading about. I said that I was reading about
ships, that ships are big boats; they carry lots of men and guns.
Iris asked why they carried guns, and I explained that often the ships
carried treasure, like spices or gold or jewels or cloth, and that
pirates tried to steal it. Iris asked if the cloth was like a wash
cloth, and I said no, it was more like the kind of cloth that Mommy
makes quilts from, or like the silk that her silk dress is made of. I
explained about the pirates, which she seemed to understand, because
toddlers know all about people who try to take stuff that isn't
theirs. And then she asked the question I couldn't answer: Why were
there men on the ships, but no women?

I was totally stumped; I don't even know where to begin explaining to
a three-year-old why there are no women on ships in 1588. The only
answers I could think of had to do with women's traditional roles,
with European mores, social constructions of gender, and so on, all
stuff that wouldn't help. Sometimes women were smuggled aboard ship,
but I wasn't going to say that either.

I don't usually give up, but this time I gave up. This is a tough
question of the first order, easy to ask, hard to answer. It's a lot
easier to explain wastewater treatment.

Chromatic says that these arguments are bunk because programming
language syntax is much less important than programming language
semantics. But I think that is straining at a gnat and swallowing a
camel.

To argue that a certain programming language feature is bad because it
is confusing to beginners, you have to do two things. You have to
successfully argue that being confusing to beginners is an important
metric. Chromatic's article tries to refute this, saying that it is
not an important metric.

But before you even get to that stage, you first have to show that
the programming language feature actually is confusing to
beginners.

But these arguments are never presented with any evidence at all,
because no such evidence exists. They are complete
fabrications, pulled out of the asses of their propounders, and made
of equal parts wishful thinking and bullshit.

Addendum 20070720:

To support my assertion that nobody knows what makes programming hard
for beginners, I wanted to cite this paper, The
camel has two humps, by Dehnadi and Bornat, which I was
rereading recently, but I couldn't find my copy and couldn't remember
the title or authors. Happily, I eventually remembered.

The abstract begins:

Learning to program is notoriously difficult. A substantial minority
of students fails in every introductory programming course in every UK
university. Despite heroic academic effort, the proportion has
increased rather than decreased over the years. Despite a great deal
of research into teaching methods and student responses, we have no
idea of the cause.

But the situation isn't completely hopeless; the abstract also
says:

We have found a test for programming aptitude, of which we give details. We can predict success
or failure even before students have had any contact with any programming language with very
high accuracy, and by testing with the same instrument after a few weeks of exposure, with
extreme accuracy. We present experimental evidence to support our claim.

I picked a few example functions, some of which worked and some of
which didn't.

One glaring omission from the article was that I forgot to mention the
so-called "Babylonian method" for calculating square roots. The
Babylonian method for calculating √n is simply to iterate
the function x → ½(x + n/x).
(This is a special case of the Newton-Raphson method for finding the
zeroes of a function. In this case the function whose zeroes are being
found is x → x² - n.) The
Babylonian method converges quickly for almost all initial values of
x. As I was writing the article, at 3 AM, I had the nagging
feeling that I was leaving out an important example function, and then
later on realized what it was. Oops.
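
For concreteness, here is a minimal Python sketch of the Babylonian iteration. The function name, the starting value of 1, and the fixed number of steps are arbitrary choices made for the example.

```python
def babylonian_sqrt(n, x=1.0, steps=6):
    """Approximate sqrt(n) by iterating x -> (x + n/x) / 2."""
    for _ in range(steps):
        x = (x + n / x) / 2
    return x

print(babylonian_sqrt(2))
print(babylonian_sqrt(2, steps=1))  # the first step from 1 gives 1.5
```

The number of correct digits roughly doubles at each step, so a handful of iterations is enough to exhaust double-precision arithmetic.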

But there's a happy outcome, which is that the Babylonian method
points the way to a nice extension of this general technique.
Suppose you've found a function f that has your target value,
say √2, as a fixed point, but you find that iterating f doesn't
work for some reason. For example, one of the functions I considered
in the article was x → 2/x. No matter what initial
value you start with (other than √2 and -√2) iterating the function
gets you nowhere; the values just hop back and forth between x
and 2/x forever.

But as I said in the original article, functions that have √2 as a
fixed point are easy to find. Suppose we have such a function,
f, which is badly-behaved because the fixed point repels, or
because of the hopping-back-and-forth problem. Then we can perturb the
function by trying instead x → ½(x +
f(x)), which has the same fixed points, but which might
be better-behaved. (More generally, x → (ax
+ bf(x)) / (a + b) has the same fixed
points as f for any nonzero a and
b, but in this article we'll leave a = b = 1.)
Applying this transformation to the function x → 2/x
gives us the Babylonian method.
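The perturbation can be sketched in a few lines of Python. The helper names averaged, iterate, and hop are invented for this example, and the sketch fixes a = b = 1 as the text does.

```python
def averaged(f):
    """Return the perturbed function x -> (x + f(x)) / 2."""
    return lambda x: (x + f(x)) / 2

def iterate(f, x, steps):
    """Return the list [f(x), f(f(x)), ...] with `steps` entries."""
    values = []
    for _ in range(steps):
        x = f(x)
        values.append(x)
    return values

def hop(x):
    """The badly-behaved function x -> 2/x."""
    return 2 / x

print(iterate(hop, 1.5, 4))            # hops between two values, never settling
print(iterate(averaged(hop), 1.5, 4))  # settles toward the square root of 2
```

Here averaged(hop) is x → ½(x + 2/x), which is the Babylonian step for n = 2.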

I tried applying this transform to the other example I used in the
original article, which was x → x² + x - 2.
This has √2 as a fixed point, but the √2 is a repelling fixed point.
√2 ± ε → √2 ± (1 + 2√2)ε, so the error gets bigger
instead of smaller. I hoped that perturbing this function might
improve its behavior, and at first it seemed that it didn't. The
transformed version is x → ½(x + x² +
x - 2) = x²/2 + x - 1. That comes to pretty
much the same thing. It takes √2 ± ε → √2 ± (1 +
√2)ε, which has the same problem. So that didn't work; oh
well.

But actually things had improved a bit. The original function also
has -√2 as a fixed point, and again it's one that repels from both
sides, because -√2 ± ε → -√2 ± (1 - 2√2)ε, and |1
- 2√2| > 1. But the transformed function, unlike the original, has
-√2 as an attractor, since it takes -√2 ± ε → -√2 ± (1 -
√2)ε and |1 - √2| < 1.
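The multipliers quoted above are just the derivatives of the two functions at the fixed points, so they can be checked numerically. This is a rough sketch; the names multiplier, original, and transformed, and the step size h, are assumptions made for the illustration.

```python
import math

def multiplier(f, fixed_point, h=1e-6):
    """Estimate f'(fixed_point) with a central difference."""
    return (f(fixed_point + h) - f(fixed_point - h)) / (2 * h)

def original(x):
    return x**2 + x - 2

def transformed(x):
    return x**2 / 2 + x - 1

root = math.sqrt(2)
for name, f in [("original", original), ("transformed", transformed)]:
    for point in (root, -root):
        m = multiplier(f, point)
        # A fixed point attracts when the error shrinks, that is, when |m| < 1.
        kind = "attracts" if abs(m) < 1 else "repels"
        print(f"{name:12} at {point:+.4f}: multiplier {m:+.4f}, {kind}")
```

Only the transformed function at -√2 should come out as an attractor, which agrees with the calculation above.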

So the perturbed function works for calculating √2, in a slightly
backwards way; you pick a value close to -√2 and iterate the function,
and the iterated values get increasingly close to -√2. Or you can get
rid of the minus signs entirely by transforming the function again,
and considering -f(-x) instead of f(x).
This turns
x²/2 + x - 1
into
-x²/2 + x + 1.
The fixed points change places, so now
√2 is the attractor, and -√2 is the repeller,
since √2 ± ε → √2 ± (1 - √2)ε.
Starting with x =
1, we get:

1.5

1.375

1.4296875

1.40768433

1.41689675

1.41309855

1.41467479

1.41402241

1.41429272

1.41418077

1.41422714

1.41420794

1.41421589

1.41421260

1.41421396

1.41421340

1.41421363

So that worked out pretty well. One might even make the argument that
the method is simpler than the Babylonian method, since the division
is a simple x/2 instead of a complex 2/x. I have not
yet looked into the convergence properties; I expect it will turn out
that the iterated polynomial converges more slowly than the Babylonian method.
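A quick way to compare the two is to count how many steps each needs to get within some tolerance of √2. The sketch below does that; the function names and the tolerance of 1e-8 are arbitrary choices for the illustration, not anything from the original article.

```python
import math

def polynomial_step(x):
    """One step of x -> -x^2/2 + x + 1."""
    return -x**2 / 2 + x + 1

def babylonian_step(x):
    """One step of x -> (x + 2/x) / 2."""
    return (x + 2 / x) / 2

def steps_to_converge(step, x=1.0, tolerance=1e-8, limit=1000):
    """Count iterations until x is within tolerance of sqrt(2)."""
    target = math.sqrt(2)
    count = 0
    while abs(x - target) > tolerance and count < limit:
        x = step(x)
        count += 1
    return count

print("polynomial:", steps_to_converge(polynomial_step))
print("Babylonian:", steps_to_converge(babylonian_step))
```

The polynomial's error is multiplied by about 1 - √2 at every step, so it gains a fixed fraction of a digit each time, while the Babylonian method, being Newton-Raphson, roughly doubles the number of correct digits per step once it is close.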

I had meant to write about Möbius transformations, but that will
have to wait until next week, I think.

Evaporation
I work for the Penn Genomics Institute, mostly doing software work,
but the Institute is run by biologists and also does biology projects.
Last month I taught some Perl classes for the four summer interns;
this month they are doing some lab work. Since part of my job
involves dealing with biologists, I thought this would be a good
opportunity to get into the lab, and I got permission from Adam, the
research scientist who was supervising the interns, to let me come
along.

Since my knowledge of biology is practically nil, Adam was not
entirely sure what to do with me while the interns prepared to grow
yeasts or whatever it is that they are doing. He set me up with a
scale, a set of pipettes, and a beaker of water, with instructions to
practice pipetting the water from the beaker onto the scale.

The pipettes came in three sizes. Shown at right is the largest of
the ones I used; it can dispense liquid in quantities between 10 and
100 μl, with a precision of 0.1 μl. I used each of the three pipettes
in three settings, pipetting water in quantities ranging from 1 ml
down to 5 μl. I think the idea here is that I would be able to see if
I was doing it right by watching the weight change on the scale, which
had a display precision of 1 mg. If I pipette 20 μl of water onto the
scale, the measured weight should go up by just about 20 mg.

Sometimes it didn't. For a while my technique was bad, and I didn't
always pick up the exact right amount of water. With the small
pipette, which had a capacity range of 2–20 μl, you have to suck
up the water slowly and carefully, or the pipette tip gets air bubbles
in it, and does not pick up the full amount.

With a scale that measures in milligrams, you have to wait around
for a while for the scale to settle down after you drop a few μl of
water onto it, because the water bounces up and down and the last
digit of the scale readout oscillates a bit. Milligrams are much
smaller than I had realized.

It turned out
that it was pretty much impossible to see if I was picking up the full
amount with the smallest pipette. After measuring out some water, I
would wait a few seconds for the scale display to stabilize. But if I
waited a little longer, it would tick down by a milligram. After
another twenty or thirty seconds it would tick down by another
milligram. This would continue indefinitely.

I thought about this quietly for a while, and realized that what I was
seeing was the water evaporating from the scale pan. The water I had
in the scale pan had a very small surface area, only a few square
centimeters. But it was evaporating at a measurable rate, around 2 or
3 milligrams per minute.

So it was essentially impossible to measure out five pipette-fuls of 10
μl of water each and end up with 50 mg of water on the scale. By the
time I got it done, around 15% of it would have evaporated.

The temperature here was around 27°C, with about 35% relative
humidity. So nothing out of the ordinary.

I am used to the idea that if I leave a glass of water
on the kitchen counter overnight, it will all be gone in the morning;
this was amply demonstrated to me in nursery school when I was about
three years old. But to actually see it happening as I watched was a
new experience.

Top of Anil's list: the Statue of Liberty. He advises taking the
Staten Island Ferry instead. I couldn't agree more. The Statue is
great, but it's just as great seen from a distance, and you get a
superb view of it from the Ferry. The Ferry is cheap (Anil says it's
free; it was fifty cents last time I took it) and the view of lower
Manhattan is unbeatable.

Similarly, you should avoid the Circle Line, which is a boat trip all
the way around Manhattan Island. That sounds good, but it takes all
day and you spend a lot of it cruising the not-so-scenic Harlem River.
The high point of the trip is the view of lower Manhattan and the
harbor. You can get the best parts of the Circle Line trip by taking
the Staten Island Ferry, which is much cheaper and omits the dull
bits.

Ten years ago I would have said to skip the World Trade Center in favor
of the Empire State Building. Well, so much for that suggestion.

Anil says to skip Katz's and the Carnegie Deli, that they're tourist
traps. I've never been to Katz's. I would
not have advised skipping the Carnegie. I have not been there
since 1995, so my view may be out of date, and the place may have
changed. But in 1995 I would have said that although it is indeed a
tourist trap, the pastrami sandwich is superb nevertheless. At no
time, however, would I have advised anyone to eat anything else from
there. Get the sandwich and eat it in the comfort of your hotel room, perhaps.
But quickly, before it gets cold.

Also in the "go there but only eat one thing" department is Junior's
Restaurant, at (I think) Atlantic and De Kalb avenues in Brooklyn.
Now here's the thing about Junior's: their cheesecake is justly
famous. They guarantee it. It is not your usual guarantee. A
typical guarantee would be that if you are not happy with the
cheesecake, they will refund your money. That is not Junior's
guarantee. No. Junior's guarantees your money back unless their
cheesecake is the best you have ever eaten.

Lorrie and I once ordered a cheesecake from Junior's. They ship it
overnight, packed in dry ice. Our order was delayed in transit; we
called the next day to ask where it was. They apologized and
immediately overnighted us a second cheesecake, free, with no further
discussion. The next day the two cheesecakes arrived in the mail.
Both of them were the best cheesecake I have ever eaten.

But I once went to have dinner at Junior's. This was a mistake.
Their cheesecake is so stupendous, I thought, how could their other
food possibly fail? As usual, the cheesecake was the best I have ever
eaten. But dinner? Not so hot. Do go to Junior's. You don't even
have to schlep out to Atlantic Avenue, since they have opened
restaurants in Times Square and at Grand Central Station. Get the
cheesecake. But eat dinner somewhere else.

Anil says not to eat in the goddamn Olive Garden, and of course he is
right. What on earth is the point of going to New York, food capital
of this half of the Earth, and eating in the goddamn Olive Garden?
You could have done that in Dubuque or Tallahassee or whatever crappy
Olive-Garden-loving burg you came from.

If you don't know where to eat in New York, here's my advice: Take the
subway to 42nd street, get out, and walk to 9th Avenue. Choose a
side of the street by coin flip. Walk north on 9th avenue. Make a
note of every interesting-seeming restaurant you pass. After three
blocks, you will have passed at least ten interesting-seeming
restaurants. Walk back to the most interesting-seeming one and go in,
or select one at random. I promise you will have a win, probably a
big win. That stretch of 9th Avenue is a paradise of inexpensive but
superb restaurants.

I have played the 9th Avenue game many times and it has never
failed.

Speaking of "things to skip", I suggest skipping the giant Times
Square New Year's Eve celebration, unless you are a pickpocket, in
which case you should get there early. Instead, have dinner on 9th
Avenue. As you pass each cross-street walking down 9th Avenue, you
will be able to see the Times Square crowd two blocks east, and you
can pause a moment to think how clever you are not to be part of
it; feeling smugly superior to the writhing mass of humanity is an
authentically New York experience. Then have an awesome dinner on 9th
Avenue, and take the subway home.

Anil's
whole series is pretty good, and as a native New Yorker I found
little to disagree with. But I think he may be a little misleading
when he says "the natives are friendly and helpful." I would say not.
Neither are they unfriendly or unhelpful. What they mostly are, in my
experience, is brusque and in a hurry. They will not go out of their
way to abuse, harass, or ridicule you; nor will they go out of their
way to advise or assist you. The New Yorkers' outlook on the world is
that they have important business to attend to, and so, presumably, do
you, and everything will run smoothly as long as everyone just stays
out of each other's way and attends to their own important
business.

In Boston, people will take you personally. I was once thrown out of
a liquor store in Boston for daring to ask for a bottle of rye in a
manner that the proprietor found offensive. This
would never happen in New York. New Yorkers don't have time to be
offended by your stupid demands, and they will not throw you out,
because they want your money, and if dealing with your stupid
demands is what they have to do to get it, well, they will just deal
with your stupid demands as quickly as possible. A New York liquor
store owner is not in the business of getting offended, and he has
more important things to do than to throw you out. He is in the
business of taking your money, and if he throws you out, it is because
you are getting in the way of his next customer and preventing him
from taking his money. Most likely, if you ask for rye, the
New York liquor store owner will take your money and give you the
rye.

There is a story about Hitler and Goebbels having an argument, with
Hitler arguing that the Jews were too inferior to pose any sort of
threat, and Goebbels disputing with him, saying that Jews are devious
and cunning. To prove his point, Goebbels takes Hitler to a
Jewish-run hardware and sundries store and asks the proprietor for a
left-handed teapot. The proprietor hesitates a moment, says "let me
check in the back room," and returns carrying a
teapot in his left hand. "Yes," he says, "I had just one left."
As Goebbels and Hitler leave the shop with
their left-handed teapot, Goebbels says "I told you the Jews were
cunning." Hitler replies "What's so cunning about having one
left?"

A Bostonian would have told those two assholes where they could stick
their left-handed teapot. That Jew emigrated from Germany, and he did
not go to Boston. He went to New York, as did his fifty devious
cousins.

But I digress.

In some cities I have visited, there is no convention about which side
of the subway stairs is for going up and which is for going down.
People just go up whichever side they feel like. In New York, you
always travel on the right-hand side of the stairs. Everyone
does this, because everyone knows that if they don't they will just
get in the way and hold everyone up, including themselves. They have
no time for this disorganized nonsense in which people go up whatever
side of the stairs suits them.

New Yorkers do not stop and stand in doorways. When New Yorkers need
to open their umbrellas, they step aside, and do it out of the
way.

New Yorkers are orderly queuers. Disorganized queuing just wastes
everyone's time. You don't want to waste everyone's time, do you? So
get in line and shut the hell up!

Here in Philadelphia, we waste a lot of time trying to flag down cabs
that turn out to be full. New Yorkers would never tolerate such slack
management. In New York, taxicabs have a lamp on top that is wired to
the taximeter; it lights up when the taxi is empty. That is good
business for drivers, for riders, for everyone. I like Philadelphia
well enough to have lived here for seventeen years, but it's no New
York, let me tell you.

Hong Kong, on the other hand, is a very satisfactory New York. A few
years back I visited Hong Kong, food capital of the other half
of the Earth, on business, and loved it there. Not least because of
the food. The Cantonese are the best cooks in the world, cooks so
gifted and brilliant that people all over the world line up on the
weekends to eat Cantonese-style garbage, and then come back next
weekend to eat it again, because Cantonese garbage is more delicious
than other cuisines' delicacies. (They call it dim sum, but if you
think about it for a minute you will realize that dim sum is the
week's leftovers, served up in a not-too-subtle disguise.)
And Hong Kong has the best Cantonese food in the world.

People had warned me beforehand that the Hongkongians were known for
being brusque and rude. And that is what I found. Several times in
Hong Kong I called up someone or other to try to get something done,
and the conversation went roughly like this: I would start my detailed
explanation of what I wanted, and why, and the person on the other
end of the phone would cut me off mid-sentence, saying something like
"You need x; I do y. OK? OK! <click>" and that
was the end of it.

As a New Yorker, I recognized immediately what was going on. Brusque,
yes, but not rude. I knew that the person on the other end of the
phone was thinking that their time was valuable, that I presumably
considered my own time valuable, and that we would both be best served
if each of us wasted as little of our valuable time as possible in
idle chitchat. New Yorkers are just like that too. I gather some
people are offended by this behavior, and want the person on the phone
to be polite and friendly. I just want them to shut up and do the
thing I want done, and in Hong Kong that is what I got.

So if you are a tourist in New York, please try to remember: New
Yorkers may appear to be trying to get rid of you as quickly as they
can, and if it seems that way, it is probably because they are trying
to get rid of you as quickly as they can. But they are doing it
because they are trying to help, because they have your best interests
at heart. And also because they want to get rid of you as quickly as
they can.

I got tired of writing awk '{print $11}' when I wanted to
extract the 11th field of some stream of data in a Unix pipeline,
which is something I do about six thousand times a day. So
I wrote this tiny thing. It was probably the most useful piece of
software I wrote in that calendar year, and as you can see from the
length, it certainly had the best cost-to-benefit ratio. I use it
every day.

The point here is that you can replace awk '{print $11}'
with just f 11. For example, f 11
access_log finds out the referrer URLs from my Apache httpd log.
I also frequently use f -1, which
prints the last field in each line.
ls -l | grep '^l' | f -1 prints out the targets of all the
symbolic links in the current directory.
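The program itself is not reproduced here, so what follows is only a sketch of the same idea in Python, not the original. The function name extract_field is invented, the sample listing lines are made up for the demonstration, and printing an empty string for a missing field is an assumption borrowed from what awk does.

```python
def extract_field(line, n):
    """Return whitespace-separated field n of line, or "" if it is missing.

    Positive n counts from 1 on the left, as awk's $n does;
    negative n counts from the right, so -1 is the last field.
    """
    if n == 0:
        return line.rstrip("\n")   # like awk's $0, the whole line
    fields = line.split()
    index = n - 1 if n > 0 else n
    if -len(fields) <= index < len(fields):
        return fields[index]
    return ""

# Made-up sample input, shaped like the output of ls -l | grep '^l'.
listing = [
    "lrwxrwxrwx 1 mjd mjd 13 Jan  1 12:00 bin -> /home/mjd/bin",
    "lrwxrwxrwx 1 mjd mjd  4 Jan  1 12:00 tmp -> /tmp",
]
for line in listing:
    print(extract_field(line, -1))
```

Wrapping this in a loop over standard input or over named files would give something usable the way f 11 access_log is used above.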

Programs like this won't win me any prizes, but they certainly are
useful.

Anyway, today's post was inspired by another similarly tiny utility
that I expect will be similarly useful that I just finished. It's
called runN:

You can tell I just finished it because the shuffle() and
usage() functions are unimplemented.

The idea is that you execute the program like this:

runN -n 3 -c foo arg1 arg2 arg3 arg4...

and it runs the commands foo arg1, foo arg2,
foo arg3, foo arg4, etc., simultaneously, but with
no more than 3 running at a time.

The -n option says how many commands to run simultaneously;
after running that many the main control waits until one has exited
before starting another.
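To make the behavior concrete, here is a rough Python sketch of the same idea. It is not the program described here: it swaps in a thread pool from the standard library in place of starting processes and waiting for them by hand, and the names run_n and runner are invented for the illustration. Each argument is split on whitespace, a design point the article takes up further on.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_n(command, args, limit, runner=subprocess.call):
    """Run `command` once per element of args, at most `limit` at a time.

    Each element is split on whitespace, so "-l baz" becomes two arguments.
    Returns the runner's results (exit statuses by default) in input order.
    """
    argvs = [[command] + arg.split() for arg in args]
    # The pool never has more than `limit` workers busy, which is what
    # caps the number of commands running at once.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(runner, argvs))
```

Calling run_n("foo", ["arg1", "arg2", "arg3", "arg4"], 3) corresponds to the invocation shown above. The runner parameter exists only so that the argument handling can be exercised without actually launching anything.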

If I had implemented shuffle(), then -r would run
the commands in random order, instead of in the order specified.
Probably I should get rid of -c and just have the program
take the first argument as the command name, so that the invocation
above would become runN -n 3 foo arg1 arg2 arg3 arg4.... The
-v flag, had I implemented it, would put the program into
verbose mode.

I find that it's best to defer the implementation of features like
-r and -v until I actually need them, which might be
never. In the past I've done post-analyses of the contents of
~mjd/bin, and what I found was that my tendency was to
implement a lot more features than I needed or used.

In the original implementation, the -n is mandatory, because
I couldn't immediately think of a reasonable default. The only
obvious choice is 1, but since the point of the program was to run
programs concurrently, 1 is not reasonable. But
it occurs to me now that if I let -n default to 1, then this
command would replace many of my current invocations of:

for i in ...; do
  cmd $i
done

which I do quite a lot. Typing runN cmd ... would be a lot
quicker and easier. As I've
written before, when a feature you put in turns out to have
unanticipated uses, it's a sign of a good, modular design.

The code itself makes me happy for two reasons. One is that the
program worked properly on the first try, which does not happen very
often for me. When I was in elementary school, my teachers always
complained that although I was very bright, I made a lot of careless
mistakes because I was not methodical enough. They tried hard to fix
this personality flaw. They did not succeed.

The other thing I like about the code is that it's so very brief. Not
to say that it is any briefer than it should be; I think it's just
about perfect. One of the recurring themes of my study of programming
for the last few years is that beginner programmers use way more code
than is necessary, just like beginning writers use way too many words.
The process and concurrency management turned out to be a lot
easier than I thought they would be: the default Unix behavior was
just exactly what I needed. I am particularly pleased with delete
$pid{wait()}. Sometimes these things just come together.

The 1 while wait() >= 0 line is a non-obfuscated version
of something I wrote in my prize-winning obfuscated
program, of all places. Sometimes the line between the sublime
and the ridiculous is very fine indeed.

Despite my wariness of adding unnecessary features, there is at least one
that I will put in before I deploy this to ~mjd/bin and start
using it. I'll implement usage(), since experience has shown
that I tend to forget how to invoke these things, and reading the
usage message is a quicker way to figure it out than is rereading the
source code. In the past, usage messages have been good
investments.

I'm tempted to replace the cut-rate use of split here with
something more robust. The problem I foresee is that I might want to
run a command with an argument that contains a space. Consider:

runN -n 2 -c ls foo bar "-l baz"

This runs ls foo, then
ls bar, then
ls -l baz. Without the split() or something like
it, the third command would be equivalent to
ls "-l baz" and would fail with something like -l baz: no
such file or directory. (Actually it tries to interpret the
space as an option flag, and fails for that reason instead.) So I put the
split in to enable this usage. (Maybe this was a
you-ain't-gonna-need-it moment; I'm not sure.) But this design makes
it difficult or impossible to apply the command to an argument with a
space in it. Suppose I'm trying to do ls on three
directories, one of which is called old stuff. The natural
thing to try is:

runN -n 2 -c ls foo bar "old stuff"

But the third command turns into ls old stuff and produces:

ls: old: No such file or directory
ls: stuff: No such file or directory

If the split() were omitted, it would just work, but then the
ls -l baz example above would fail.
If the split() were replaced by the correct logic, I
would be able to get what I wanted by writing something like this:

runN -n 2 -c ls foo bar "'old stuff'"

But as it is this just produces another error:

ls: 'old: No such file or directory
ls: stuff': No such file or directory

Perl comes standard with a library called ShellWords that is
probably close to what I want here. I didn't use it because I wasn't
sure I'd actually need it—only time will tell—and because
shell parsing is very complicated and error-prone, more so when it is
done synthetically rather than by the shell, and even more so when it
is done multiple times; you end up with horrible monstrosities like
this:

So my fear was that by introducing a double set of shell-like
interpretation, I'd be opening a horrible can of escape character
worms and weird errors, and my hope was that if I ignored the issue
the problems might be simpler, and might never arise in practice.
We'll see.
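The difference between the cut-rate split and shell-style word splitting is easy to see in Python, whose standard shlex module plays roughly the role a shell-words library would play in Perl. This is an illustration of the general issue, not of the program's actual code.

```python
import shlex

for arg in ["-l baz", "old stuff", "'old stuff'"]:
    # str.split breaks on every run of whitespace; shlex.split honors quotes.
    print(repr(arg), "plain split:", arg.split(), "shell-style:", shlex.split(arg))
```

With shell-style splitting, "-l baz" still becomes two words, while "'old stuff'" stays together as the single word old stuff, which is the behavior wanted in the examples above.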

If you carry this out, you get pairs p and q that have
p² - 2q² = ±1, which means that
p/q ≈ √2. The farther you carry the
recurrence, the better the approximation is.

I said that this formula
comes from consideration of continued fractions.
But I was thinking about it a little more, and I realized that there
is a way to get such a recurrence for pretty much any algebraic
constant you want.

Consider for a while the squaring function s : x →
x². This function has two obvious fixed points, namely 0 and
1, by which I mean that s(0) = 0 and s(1) = 1. Actually
it has a third fixed point, ∞.

If you consider the behavior on some x in the interval
(0, 1), you see that s(x) is also in the same
interval. But also, s(x) < x on this
interval. Now consider what happens when you iterate s on this
interval, calculating the sequence s(x),
s(s(x)), and so on. The values must stay in
(0, 1), but must always decrease, so that no matter what x
you start with, the sequence converges to 0. We say that 0 is an
"attracting" fixed point of s, because any starting value
x, no matter how far from 0 it is (as long as it's still in (0, 1)), will eventually be attracted
to 0. Similarly, 1 is a "repelling" fixed point, because any starting
value of x, no matter how close to 1, will be repelled to 0.

Consideration of the interval (1, ∞) is similar. 1 is a
repeller and ∞ is an attractor.
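A few lines of Python show both behaviors. The helper name iterate and the starting values 0.99 and 1.01 are arbitrary choices for this sketch.

```python
def iterate(f, x, steps):
    """Apply f to x repeatedly, `steps` times, and return the result."""
    for _ in range(steps):
        x = f(x)
    return x

def square(x):
    return x * x

print(iterate(square, 0.99, 12))   # started just below 1, ends near 0
print(iterate(square, 1.01, 12))   # started just above 1, grows very large
```

Starting exactly at 1 stays at 1 forever, but the slightest nudge in either direction sends the value off to 0 or toward ∞.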

Fixed points are not always attractors or repellers. The function
x → 1/x has fixed points at ±1, but these
points are neither attractors nor repellers.

Also, a fixed point might attract from one side and repel from the
other. Consider x → x/(x+1). This has
a fixed point at 0. It maps the interval (0, ∞) onto
(0, 1), which is a contraction, so that 0 attracts values on
the right. On the other hand, 0 repels values on the left, because
1/-n goes to 1/(-n+1). -1/4 goes to -1/3 goes to -1/2
goes to -1, at which point the whole thing blows up and goes to
-∞.
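Exact rational arithmetic makes the one-sided behavior easy to watch. This sketch uses Python's fractions module; the names f and orbit are made up for the example.

```python
from fractions import Fraction

def f(x):
    return x / (x + 1)

def orbit(x, steps):
    """Return [x, f(x), f(f(x)), ...] with steps + 1 entries."""
    values = [x]
    for _ in range(steps):
        x = f(x)
        values.append(x)
    return values

print(orbit(Fraction(1), 4))       # 1, 1/2, 1/3, 1/4, 1/5: creeping toward 0
print(orbit(Fraction(-1, 4), 3))   # -1/4, -1/3, -1/2, -1: moving away from 0
# One more step from -1 would divide by zero, which is the blow-up.
```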

The idea about the fixed point attractors is suggestive.
Suppose we were to pick a function f that had √2 as a
fixed point. Then √2 might be an attractor, in which case
iterating f will get us increasingly accurate approximations to
√2.

So we want to find some function f such that f(√2) =
√2. Such functions are very easy to find! For example, take √2,
square it, divide by 2, add 1, and take the square root, and
you have √2 again. So x → √(1+x²/2) is such
a function. Or take √2. Take the reciprocal, double it, and you
have √2 again. So x → 2/x is another such
function. Or take √2. Add 1 and take the reciprocal. Then add 1
again, and you are back to √2. So x → 1 + 1/(x+1)
is a function with √2 as a fixed point.

Or we could look for functions of the form ax² + bx +
c. Suppose √2 were a fixed point of this function. Then we
would have 2a + b√2 + c = √2. We would like
a, b, and c to be simple, since the whole point
of this exercise is to calculate √2 easily. So let's take
a=b=1, c=-2. The function is now x →
x² + x - 2.
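As a sanity check, the four candidate functions can be evaluated at √2 to confirm that each one really does leave it fixed, up to floating-point rounding. The dictionary name candidates and its labels are just for this sketch.

```python
import math

candidates = {
    "sqrt(1 + x^2/2)": lambda x: math.sqrt(1 + x**2 / 2),
    "2/x":             lambda x: 2 / x,
    "1 + 1/(x+1)":     lambda x: 1 + 1 / (x + 1),
    "x^2 + x - 2":     lambda x: x**2 + x - 2,
}

root = math.sqrt(2)
for name, f in candidates.items():
    # A fixed point means f(root) - root is zero, give or take rounding error.
    print(f"{name:16} f(sqrt 2) - sqrt 2 = {f(root) - root:.2e}")
```

Being a fixed point says nothing yet about whether iteration converges to it; that depends on whether the point attracts or repels.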

Which one to pick? It's an embarrassment of riches.

Let's start with the polynomial,
x →
x² + x - 2. Well, unfortunately this is the wrong
choice. √2 is a fixed point of this function, but repels on both
sides: √2 ± ε → √2 ± ε(1 + 2√2),
which is getting farther away.

The inverse function of
x →
x² + x - 2 will have √2 as an attractor on both sides,
but it is not so convenient to deal with because it involves taking
square roots. Still, it does work; if you iterate
½(-1 + √(9 + 4x)) you do get √2.

Of the example functions I came up with, x → 2/x is
pretty simple too, but again the fixed points are not attractors.
Iterating the function for any initial value other than the fixed
points just gets you in a cycle of length 2, bouncing from one side of
√2 to the other forever, and not getting any closer.

But the next function, x → 1 + 1/(x+1), is a
winner. (0, ∞) is crushed into (1, 2), with √2 as the fixed
point, so √2 attracts from both sides.

Writing x as a/b, the function becomes
a/b → 1 + 1/(a/b+1), or, simplifying,
a/b → (a + 2b) / (a +
b). This is exactly the recurrence I gave at the beginning of
the article.
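The agreement between the function and the recurrence can be checked with exact fractions. In this sketch the names f and step are invented; step carries the numerator and denominator separately, as the recurrence does.

```python
from fractions import Fraction

def f(x):
    return 1 + 1 / (x + 1)

def step(a, b):
    """One step of the recurrence a/b -> (a + 2b) / (a + b)."""
    return a + 2 * b, a + b

x, (a, b) = Fraction(1), (1, 1)
for _ in range(5):
    x, (a, b) = f(x), step(a, b)
    # The iterated function and the integer recurrence give the same fraction.
    print(x, a, b)
```

Starting from 1/1, both routes produce 3/2, 7/5, 17/12, 41/29, and 99/70.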

We did get a little lucky, since the fixed point of interest, √2, was
the attractor, and the other one, -√2, was the repeller.
((-∞, -1) is mapped
onto (-∞, 1), with -√2 as the fixed point; -√2 repels on
both sides.)
But had it
been the other way around we could have exchanged the behaviors of the
two fixed points by considering -f(-x) instead.
Another way to fix it is to change the attractive behavior into repelling
behavior and vice versa by running the function backwards. When we
tried this for
x → x² + x - 2 it was a pain because of the
square roots. But the
inverse of
x → 1 + 1/(x+1) is simply
x → (-x + 2) / (x - 1), which is
no harder to deal with.
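It is easy to confirm with exact arithmetic that the two functions undo each other. The sample inputs below are arbitrary, chosen only to avoid the points x = -1 and x = 1 where one function or the other is undefined.

```python
from fractions import Fraction

def f(x):
    return 1 + 1 / (x + 1)

def f_inverse(x):
    return (-x + 2) / (x - 1)

for x in (Fraction(3), Fraction(1, 2), Fraction(-5, 7)):
    # Applying one function and then the other gets back the starting value.
    assert f_inverse(f(x)) == x
    assert f(f_inverse(x)) == x
    print(x, f(x), f_inverse(x))
```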

The continued fraction stuff can come out of the recurrence, instead
of the other way around. Let's iterate the function
x → 1 + 1/(1+x) formally,
repeatedly replacing x with
1 + 1/(1+x). We get:

1 + 1/(1+x)
1 + 1/(1+1 + 1/(1+x))
1 + 1/(1+1 + 1/(1+1 + 1/(1+x)))
...

So we might expect the fixed point, if there is one, to be 1 + 1/(2 +
1/(2 + 1/(2 + ...))), if this makes sense. Not all such expressions
do make sense, but this one is a continued fraction, and continued
fractions always make sense. This one is eventually periodic,
and a theorem says that such continued fractions always have values
that are quadratic surds. The value of this one happens to be √2. I
hope you are not too surprised.
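Cutting the continued fraction off after a finite number of 2s and evaluating it from the inside out gives the same fractions as the recurrence. This is a small sketch; the name truncated and the convention that the innermost tail is 0 are choices made for the example.

```python
from fractions import Fraction

def truncated(depth):
    """Evaluate 1 + 1/(2 + 1/(2 + ...)) cut off after `depth` twos."""
    tail = Fraction(0)
    for _ in range(depth):
        tail = 1 / (2 + tail)
    return 1 + tail

for depth in range(6):
    value = truncated(depth)
    print(depth, value, float(value))
```

The values start 1, 3/2, 7/5, 17/12 and close in on √2 from alternating sides.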

In the course of figuring all this out over the last two weeks or so,
I investigated many fascinating sidetracks. The
x → 1 + 1/(x+1) function is an example of a
"Möbius transformation", which has a number of interesting properties
that I will probably write about next month. Here's a foretaste: a
Möbius transformation is simply a function x → (ax
+ b) / (cx + d) for some constants a,
b, c, and d. If we agree to abbreviate this
function as $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then the inverse function is also a Möbius
transformation, and is in fact
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}$.
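Here is a small sketch of the matrix abbreviation at work, using nested tuples for the 2×2 matrix. The names mobius and adjugate are invented for the example. The adjugate is used instead of the true inverse because the two differ only by a scalar factor (the determinant, assumed nonzero), and scaling all four entries does not change the function.

```python
from fractions import Fraction

def mobius(m):
    """Turn the matrix ((a, b), (c, d)) into x -> (ax + b) / (cx + d)."""
    (a, b), (c, d) = m
    return lambda x: (a * x + b) / (c * x + d)

def adjugate(m):
    """The inverse matrix up to a scalar factor, which a Möbius map ignores."""
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))

step = ((1, 2), (1, 1))            # x -> (x + 2) / (x + 1), i.e. 1 + 1/(x+1)
x = Fraction(3)
y = mobius(step)(x)
print(y)                           # 5/4
print(mobius(adjugate(step))(y))   # back to 3
```

The adjugate of step is ((1, -2), (-1, 1)), that is, x → (x - 2) / (-x + 1), which is the same function as the inverse (-x + 2) / (x - 1) found earlier.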

Do you dream in color?
People have occasionally asked me whether I dreamt in color or in
black-and-white, by which I suppose they meant grayscale. This
question was strange to me the first time I heard it, because up to
then it had not occurred to me that anyone did not dream in color. I
still find it strange, and I had to do a Google search to verify that
there really are people who claim not to dream in color.

One time, when I replied that I did dream in color, my interlocutor
asked me if I was sure: perhaps I dreamt in black and white, but only
remembered it as being in color later.

I am sure I dream in color, because on more than one occasion I have
had discussions in dreams about colors of objects. I can't remember
any examples right now, but it was something like this: "Give me the
red apple." "Okay, here." "That is not the red apple, that is the
green apple!" And then I looked and saw that the apple I had thought
was red was really green.

One could still argue that I wasn't really dreaming in color,
that it only seemed like that, or something. It's a delicate
philosophical point. One could also argue that I didn't have any
dream at all, I only thought I did after I woke up. I suppose the
only refutations of such an argument either appeal to neurology or
involve a swift kick in the pants.

And then suppose I have a dream in which I take LSD and have marvelous
hallucinations. Did I really have hallucinations? Or did I only
dream them? If I dream that I kill someone, we agree that it wasn't
real, that a dream murder is not a real murder; it is only in your
head. But hallucinations, by definition, are only in your head even
when they are real, so don't dream hallucinations have as much claim
to reality as waking hallucinations?

One might argue that dreamt LSD hallucinations are likely to be
qualitatively very different from real LSD hallucinations—less
like real LSD hallucinations, say, and more like, well, dreams. But
this only refutes the claim that the dream hallucinations were
LSD hallucinations. And nobody was going to claim that they
were LSD hallucinations anyway, since no actual LSD was
involved. So this doesn't address the right question.

Stickier versions of the same problem are possible. For example,
suppose I give Bill a little piece of paper and tell him it is
impregnated with LSD. It is not, but because of the placebo effect,
Bill believes himself to be having an LSD trip and reports
hallucinations. There was no LSD involved, so the hallucinations were
only imaginary. But even real hallucinations are only imaginary. Are
we really justified in saying that Bill is mistaken, that he did not
actually hallucinate, but only imagined that he did? That seems like
a very difficult position to defend.

I seem to have wandered from the main point, which is that I had
another dream last night that supports my contention that I dream in
color. I was showing my friend Peter some little homunculi that had
been made long ago from colored pipe cleaners, shiny paper,
and sequins by my grandmother's friend Kay Seiler. Originally there
had been ten of these, but in the dream I had only five. When my
grandmother had died, my sister and I had split the set, taking five
each. In place of the five originals I was missing, I had five
copies, which were identifiable as such because they were in
grayscale. Presumably my sister had grayscale copies of the originals
I retained. I explained this to Peter, drawing his attention to the
five full-color homunculi and the five grayscale ones.

So yes, barring philosophical arguments that I think deserve a kick in
the pants, I am sure that I dream in color.

I no longer remember how I solved the problem the first time around,
but I was tinkering around with it today and came up with an approach
that I think is instructive, or at least interesting.

We want to find non-negative integers a and b such that
½(a² + a) = b². Or, equivalently, we
want a and b such that √(a² + a) =
b√2.

Now, √(a² + a) is pretty nearly a + ½.
So suppose we could find p and q with a + ½ =
b·p/q, and p/q a bit larger
than √2. a + ½ is a bit too large to be what we want on the
left, but p/q is a bit larger than what we want on the
right too. Perhaps the fudging on both sides would match up, and we
would get √(a² + a) = b√2 anyway.

If this magic were somehow to occur, then a and b would
be the numbers we wanted.

Finding p/q that is a shade over √2 is a well-studied
problem, and one of the things I have in my toolbox, because it seems
to come up over and over in the solution of other problems, such as
this one. It has interesting connections to several other parts of
mathematics, and I have written
about it here before.

The theoretical part of finding p/q close to √2 is something
about continued fractions that I don't want to get into today.
But the practical part is very simple. The following recurrence
generates all the best rational approximations to √2; the farther you
carry it, the better the approximation:

p₀ = 1

q₀ = 1

pᵢ₊₁ = pᵢ + 2qᵢ

qᵢ₊₁ = pᵢ + qᵢ

This gives us the following examples:

p       q       p/q
1       1       1.0
3       2       1.5
7       5       1.4
17      12      1.416666666666667
41      29      1.413793103448276
99      70      1.414285714285714
239     169     1.414201183431953
577     408     1.41421568627451
1393    985     1.414213197969543
3363    2378    1.41421362489487

And in all cases p² - 2q² = ±1.

Now, we want a + ½ = b·p/q, or
equivalently (2a + 1)/2b = p/q. This
means we can restrict our attention to the rows of the table that have
q even. This is a good thing, because we need
p/q a bit larger than √2, and those are precisely the
rows with even q. The rows that have q odd have
p/q a bit smaller than √2, which is not what we
need. So everything is falling into place.

Let's throw away the rows with q odd, put a = (p
- 1)/2 and b = q/2, and see what we get:

p       q       a       b       ½(a²+a) = b²
3       2       1       1       1
17      12      8       6       36
99      70      49      35      1225
577     408     288     204     41616
3363    2378    1681    1189    1413721

Lo and behold, our wishful thinking about the fudging on both sides
canceling out has come true, and an infinite set of solutions just
pops right out.
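For anyone who wants to watch the solutions pop out, here is a minimal Python sketch of the procedure just described. The function names are mine, invented for illustration. It runs the recurrence, keeps only the rows with q even, sets a = (p - 1)/2 and b = q/2, and checks that ½(a² + a) really does equal b² rather than taking it on faith.

```python
def sqrt2_approximations():
    """Yield (p, q) from the recurrence p' = p + 2q, q' = p + q, starting at (1, 1)."""
    p, q = 1, 1
    while True:
        yield p, q
        p, q = p + 2 * q, p + q


def square_triangular(count):
    """Return the first `count` pairs (a, b) obtained from rows with even q."""
    found = []
    for p, q in sqrt2_approximations():
        if q % 2 != 0:
            continue  # rows with odd q have p/q a bit below sqrt(2); skip them
        a, b = (p - 1) // 2, q // 2
        # Verify the claim instead of assuming the fudging cancels.
        assert a * (a + 1) // 2 == b * b
        found.append((a, b))
        if len(found) == count:
            return found


if __name__ == "__main__":
    for a, b in square_triangular(5):
        print(a, b, b * b)
```

The assertion inside the loop is the interesting part: it never fires, which is exactly the lucky cancellation described above.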

I have two points to make about this. One is that I have complained in the past about
mathematical pedagogy, how the convention is to come up with some
magic-seeming guess ahead of time, as when pulling a rabbit from a
hat, and then at the end it is revealed to be the right choice, but
what really happened was that the author worked out the whole
thing, then saw at the end what he would need at the beginning to make
it all work, and went back and filled in the details.

That
is not what happened here. My apparent luck was real luck. I
really didn't know how it was going to come out. I was really just
exploring, trying to see if I could get some insight into the answer
without necessarily getting all the way there; I thought I might
need to go back and do a more careful analysis of the fudge factors,
or something. But sometimes when you go exploring you stumble on the
destination by accident, and that is what happened this time.

The other point I want to make is that I've written before about how
a mixture of equal parts of numerical sloppiness and algebraic
tinkering, with a dash of canned theory, can produce useful results,
in a sort of alchemical transmutation that turns base metals into
gold, or at least silver. Here we see it happen again.

Trollope's book begins with her arrival from Europe in New Orleans. I
was drawn in early on by the following passage, which appears on page
5:

The land is defended from the encroachments of the river by a high
embankment which is called the Levée; without which the
dwellings would speedily disappear, as the river is evidently higher
than the banks would be without it. . . . She was looking so mighty,
and so unsubdued all the time, that I could not help fancying she
would some day take the matter into her own hands again, and if so,
farewell to New Orleans.

Harry Potter and the Goblet of Fire
I have not been too impressed with the Harry Potter
books. I read them all, one at a time, on airplanes. They are good
for this because they are fat, undemanding, and readily available in
airport bookshops for reasonable prices. In a lot of ways they are
badly constructed, but there is really no point in dwelling on their
flaws. The Potter books have been widely criticized already from all
directions, and so what? People keep buying them.

But The Goblet of Fire has been bothering me for years
now, because its plot is so very stupid.
I am complaining about it here in my blog
because it continues to annoy me, and I hope to forget about it after
I write this. The rest of this article will contain extensive
spoilers, and I will assume that you either know it all already or
that you don't care.

The bad guys want to kill Harry Potter, the protagonist. The
Triwizard Tournament is being held at Harry's school. In the
tournament, the school champions must overcome several trials, the
last of which is to race through a maze and grab the enchanted goblet
at the center of the maze. The bad guys' plan is this: they will
enter Harry in the tournament. They will interfere subtly in the
tournament, to ensure that Harry is first to lay hands on the goblet.
They will enchant the goblet so that it is a "portkey", and whoever
first touches it will be transported into their evil clutches.

They need an evil-doer on the spot, to interfere in the
competition in Harry's favor; if he is eliminated early, or fails to
touch the goblet first, all their plotting will be for naught. So they
abduct and imprison Mad-eye Moody, a temporary faculty member and a
famous capturer of evil-doers, and enchant one of their own to
impersonate him for the entire school year.

The badness of this plan is just mind-boggling. Moody is a tough
customer. If they fail to abduct him, or if he escapes his year-long
captivity, their plans are in the toilet. If the substitution is
detected, their plans are in the toilet. Their fake Moody will be
teaching a class in "Defense Against the Dark Arts", a subject in
which the real Moody has real expertise that the substitute lacks; the
substitute somehow escapes detection on this front. For several
months the fake Moody will be eating three meals a day with a passel
of witches and wizards who are old friends with the real Moody, and
among whom is Albus Dumbledore, who supposedly is not a complete
idiot; the substitute somehow escapes detection on this front as
well.

Even with the substitution accomplished, the bad guys' task is far
from easy. Harry procrastinates everything he can and it's all they
can do to arrange that he is not eliminated from the tournament. None
of the other champions are either, and the villains have a tough
problem to make sure that he is
first through the maze.

Here is an alternative plan, which apparently did not occur to the
fearsome Lord Voldemort: instead of making the Goblet of Fire into a
portkey, he should enchant a common object, say a pencil. We know
this is possible, since it has been explicitly established that
absolutely any object can be a portkey, and the first instance of one
that we see appears to be an abandoned boot. Then, since fake Moody
is teaching Harry's class, sometime during the first week of the term
he should ask Harry to stay behind on some pretext, and then say "Oh,
Harry, would you please pass me that pencil over there?" After Harry
is dead, fake Moody can disappear. A little thought will no doubt
reveal similar plans that involve no substitutions or imprisonments:
send Harry a booby-trapped package in the mail, or enchant his socks,
or something of the sort.

In fact, they do something like this in one of the later books; they
sell another character, I think Ginny Weasley, some charm that puts
her under their control. This is a flub already, because they should
have sold it to Harry instead—duh—and then had him kill
himself. Or they could have sold him a portkey. Or an exploding
candy. But I don't want to belabor the point.

Normally I have no trouble suspending my disbelief in matters like
this. I can forgive a little ineptness on the part of the master
schemers, because I am such an inept schemer that I usually don't
notice. When evil plots seem over-elaborate and excessively risky to
me, I just imagine that it seems that way because evil plots are so
far outside my area of expertise, and read on. But in The
Goblet of Fire I couldn't do this. My enjoyment of the book
was disrupted by the extreme ineptness of the evil scheme.

One of Rowling's recurring themes is the corruption and ineptness of
the ostensibly benevolent government. But perhaps this incompetence
is a good thing. If the good guys had been less incompetent in the
past, the bad guys might have had to rise to the occasion, and would
have stomped Harry flat in no time. Lulled into complacency by years
of ineffective opposition, they become so weak and soft that they are
defeated by a gang of teenagers.

A couple of people pointed out that, contrary to what I asserted, the
algorithm I described can in fact overflow even when the final result
is small enough to fit in a machine word. Consider !!8\choose 4!!
for example. The algorithm, as I wrote
it, calculates intermediate values 8, 8, 56, 28, 168, 56, 280, 70, and
70 is the final answer. If your computer has 7-bit machine integers,
the answer (70) will fit, but the calculation will overflow along the
way at the 168 and 280 steps.

Perhaps more concretely,
!!35\choose11!! is 417,225,900,
which is small enough to fit in a 32-bit unsigned integer, but the
algorithm I wrote wants to calculate this as
!!35{34\choose10}\over11!!, and the
numerator here is 4,589,484,900, which does not fit.

One Reddit user suggested that you can get around this as follows: To
multiply r by a/b, first check if b
divides r. If so, calculate
(r/b)·a; otherwise calculate
(r·a)/b. This should avoid both overflow
and fractions.

Unfortunately, it does not. A simple example is !!{14\choose4} =
{11\over1}{12\over2}{13\over3}{14\over4}!!. After the
first three multiplications one has 286. One then wants to multiply
by 14/4. 4 does not divide 286, so the suggestion calls for
multiplying 286 by 14/4. But 14/4 is 3.5, a non-integer, and the goal
was to use integer arithmetic throughout.

Fortunately, this is not hard to fix. Say we want to multiply
r by a/b without overflow or fractions. First
let g be the greatest common divisor of r and b.
Then calculate ((r/g) ·
a)/(b/g). In the example above, g is 2,
and we calculate (286/2) · (14/2) = 143 · 7; this is the
best we can do.
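Here is a rough Python sketch of that fix. The names `mul_ratio` and `choose_gcd` are my own, and following the worked example I divide a by b/g before multiplying, which is safe because r/g and b/g have no common factor. Python integers never overflow, so this only illustrates the order of operations one would use in a language with fixed-width integers.

```python
from math import gcd


def mul_ratio(r, a, b):
    """Compute r * a / b in integers, keeping the intermediate values small.

    Assumes r * a / b is an integer, as it is at every step of the
    binomial-coefficient calculation.
    """
    g = gcd(r, b)
    r_reduced, b_reduced = r // g, b // g
    # r_reduced and b_reduced are coprime, so b_reduced must divide a.
    if a % b_reduced != 0:
        raise ValueError("r * a / b is not an integer")
    return r_reduced * (a // b_reduced)


def choose_gcd(n, k):
    """Binomial coefficient built up one factor at a time with mul_ratio."""
    if k < 0 or k > n:
        return 0
    result = 1
    for i in range(1, k + 1):
        result = mul_ratio(result, n - k + i, i)
    return result


if __name__ == "__main__":
    print(mul_ratio(286, 14, 4))   # (286/2) * (14/2) = 143 * 7
    print(choose_gcd(35, 11))
```

With this ordering, the only values computed along the way are the reduced factors and the successive binomial coefficients themselves, so nothing larger than the final answer ever appears.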

I haven't looked, but it is hard to imagine that
Volume II of Knuth doesn't discuss this in exhaustive detail,
including all the stuff I just said, plus a bunch of considerations
that hadn't occurred to any of us.

A few people also pointed out that you can save time when n
> m/2 by calculating !!m\choose m-n!! instead of !!m\choose n!!.
For example, instead of calculating !!100\choose98!!, calculate
!!100\choose 2!!. I didn't mention this in the original article
because it was irrelevant to the main point, and because I thought it
was obvious.

(Is it considered bad form among Haskellites to use the
n+k patterns? The Haskell Report is decidedly
ambivalent about them.)

Anyway, this is a quite terrible way to calculate binomial
coefficients. Consider calculating !!100\choose 2!!, for example. The result
is only 4950, but to get there the computer has to calculate 100! and
98! and then divide these two 150-digit numbers. This requires the
use of bignums in languages that have bignums, and causes an
arithmetic overflow in languages that don't. A straightforward
implementation in C, for example, drops dead with an arithmetic
exception; using doubles instead, it claims that the value of
!!100\choose 2!! is -2147483648. This is all quite sad, since the
correct answer is small enough to fit in a two-byte integer.

Even in the best case,
!!2n\choose n!!, the result is only on the order of 4ⁿ,
but the algorithm has to divide a numerator of about
4ⁿn²ⁿ by a denominator of
about n²ⁿ to get it.

A much better way to calculate values of !!n\choose k!!
is to use the following recurrence:

!!{n+1\choose k+1} = {n+1\over k+1}{n\choose k}!!

This is speedy, and it cannot cause an arithmetic overflow unless the
final result is too large to be represented.

It's important to multiply by the numerator before dividing by the
denominator, since if you do this, all the partial results are
integers and you don't have to deal with fractions or floating-point
numbers or anything like that. I think I may have mentioned before
how much I despise floating-point numbers. They are best avoided.
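A minimal sketch of the multiply-then-divide calculation in Python follows. The function name `choose` and the loop form are my choices for illustration, not a transcription of any particular implementation. Each step multiplies by the next numerator and only then divides by the next denominator, so every partial result is itself a binomial coefficient and the division is always exact.

```python
def choose(n, k):
    """Binomial coefficient using integer arithmetic only."""
    if k < 0 or k > n:
        return 0
    result = 1
    for i in range(1, k + 1):
        # Before this step, result is C(n-k+i-1, i-1).
        # Multiply first, then divide: the quotient is C(n-k+i, i), an integer.
        result = result * (n - k + i) // i
    return result


if __name__ == "__main__":
    print(choose(100, 2))
    print(choose(14, 4))
```

No factorials and no floating-point numbers are involved at any point.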

I ran across this algorithm last year while I was reading the Lilavati,
a treatise on arithmetic written about 850 years ago in India. The
algorithm also appears in the article on "Algebra" from the first
edition of the Encyclopaedia Britannica, published in
1768.

So this algorithm is simple, ancient, efficient, and convenient. And
the problems with the other algorithm are obvious, or should be. Why
isn't this better known?

One of the books in the bedtime-reading rotation for my daughter Iris
is A Bargain for Frances, by Russell and Lillian Hoban.
(Russell Hoban is also the author of a number of acclaimed novels for
adults, most notably Riddley Walker.) The plot and
character relationships in A Bargain for Frances are
quite complex, probably about at the limit of what a two-year-old can
handle. I will try to summarize.

Frances the badger is having a tea party with her friend Thelma, who
has previously behaved abusively to her. Thelma's tea set is plastic,
with red flowers. Frances is saving up her money for a real china tea
set with blue pictures. Thelma asserts that those tea sets are no
longer made, and that they are prohibitively expensive. She offers to
sell Frances her own tea set, in return for Frances's savings of
$2.17. Frances agrees. End of act I.

When Frances returns home with the plastic tea set, her little sister
Gloria criticizes it, saying repeatedly that it is "ugly". She
reports that the china kind with blue pictures is available in the
local candy store for $2.07, and that Thelma knows this. Frances
rushes to the candy store, where she witnesses Thelma buying a china
tea set with her money. End of act II.

There is an act III, but I do not want to spoil the ending.

There is
quite a lot here to engage the mind of a two-year-old: what does it
mean to make a trade, for example? And Thelma is quite devious in the
way she talks up the benefits of her plastic tea set ("It does not
break, unless you step on it") while dissembling her own desire for a
china one. Iris has not yet learned to deceive others for her own
benefit, and I think this is her first literary exposure to the
idea.

I mentioned at one point that Thelma had told a lie: she had said "I
don't think they make that kind [of tea set] anymore" when she knew
that the very tea set was available at the candy store. Iris was very
interested by this observation. She asked me repeatedly, over a
period of several weeks, to explain to her what a lie was. I had some
trouble, because I did not have any good examples to draw on. Iris
does not do it yet, and Lorrie and I do not lie to Iris either.

One time I tried to explain lies by telling Iris about how people
sometimes tell children that if they do not behave, goblins will come
and take them away. Of course, this didn't work. First I had to
explain what goblins were. Iris was very disturbed at the thought of
goblins that might take her away. I had to reassure Iris that there
were no goblins. We got completely sidetracked on a discussion of
goblins. I should have foreseen this, but it was the best example I
was able to come up with on the spur of the moment.

Later I thought of a better example, with no distracting goblins:
suppose Iris asks for raspberries, and I know there are some in the
refrigerator, but I tell her that we have none, because I want
to eat them myself. I think this was just a little bit too
complicated for Iris. It has four parts, and I try to keep
explanations to three parts, which seems to be about the maximum that
she can follow at once. (Two parts is even better.) I think Iris attached
too much significance to the raspberries; for a while she seemed to
think that lying had something to do with raspberries.

Oh well, at least I tried. She will catch on soon enough, I am
sure.

Perhaps the most complex idea in the book is this: when Frances and
Thelma agree to trade money for tea set, they agree on "no backsies".
This is an important plot point. After the second or third reading,
Iris asked me what "no backsies" meant.

I had to think about this carefully before I answered, because it is
quite involved, and until I thought it through, I was not sure I
understood it myself. You might want to think about this before
reading on. Remember that it's not enough to understand it; you have
to be able to explain it.

My understanding of "no backsies" was that normally, when friends
trade, there is an assumption that the exchange may be unilaterally
voided by either party, as long as this is done timely. You can come
back the next day and say you have changed your mind, and your friend,
being your friend, is expected to consent. Specifying "no backsies"
establishes an advance agreement that this is not the case.
If you come back the next day, your friend can protest "but we said
there were no backsies on this" and refuse to undo the trade.
(The trade can, of course, be voided later if both parties
agree.)

So to understand this, you must first understand what it means to
trade, and why. Iris took this in early on, and fairly easily. You
also have to understand the idea that one or both parties might want
to change their minds later; this is also something Iris can get her
head around. Toddlers know all about what it means to change one's
mind.

But then you have to understand that one party might want to annul the
agreement and the other party might not. Tracking two people's
independent and conflicting desires is probably a little too hard for
Iris at this stage. She can sometimes understand another person's
point of view, by identification. ("You sometimes feel like x;
here this other person feels the same way.") And similarly she can
immerse herself in the world-view of the protagonist of a book, and
understand that the protagonist's desires might be frustrated by
another character. But to immerse herself in both world-views
simultaneously is beyond her.

"No backsies" goes beyond this: you have to understand the idea that
an agreement might have default, unspoken conventions, and that the
participants will adhere to these conventions even if they don't want
to; this is not something that two-year-olds are good at doing yet.
You have to understand the idea of an explicit modification to the
default conditions; that part is not too hard, and everyday examples
abound. But then you have to understand what the unspoken convention
actually is, and how it is being modified, and the difference between
a unilateral annulment of an agreement and a bilateral one. Again, I
think it's the bilaterality that's hard for Iris to understand. She
is still genuinely puzzled when I tell her we should leave the public
restroom clean for the next person.

Really, though, the main difficulty is just that the idea is very
complicated. Maybe I'm wrong about which parts are harder and which
parts are easier, and perhaps Iris can understand any of the pieces
separately. But at two years old she can't yet sustain a train of
thought as complicated as the one required to put all the pieces of
"no backsies" together. This sort of understanding is one of the
essential components of being an adult, and she will get it sooner or
later; probably sooner.

This is not the only part of the book that repays careful thought. At
one point, during Thelma's monologue about the unavailability of china
tea sets, she says:

I know another girl who saved up for that tea set.
Her mother went to every store and could not find one. Then that girl lost
some of her money and spent the rest on candy. She never got the tea
set. A lot of girls never do get tea sets. So maybe you won't get one.

One evening my wife Lorrie asked me who I thought Thelma was speaking
about in that passage. I replied that I had always understood it as a
pure fabrication, and that there was no "other girl".

Lorrie said that she thought that Thelma had been speaking
about herself, that Thelma had saved up her money, and
her mother had gone looking for a china tea set, been unable to
find one, and had
brought home the plastic set as a consolation prize.

The crucial clue was the detail about how the "other girl" spent the
rest of her money on candy, which is just a bit too specific for a
mere fabrication.

Once you try out the hypothesis that Thelma
is speaking personally, a lot of other details fall into place. For
example, her assertion that "A lot of girls never do get tea sets" is
no longer a clever invention on her part: she is repeating something
her mother told her to shut her up when she expressed her
disappointment over receiving a plastic instead of a china tea set.
Her sales pitch to Frances about why a plastic tea set is better than
a china one can be understood as an echo of her mother's own attempts
to console her.

My wife is very clever, and was an English major to boot. She is
skilled at noticing such things both by native talent and by long
training of that talent.

Good children's literature does reward a close reading, and like good
adult literature, reveals additional depths on multiple readings.
It seems to me that books for
small children are more insipid than they used to be, but that could
just be fuddy-duddyism, or it could be selection bias: I no longer
remember the ones I loved as a child that would now seem insipid
precisely because they would now seem insipid.

But the ability to produce good literature at any level is rare, so it
is probably just that only a few great writers in every
generation can do it. Russell Hoban was one of the best here.

Counting transitive relations
A relation on a set S is merely a subset of
S×S. For example, the relation < on the set
{1,2,3} can be identified as {(1,2), (1,3), (2,3)}, the set of all (a,
b) with a < b.

A relation is transitive if, whenever it has both (a, b)
and (b, c), it also has (a, c).

For the last week I've been trying to find a good way to calculate the
number of transitive relations on a set with three elements.

There are 13 transitive relations on a set with 2 elements. This is
easy to see. There are 16 relations in all. The only way a relation
can fail to be transitive is to contain both (1, 2) and (2, 1). There
are clearly four such relations. Of these four, the only one that is
transitive has (1, 1) and (2, 2) also. Similarly it's quite easy to
see that there are only 2 relations on a 1-element set, and both are
transitive.

There are 512 relations on a set with 3 elements. How many are
transitive?

It would be very easy to write a computer program to check them all
and count the transitive ones. That is not what I am after here. In
fact, it would also be easy to enumerate the transitive relations by
hand; 512 is not too many. That is not what I am after either. I am
trying to find some method or technique that scales reasonably well,
well enough that I could apply it for larger n.

No luck so far. Relations on 3-sets can fail to be transitive in all
sorts of interesting ways. Say that a relation has the
F_abc property if it contains (a,b) and (b,c) but not
(a,c). Such a relation is intransitive.

Now clearly there are 64 F_abc relations for
each distinct choice of a, b, and c. But some of
these properties overlap. For example, {(a,b),
(b,c), (c,a)} has not only the
F_abc property but also the
F_bca and F_cab properties.

Of the 64 relations with the F_abc property,
16 have the F_bca property also.
16 have the F_aba property.
None have the F_acb property. There are 12 of
these properties, and they overlap
in a really complicated way.
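Brute force is not the scalable method I am looking for, but it is a handy way to double-check the small counts above. This Python sketch enumerates every relation on an n-element set and tests the F property directly. The function names are made up for the occasion, and the elements are numbered from 0, so a, b, c become 0, 1, 2.

```python
from itertools import product


def relations(n):
    """Yield every relation on {0, ..., n-1} as a set of ordered pairs."""
    pairs = list(product(range(n), repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        yield {pair for pair, keep in zip(pairs, bits) if keep}


def has_F(rel, a, b, c):
    """True if rel contains (a,b) and (b,c) but not (a,c)."""
    return (a, b) in rel and (b, c) in rel and (a, c) not in rel


def is_transitive(rel, n):
    """A relation is transitive exactly when it has none of the F properties."""
    return not any(has_F(rel, a, b, c)
                   for a, b, c in product(range(n), repeat=3))


def count_transitive(n):
    return sum(1 for rel in relations(n) if is_transitive(rel, n))


if __name__ == "__main__":
    f_abc = [rel for rel in relations(3) if has_F(rel, 0, 1, 2)]
    print(len(f_abc))                                  # relations with F_abc
    print(sum(has_F(rel, 1, 2, 0) for rel in f_abc))   # ... that also have F_bca
    print(sum(has_F(rel, 0, 1, 0) for rel in f_abc))   # ... that also have F_aba
    print(sum(has_F(rel, 0, 2, 1) for rel in f_abc))   # ... that also have F_acb
    print([count_transitive(n) for n in (1, 2, 3)])
```

This checks all 2^(n²) relations, so it becomes hopeless almost immediately as n grows, which is the whole reason for wanting something cleverer.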

After a week I gave in and looked in the literature. I have a couple
of papers in my bag I haven't read yet. But it seems that there is no
simple solution, which is reassuring.

One problem is that the number of relations on n elements grows
very rapidly (it's 2^(n²)) and the number of
transitive relations is a good-sized fraction of these.

Moziz Addums
Last July at a porch sale I obtained a facsimile copy of
Housekeeping in Old Virginia, by M.C. Tyree, originally
published in 1879. I had been trying to understand the purpose of
ironing. Ironing makes the clothes look nice, but it must have also
served some important purpose, essential for life, that I don't now
understand. In the Laura Ingalls Wilder Little House
books, Laura recounts a common saying that scheduled the week's
work:

Wash on Monday
Iron on Tuesday
Mend on Wednesday
Churn on Thursday
Clean on Friday
Bake on Saturday
Rest on Sunday

You bake on Saturday so that you have fresh bread for Sunday dinner.
You wash on Monday because washing is backbreaking labor and you want
to do it right after your day of rest. You iron the following day
before the washed clothes are dirty again. But why iron at all? If
you don't wash the clothes or clean the house, you'll get sick and
die. If you don't bake, you won't have any bread, and you'll starve.
But ironing? In my mind it was categorized with dusting, as something
people with nice houses in the city might do, but not something that
Ma Ingalls, three miles from the nearest neighbor, would concern
herself with.

But no. Ironing, and starching with the water from boiled potatoes,
was so important that it got a whole day to itself, putting it on par
with essential activities like cleaning and baking. But why?

A few months later, I figured it out. In this era of tumble-drying
and permanent press, I had forgotten what happens to fabrics that are
air dried, and did not understand until I was on a trip and tried to
air-dry a cotton bath towel. Air-dried fabrics come out not merely
wrinkled but corrugated, like an accordion, or a washboard, and are
unusable. Ironing was truly a necessity.

Anyway, I was at this porch sale, and I hoped that this 1879
housekeeping book might provide the answer to the ironing riddle. It
turned out to be a cookbook. There is plenty to say about this
cookbook anyway. It comes recommended by many notable ladies,
including Mrs. R.B. Hayes. (Her husband was President of the United
States.) She is quoted on the flyleaf as being "very much pleased"
with the cookbook.

Some of the recipes are profoundly unhelpful. For example, p.106
has:

Boiled salmon. After the fish has been cleaned and washed, dry
it and sew it up in a cloth; lay in a fish-kettle, cover with warm
water, and simmer until done and tender.

Just how long do I simmer it? Oh, until it is "done" and "tender".
All right, I will just open up the fish kettle and poke it to
see. . . except that it is sewed up in a cloth. Hmmm.

You'd think that if I'm supposed to simmer this fish that has been
sewn up in a cloth, the author of the recipe might advise me on how
long until it is "done". "Until tender" is a bit of a puzzle too. In
my experience, fish become firmer and less tender the longer you
simmer them. Well, I have a theory about this. The recipe is
attributed to "Mrs. S.T.", and consulting the index of contributors, I
see that it is short for "Mrs. Samuel Tyree", presumably the editor's
mother-in-law. Having a little joke at her expense, perhaps?

There are a lot of other interesting points, which may appear here
later. For example, did you know that the most convenient size hog
for household use is one of 150 to 200 pounds? And the cookbook
contains recipes not only for tomato catsup, but also pepper catsup,
mushroom catsup, and walnut catsup.

But the real reason I brought all this up is that pages 253–254 have
the following item, attributed to "Moziz Addums":

Resipee for cukin kon-feel Pees.
Gether your pees 'bout sun-down. The folrin day, 'bout leven o'clock,
gowge out your pees with your thum nale, like gowgin out a man's
ey-ball at a kote house. Rense your pees, parbile them, then fry 'erm
with some several slices uv streekd middlin, incouragin uv the gravy
to seep out and intermarry with your pees. When modritly brown, but
not scorcht, empty intoo a dish. Mash 'em gently with a spune, mix
with raw tomarters sprinkled with a little brown shugar and the
immortal dish ar quite ready. Eat a hepe. Eat mo and mo. It is good
for your genral helth uv mind and body. It fattens you up, makes you
sassy, goes throo and throo your very soul. But why don't you eat?
Eat on. By Jings. Eat. Stop! Never, while thar is a pee in
the dish.

This was apparently inserted for humorous effect. Around the time the
cookbook was written, there was quite a vogue for dialectal humor of
this type, most of which has been justly forgotten. Probably the
best-remembered practitioner of this brand of humor was Josh Billings,
who I bet you haven't heard of anyway. Tremendously popular at the
time, almost as much so as Mark Twain, his work is little-read today;
the joke is no longer funny. The exceptionally racist example above
is in many ways typical of the genre.

One aspect of this that is puzzling to us today (other than the
obvious "why was this considered funny?") is that it's not clear
exactly what was supposed to be going on. Is the idea that Moziz
Addums wrote this down herself, or is this a transcript by a literate
person of a recipe dictated by Moziz Addums? Neither theory makes
sense. Where do the misspellings come from? In the former theory,
they are Moziz Addums' own misspellings. But then we must imagine
someone literate enough to spell "intermarry" and "immortal"
correctly, but who does not know how to spell "of".

In the other theory, the recipe is a transcript, and the misspellings
have been used by the anonymous, literate transcriber to indicate
Moziz Addums' unusual or dialectal pronunciations, as with
"tomarters", perhaps. But "uv" is the standard (indeed, the only)
pronunciation of "of", which wrecks this interpretation. (Spelling
"of" as "uv" was the signature of Petroleum V. Nasby, another one of
those forgotten dialectal humorists.) And why did the transcriber
misspell "peas" as "pees"?

So what we have here is something that nobody could possibly have
written or said, except as an inept parody of someone else's speech.
I like my parody to be rather less artificial.

All of this analysis would be spoilsportish if the joke were actually
funny. E.B. White famously said that "Analyzing humor is like
dissecting a frog. Few people are interested and the frog dies of it."
Here, at least, the frog had already been dead for a hundred years
before I got to it.

[ Addendum 20100810: In case you were wondering, "kon-feel pees" are
actually "cornfield peas", that is, peas that have been planted in
between the rows of corn in a cornfield. ]

Ambiguous words and dictionary hacks
A Mexican gentleman of my acquaintance, Marco Antonio Manzo, was
complaining to me (on IRC) that what makes English hard is the large
number of ambiguous words. For example, English has the word "free"
where Spanish distinguishes "gratis" (free like free beer) from
"libre" (free like free speech).

I said I was surprised that he thought that was unique to English, and
said that probably Spanish had just as many "ambiguous" words, but
that he just hadn't noticed them. I couldn't think of any Spanish
examples offhand, but I knew some German ones: in English, "suit" can
mean a lawsuit, a suit of clothes, or a suit of playing cards. German
has different words for all of these. In German, the suit of a
playing card is its "Farbe", its color. So German distinguishes
between suit of clothes and suit of playing cards, which English does
not, but fails to distinguish between colors of paint and suit of
playing cards, which English does.

Every language has these mismatches. Korean has two words for "thin",
one meaning thin like paper and the other meaning thin like string.
Korean distinguishes father's sister ("komo") from mother's sister
("imo") where English has only "aunt".

Anyway, Sr. Manzo then went to lunch, and I wanted to find some
examples of concepts distinguished by English but not by Spanish.
I did this with a dictionary hack.

A dictionary hack is when you take a plain text dictionary and do
some sort of rough-and-ready processing on it to get an 80% solution
to some problem.
The oldest dictionary hack I know of is the old Unix rhyming
dictionary hack:

rev /usr/dict/words | sort | rev > rhyming.txt

This takes the Unix word list and turns it into a semblance of a
rhyming dictionary. It's not an especially accurate semblance, but
you can't beat the price.

It figures out that "clash" rhymes with "lash" and "backlash", but not
that "myrrh" rhymes with "purr" or "her" or "sir". You can, of
course, do better by using a text file that has two columns, one for
orthography and one for pronunciation, and sorting it by reverse
pronunciation. But like I said, you won't beat the price.
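A minimal Perl sketch of that two-column variant. The sample entries and their rough one-letter-per-sound pronunciations are invented for illustration and are not taken from any real pronouncing dictionary; the sub name is likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data: spelling, then a made-up pronunciation.
my @entries = (
    [ "myrrh", "mur"  ],
    [ "clash", "klaS" ],
    [ "purr",  "pur"  ],
    [ "lash",  "laS"  ],
    [ "sir",   "sur"  ],
    [ "her",   "hur"  ],
);

# Sort by reversed pronunciation, so that words ending in the same
# sounds land next to each other, the way rev | sort | rev does
# for spellings.
sub rhyme_order {
    my @list = @_;
    return map  { $_->[0] }
           sort { scalar reverse($a->[1]) cmp scalar reverse($b->[1]) }
           @list;
}

print join(" ", rhyme_order(@entries)), "\n";
```

With this toy data, "myrrh" ends up in the same run as "her", "purr", and "sir", which the spelling-only version cannot manage.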

But I digress. Last week I pulled an excellent dictionary hack. I
found the Internet Dictionary Project's English-Spanish lexicon file on
the web with a quick Google search; it looks like this:

The sort sorts the lexicon into Spanish order instead of English
order. The Perl thing comes out looking a lot more complicated than
it ought. It just says to look and print consecutive items that have
the same Spanish, but whose English begins with different letters.
The condition on the English is to filter out items where the Spanish
is the same and the English is almost the same, such as:

blond          blonde           rubio
cake           cakes            tarta
oceanographic  oceanographical  oceanografico[Adjective]
palaces        palazzi          palacios[Noun]
talc           talcum           talco
taxi           taxicab          taxi

It does filter out possible items of interest, such as:

carefree       careless         sin cuidado

But since the goal is just to produce some examples, and this cheap
hack was never going to generate an exhaustive list anyway, that is
all right.
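A rough sketch of a filter along these lines, written as a standalone Perl program rather than a one-liner. It assumes the lexicon is available as tab-separated "English<TAB>Spanish" lines; that layout, the sample entries, and the sub name are assumptions for illustration, not the Internet Dictionary Project's actual file format.

```perl
use strict;
use warnings;

# Report consecutive entries (in Spanish order) that share a Spanish
# translation but whose English words begin with different letters.
sub shared_translations {
    my @lines = @_;
    my @pairs = sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] }
                map  { [ split /\t/, $_, 2 ] } @lines;
    my @found;
    for my $i (1 .. $#pairs) {
        my ($prev_en, $prev_es) = @{ $pairs[$i - 1] };
        my ($en, $es)           = @{ $pairs[$i] };
        next unless $es eq $prev_es;
        # Skip near-duplicates such as talc/talcum.
        next if lc(substr($en, 0, 1)) eq lc(substr($prev_en, 0, 1));
        push @found, "$prev_en $en $es";
    }
    return @found;
}

# Hypothetical sample lexicon entries.
my @sample = (
    "clock\tel reloj",
    "talc\ttalco",
    "watch\tel reloj",
    "talcum\ttalco",
    "ceiling\tel techo",
    "roof\tel techo",
);
print "$_\n" for shared_translations(@sample);
```

The first-letter test is as crude here as in the description above: it drops talc/talcum, and it would drop carefree/careless too.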

A lot of these are useless, genuine synonyms. It would be silly to
suggest that Spanish fails to preserve the English distinction between
"marry" and "wed", between "ale" and "beer", between "desire" and
"yearn", or between "vest" and "waistcoat". But some good
possibilities remain.

Of these, some probably fail for reasons that only a Spanish-speaker
would be able to supply. For instance, is "el pastel" really the best
translation of both "cake" and "pie"? If so, it is an example of the
type I want. But perhaps it's just a poor translation; perhaps
Spanish does have this distinction; say maybe "torta" for "cake" and
"empanada" for "pie". (That's what Google suggests, anyway.)

Another kind of failure arises because of idioms. The output:

exactly o'clock en punto

is of this type. It's not that Spanish fails to distinguish between
the concepts of "exactly" and "o'clock"; it's that "en punto" (which
means "on the point of") is used idiomatically to mean both of those
things: some phrase like "en punto tres" ("on the point of three")
means "exactly three" and so, by analogy, "three o'clock". I don't
know just what the correct Spanish phrases are, but I can guess that
they'll be something like this.

Still, some of the outputs are suggestive:

high     tall      alto
low      small     bajo[Adjective]
babble   fumble    balbucear[Verb]
jealous  zealous   celoso
contest  debate    debate[Noun]
forlorn  stranded  desamparado[Adjective]
docile   meek      do/cil[Adjective]
picture  square    el cuadro
fourth   room      el cuarto
collar   neck      el cuello
idiom    language  el idioma[Noun]
clock    watch     el reloj
floor    ground    el suelo
ceiling  roof      el techo
knife    razor     la navaja
feather  pen       la pluma
cloudy   foggy     nublado

I put some of these to Sr. Manzo, and he agreed that some were indeed
ambiguous in Spanish. I wouldn't have known what to suggest without
the dictionary hack.

Bryan and his posse
Today upon the arrival of a coworker and his associates, I said "Oh,
here comes Bryan and his posse". My use of "posse" here drew some
comment. I realized I was not completely sure what "posse" meant. I
mostly knew it from old West contexts: the Big Dictionary has quotes
like this one, from 1901:

A pitched battle was fought..at Rockhill, Missouri, between the
Sheriff's posse and the miners on strike.

I first ran across the word in J.D. Fitzgerald's Great
Brain books. At least in old West contexts, the word refers to a
gang of men assembled by some authority such as a sheriff or a marshal,
to perform some task, such as searching for a lost person, apprehending
an outlaw, or blasting some striking miners. This much was clear
to me before.

From the context and orthography, I guessed that it was from Spanish.
But no, it's not. It's Latin! "Posse" is the Latin verb "to be
able", akin to English "possible" and ultimately to "potent" and
related words. I'd guessed something like this, supposing English
"posse" was akin to some Spanish derivative of the Latin. But it
isn't; it's direct from Latin: "posse" in English is short for
posse comitatus, "force of the county".

The Big Dictionary has citations for "posse comitatus" back to 1576:

Mr. Sheryve meaneth in person to repayre thither & with force to bryng
hym from Aylesham, Whomsoever he fyndeth to denye the samet & suerly
will with Posse Comitatus fetch hym from this new erected pryson to
morrow.

"Sheryve" is "Sheriff". (If you have trouble understanding this, try
reading it aloud. English spelling has changed more than its pronunciation
since 1576.)

I had heard the phrase before in connection with the Posse
Comitatus Act of U.S. law. This law, passed in 1878, is intended to
prohibit the use of the U.S. armed forces as Posse
Comitatus—that is, as civilian law enforcement. Here the use is
obviously Latin, and I hadn't connected it before with the sheriff's
posse. But they are one and the same.

Woodrow Wilson on bloggers
Last weekend my family and I drove up to New York. On the way we
stopped in the Woodrow Wilson Service Area on the New Jersey
Turnpike, which has a little plaque on the wall
commemorating Woodrow Wilson and providing some quotations, such as:

Your age as a fraction, again
In a recent article, I
discussed methods for calculating your age as a fractional year, in
the style of (a sophisticated) three-and-a-half-year-old. For
example, as of today, Richard M. Stallman is (a sophisticated)
54-and-four-thirty-thirds-year-old; tomorrow he'll be a
54-and-one-eighth-year-old.

I discussed several methods of finding the answer, including a clever
but difficult method that involved fiddling with continued fractions,
and some dead-simple brute force methods that take nominally longer
but are much easier to do.

But a few days ago on IRC, a gentleman named Mauro Persano said he
thought I could use the Stern-Brocot tree to solve the problem, and he
was absolutely right. Application of a bit of clever theory sweeps
away all the difficulties of the continued-fraction approach, leaving
behind a solution that is clever and simple and fast.

Here's the essence of it: We consider a list of intervals that covers
all the positive rational numbers; initially, the list contains only
the interval (0/1, 1/0). At each stage we divide each interval in the
list in two,
by chopping it at the simplest fraction it contains.

To chop the interval (a/b, c/d), we split
it into the two intervals (a/b, (a+c)/(b+d)) and
((a+c)/(b+d), c/d). The
fraction (a+c)/(b+d) is called the
mediant of a/b and c/d. It's not
obvious that the mediant is always the simplest possible fraction in
the interval, but it is true.

So we start with the interval (0/1, 1/0), and in the first step we
split it at (0+1)/(1+0) = 1/1. It is now two intervals, (0/1, 1/1) and
(1/1, 1/0). At the next step, we split these two intervals at 1/2 and
2/1, respectively; the resulting four intervals are
(0/1, 1/2),
(1/2, 1/1),
(1/1, 2/1), and
(2/1, 1/0). We split these at 1/3, 2/3, 3/2, and 3/1. The process
goes on from there:

0/1 1/0
0/1 1/1 1/0
0/1 1/2 1/1 2/1 1/0
0/1 1/3 1/2 2/3 1/1 3/2 2/1 3/1 1/0
0/1 1/4 1/3 2/5 1/2 3/5 2/3 3/4 1/1 4/3 3/2 5/3 2/1 5/2 3/1 4/1 1/0

Or, omitting the repeated items at each step:

0/1 1/0
1/1
1/2 2/1
1/3 2/3 3/2 3/1
1/4 2/5 3/5 3/4 4/3 5/3 5/2 4/1

If we disregard the two corners, 0/1 and 1/0, we can see from this
diagram that the fractions naturally organize themselves into a tree.
If a fraction is introduced at step N, then the interval it
splits has exactly one endpoint that was introduced at step
N-1, and this is its parent in the tree; conversely, a fraction
introduced at step N is the parent of the two step-N+1
fractions that are introduced to split the two intervals of which it
is an endpoint.
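The splitting step is easy to sketch in Perl. This is only an illustration under my own conventions: fractions are stored as [numerator, denominator] pairs, and the sub name is made up.

```perl
use strict;
use warnings;

# One splitting step: insert the mediant between each pair of
# neighboring fractions. Fractions are [numerator, denominator] pairs.
sub split_intervals {
    my @level = @_;
    my @next  = ($level[0]);
    for my $i (1 .. $#level) {
        my ($n1, $d1) = @{ $level[$i - 1] };
        my ($n2, $d2) = @{ $level[$i] };
        push @next, [ $n1 + $n2, $d1 + $d2 ], $level[$i];
    }
    return @next;
}

# Start from the single interval (0/1, 1/0) and print four levels.
my @level = ([0, 1], [1, 0]);
for my $step (1 .. 4) {
    print join(" ", map { "$_->[0]/$_->[1]" } @level), "\n";
    @level = split_intervals(@level);
}
```

The four lines it prints should match the first four rows of the diagram above.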

This process has many important and interesting properties. The
splitting process eventually lists every positive rational number
exactly once, as a fraction in lowest terms. Every fraction is
simpler than all of its descendants in the tree. And, perhaps most
important, each time an interval is split, it is divided at the
simplest fraction that the interval contains. ("Simplest" just
means "has the smallest denominator".)

This means that we can find the simplest fraction in some interval
simply by doing binary tree search until we find a fraction in that
interval.

For example, Placido Polanco had a .368 batting average
last season. What is the smallest number of at-bats he could have
had? We are asking here for the denominator of the
simplest fraction that lies in the interval [.3675, .3685).

We start at the root, which is 1/1. 1 is too big,
so we move left down the tree to 1/2.

1/2 = .5000 and is also too big, so we move left down the tree to
1/3.

1/3 = .3333 and is too small, so we move right down the tree to 2/5.

2/5 = .4000 and is too big, so go left to 3/8, which is the
mediant of 1/3 and 2/5.

3/8 = .3750 and is too big, so go left to 4/11, the mediant of 1/3 and 3/8.

4/11 = .3636 and is too small, so go right to 7/19, the mediant of 3/8 and 4/11.

7/19 = .3684, which is in the interval, so we are done.

If we knew nothing else about Polanco's batting record, we could still
conclude that he must have had at least 19 at-bats. (In fact, he had
35 hits in 95 at-bats.)

Calculation of mediants is incredibly simple, even easier than adding
fractions. Tree search is simple, just compare and then go left or
right. Calculating whether a fraction is in an interval is simple
too. Everything is simple simple simple.

Our program wants to find the simplest fraction in some interval, say
(L, R). To do this, it keeps track of
l and r, initially 0/1 and 1/0, and repeatedly
calculates the mediant m of l and r. If the mediant is in
the target interval, the function is done. If the mediant
is too small, set l = m and continue; if it is too large
set r = m and continue:
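A minimal Perl sketch of that loop. The sub name is mine, and so is the convention of passing each endpoint as an integer numerator and denominator, which keeps every comparison in exact integer arithmetic. The target interval is treated as half-open, [L, R), and is assumed to be non-empty and non-negative.

```perl
use strict;
use warnings;

# Return (p, q) for the simplest fraction p/q with
# lo_n/lo_d <= p/q < hi_n/hi_d, found by Stern-Brocot tree search.
sub simplest_fraction {
    my ($lo_n, $lo_d, $hi_n, $hi_d) = @_;
    my ($ln, $ld, $rn, $rd) = (0, 1, 1, 0);        # l = 0/1, r = 1/0
    while (1) {
        my ($mn, $md) = ($ln + $rn, $ld + $rd);    # mediant of l and r
        if ($mn * $lo_d < $lo_n * $md) {           # m < L: too small
            ($ln, $ld) = ($mn, $md);
        }
        elsif ($mn * $hi_d >= $hi_n * $md) {       # m >= R: too large
            ($rn, $rd) = ($mn, $md);
        }
        else {
            return ($mn, $md);                     # m is in [L, R)
        }
    }
}

# Simplest fraction in each interval [i/10, (i+1)/10).
for my $i (0 .. 9) {
    my ($p, $q) = simplest_fraction($i, 10, $i + 1, 10);
    printf "[%d/10, %d/10): %d/%d\n", $i, $i + 1, $p, $q;
}

# The batting-average example: the interval [.3675, .3685).
my ($hits, $at_bats) = simplest_fraction(3675, 10000, 3685, 10000);
print "$hits/$at_bats\n";    # 7/19
```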

This says that the simplest fraction in the range [0/10, 1/10) is
1/11; the simplest fraction in the range [3/10, 4/10) is 1/3, and so
forth. The simplest fractions that do not appear are 1/5, which is
beaten out by the simpler 1/4 in the [2/10, 3/10) range, and 3/5,
which is beaten out by 2/3 in the [6/10, 7/10) range.

Unlike the programs from the previous article, this program is really
fast, even in principle, even for very large arguments. The code is
brief and simple. But we had to deploy some rather sophisticated
number theory to get it. It's a nice reminder that the sawed-off
shotgun doesn't always win.

Excessive precision
You sometimes read news articles that say that some object is 98.42
feet tall, and it is clear what happened was that the object was
originally reported to be 30 meters tall, and some knucklehead
translated 30 meters to 98.42 feet, instead of to 100 feet as they
should have.

Finding a real example for you was easy: I just did a Google search for
"62.14 miles", and got this little jewel:

Tsunami waves can be up to 62.14 miles long! They can also be about
three feet high in the middle of the ocean. Because of its strong
underwater energetic force, the tsunami can rise up to 90 feet, in
extreme cases, when they hit the shore! Tsunami waves act like
shallow water waves because they are so long. Because it is so long,
it can last an hour. In the Pacific Ocean, a tsunami moves 60.96 feet
a second, passing through water that is around 1219.2 feet deep.

The 60.96 feet per second is actually 100 km/hr, but I'm not sure
what's going on with the 1219.2 feet deep. Is it 1/5 nautical mile?
But that would be strange. [ Addendum 20070428: the explanation.]

Here's another delightful example:

The MiniC.A.T. is very cost-efficient to operate. According to MDI,
it costs less than one dollar per 62.14 miles... Given the absence of
combustion and the fact that the MiniC.A.T. runs on vegetable oil, oil
changes are only necessary every 31,068 miles.

(I should add that many of the hits for "62.14 miles" were perfectly
legitimate. Many concerned 100-km bicycle races, or the conditions
for winning the X-prize. In both cases the distance is in fact 62.14
miles, not 62.13 or 62.15, and the precision is warranted. But I
digress.)

(Long ago there was a parody of the New York Times which
included a parody sports section that announced "FOOTBALL TO GO
METRIC". The article revealed that after the change, the end zones
would be placed 91.44 meters apart...)

Anyway, similar knuckleheadedness occurs in the well-known value of
98.6 degrees Fahrenheit for normal human body temperature. Human body
temperature varies from individual to individual, and by a couple of
degrees over the course of the day, so citing the "normal" temperature
to a tenth of a degree is ridiculous. The same thing happened here as
with the 62.14-mile tsunami. Normal human body temperature was
determined to be around 37 degrees Celsius, and then some knucklehead
translated 37°C to 98.6°F instead of to 98°F.

When our daughter was on the way, Lorrie and I took a bunch of classes
on baby care. Several of these emphasized that the maximum safe
spacing for the bars of a crib, rails of a banister, etc., was two
and three-eighths inches. I was skeptical, and at one of these classes
I was foolish enough to ask if that precision were really required:
was two and one-half inches significantly less safe? How about two
and seven-sixteenths inches? The answer was immediate and
unequivocal: two and one-half inches was too far apart for safety; two
and three-eighths inches is the maximum safe distance.

But two and three-eighths inches is 6.0325 cm, so draw your own
conclusion about what happened here.
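The round-number hypothesis is easy to check. Here is a small Perl sketch using the exact definitions 1 foot = 0.3048 m, 1 mile = 1.609344 km, and 1 inch = 2.54 cm; the sub names are made up for the occasion.

```perl
use strict;
use warnings;

# Exact definitions: 1 ft = 0.3048 m, 1 mile = 1.609344 km, 1 in = 2.54 cm.
sub meters_to_feet { return $_[0] / 0.3048 }
sub km_to_miles    { return $_[0] / 1.609344 }
sub c_to_f         { return $_[0] * 9 / 5 + 32 }
sub inches_to_cm   { return $_[0] * 2.54 }

printf "30 m     = %.4f feet\n",  meters_to_feet(30);
printf "100 km   = %.4f miles\n", km_to_miles(100);
printf "37 C     = %.1f F\n",     c_to_f(37);
printf "2 3/8 in = %.4f cm\n",    inches_to_cm(2 + 3/8);
```

In each case a suspiciously precise figure on one side corresponds to a round figure on the other.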

[ Addendum 20070430: 60.96 feet per second is nothing like 100 km/hr,
and I have no idea why I said it was. The 60.96 feet per second
appears to be a backwards conversion of 200 m/s to ft/s, multiplying
by 0.3048 instead of dividing. As Scott Turner noted a few
days ago, a similar error occurs in the conversion of meters to
feet in the "1219.2 feet deep" clause. ]

Degrees of algebraic numbers
An algebraic number x is said to have degree n if it is
a zero of some irreducible nth-degree polynomial P with
integer coefficients.

For example, all rational numbers have degree 1, since the rational
number a/b is a zero of the first-degree polynomial
bx - a. √2 has degree 2, since it is a zero of
!!x^2 - 2!!, but (as the Greeks showed) not of
any first-degree polynomial.

It's often pretty easy to guess what degree some number has, just by
looking at it. For example, the nth root of a prime number p
has degree n.
!!\sqrt{1 + \sqrt 2}!!
has a square root of a square root, so it's a fourth-degree number.
If you write
!!x = \sqrt{1 + \sqrt 2}!!
then eliminate the square roots, you get
!!x^4 - 2x^2 - 1!!, which is the
4th-degree polynomial satisfied by this 4th-degree number.

But it's not always quite so simple. One day when I was in high
school, I bumped into the fact that !!\sqrt{7 + 4 \sqrt 3}!!, which looks
just like a 4th-degree number, is actually a 2nd-degree number.
It's numerically equal to !!2 + \sqrt 3!!. At the time,
I was totally boggled. I couldn't believe it at first, and I had to
get out my calculator and calculate both values numerically to be sure
I wasn't hallucinating. I was so sure that the nested square
roots in !!\sqrt{7 + 4 \sqrt 3}!! would force it to be
4th-degree.

If you eliminate the square roots, as in the other example, you get
the 4th-degree polynomial
!!x^4 - 14x^2 + 1!!, which is satisfied
by !!\sqrt{7 + 4 \sqrt 3}!!. But unlike the previous
4th-degree polynomial, this one is reducible. It factors into
!!(x^2 + 4x + 1)(x^2 - 4x + 1)!!. Since
!!\sqrt{7 + 4 \sqrt 3}!! is a zero of the polynomial, it
must be a zero of one of the two factors, and so it is
second-degree. (It is a zero of the second factor.)

I don't know exactly why I was so stunned to discover this. Clearly,
the square of any number of the form !!a + b\sqrt c!! is another
number of the same form (namely
!!(a^2 + b^2c) + 2ab\sqrt c!!), so it must be the case that lots of
!!a + b\sqrt c!! numbers are squares of other
such, and so that lots of !!\sqrt{a + b \sqrt c}!!
numbers must be second-degree. I must have known this, or at least
been capable of knowing it. Socrates says that the truth is within us,
and we just don't know it yet; in this case that was certainly true.
I think I was so attached to the idea that the nested square roots
signified fourth-degreeness that I couldn't stop to realize that they
don't always.

In the years since, I came to realize that recognizing the degree of
an algebraic number could be quite difficult. One method, of course, is
the one I used above: eliminate the radical signs, and you have a
polynomial; then factor the polynomial and find the irreducible factor
of which the original number is a root. But in practice this can be
very tricky, even before you get to the "factor the polynomial" stage.
For example, let !!x = 2^{1/2} + 2^{1/3}!!. Now let's try to eliminate
the radicals.

Proceeding as before, we do
!!x - 2^{1/3} = 2^{1/2}!! and then square both sides, getting
!!x^2 - 2\cdot 2^{1/3}x + 2^{2/3} = 2!!, and then it's
not clear what to do next.

So we try the other way, starting with
!!x - 2^{1/2} = 2^{1/3}!! and then cube both sides, getting
!!x^3 - 3\cdot 2^{1/2}x^2 + 6x - 2\cdot 2^{1/2} = 2!!.
Then we move all the !!2^{1/2}!! terms to the other
side: !!x^3 + 6x - 2 = (3x^2 + 2)\cdot 2^{1/2}!!.
Now squaring both sides
eliminates the last radical, giving us
!!x^6 + 12x^4 - 4x^3 + 36x^2 - 24x + 4 = 18x^4 + 24x^2 + 8!!.
Collecting the terms, we see that
!!2^{1/2} + 2^{1/3}!! is a root of
!!x^6 - 6x^4 - 4x^3 + 12x^2 - 24x - 4!!.
Now we need to make sure that this polynomial is
irreducible. Ouch.

In the course of writing this article, though, I found a much better
method. I'll work a simpler example first, √2 + √3. The
radical-eliminating method would have us put x - √2 = √3, then
!!x^2 - 2\sqrt 2 x + 2 = 3!!, then
!!x^2 - 1 = 2\sqrt 2 x!!, then
!!x^4 - 2x^2 + 1 = 8x^2!!, so
√2 + √3 is a root of
!!x^4 - 10x^2 + 1!!.

All of the powers of x have the form
!!a_6\sqrt 6 + a_2\sqrt 2 + a_3\sqrt 3 + a_1!!.
This is easy to see if you write p for
√2 and q for √3. Then x = p + q and
powers of x are polynomials in p and q. But any
time you have !!p^2!! you replace it with 2, and any time
you have !!q^2!! you replace it with 3, so your
polynomials never have any terms in them other than 1, p,
q, and pq.
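To make this concrete, here is a short Perl sketch that carries out the replacement rule mechanically. The representation (a list of four coefficients for 1, √2, √3, and √6, in that order) and the sub name are my own choices for illustration.

```perl
use strict;
use warnings;

# A number a1 + a2*sqrt(2) + a3*sqrt(3) + a6*sqrt(6) is stored as
# the coefficient list [a1, a2, a3, a6].
sub multiply {
    my ($u, $v) = @_;
    my ($a1, $a2, $a3, $a6) = @$u;
    my ($b1, $b2, $b3, $b6) = @$v;
    return [
        $a1*$b1 + 2*$a2*$b2 + 3*$a3*$b3 + 6*$a6*$b6,   # rational part
        $a1*$b2 + $a2*$b1 + 3*($a3*$b6 + $a6*$b3),     # sqrt(2) part
        $a1*$b3 + $a3*$b1 + 2*($a2*$b6 + $a6*$b2),     # sqrt(3) part
        $a1*$b6 + $a6*$b1 + $a2*$b3 + $a3*$b2,         # sqrt(6) part
    ];
}

my $x     = [0, 1, 1, 0];    # sqrt(2) + sqrt(3)
my $power = [1, 0, 0, 0];    # x^0 = 1
for my $k (0 .. 4) {
    print "x^$k = (@$power)\n";
    $power = multiply($power, $x);
}
```

For example, the third line of output says that the square of √2 + √3 is 5 + 2√6, and no power ever needs more than the four coefficients.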

This means that you can think of the powers of x as being
vectors in a 4-dimensional vector space whose canonical basis is {1,
√2, √3, √6}. Any four vectors in this space, such as {1, x,
!!x^2!!, !!x^3!!}, are either linearly
independent, and so can be combined to total up to any other vector,
such as !!x^4!!, or else they are linearly
dependent and three of them can be combined to make the
fourth. In the former case, we have found a fourth-degree polynomial
of which x is a root, and proved that there is no
simpler such polynomial; in the latter case, we've found a simpler
polynomial of which x is a root.

To complete the example above, it is evident that
{1, x, !!x^2!!, !!x^3!!} are linearly independent,
but if you don't believe it you can use any of the usual mechanical
tests. This proves that √2 + √3 has degree 4, and not less. Because
if √2 + √3 were of degree 2 (say) then we would be able to find
a, b, c such that !!ax^2 + bx + c = 0!!, and then the !!x^2!!, !!x^1!!,
and !!x^0!! vectors would be dependent. But they aren't,
so we can't, so it isn't.

Instead, there must be a, b, c, and d
such that !!x^4 = ax^3 + bx^2 + cx + d!!.
To find these we need merely solve a system of four
simultaneous equations, one for each column in the table:
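Working the powers of x = √2 + √3 out by hand (this is my own arithmetic, laid out with one column for the rational part and one each for the coefficients of √2, √3, and √6), the table comes out like this:

$$
\begin{array}{l|rrrr}
    & 1  & \sqrt 2 & \sqrt 3 & \sqrt 6 \\ \hline
x^0 & 1  & 0       & 0       & 0  \\
x^1 & 0  & 1       & 1       & 0  \\
x^2 & 5  & 0       & 0       & 2  \\
x^3 & 0  & 11      & 9       & 0  \\
x^4 & 49 & 0       & 0       & 20
\end{array}
$$

Reading off one equation per column gives

$$
\begin{aligned}
5b + d  &= 49 && \text{(rational part)} \\
11a + c &= 0  && (\sqrt 2) \\
9a + c  &= 0  && (\sqrt 3) \\
2b      &= 20 && (\sqrt 6)
\end{aligned}
$$

so a = 0, c = 0, b = 10, and d = -1. That is, !!x^4 = 10x^2 - 1!!, which agrees with the polynomial !!x^4 - 10x^2 + 1!! found earlier by eliminating radicals.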

I think [!!2^{1/2} + 2^{1/3}!!] turns out to be degree 6, but if you try to work
it out in the straightforward way, by equating it to x and then
trying to get rid of the roots, you get a big mess. I think it
turns out that if two numbers have degrees a and b, then
their sum has degree at most ab, but I wouldn't even want to
swear to that without thinking it over real carefully.

Happily, I'm now sure about all of this. I can work through the
mechanical method on it. Putting !!x = 2^{1/2} + 2^{1/3}!!, we get:

Happy birthday Leonhard Euler
Leonhard Euler, one of the greatest and most prolific mathematicians
ever to walk the earth, was born 300 years ago today in Basel,
Switzerland.

Euler named the constant e (not for himself; he used vowels for
constants and had already used a for something else), and
discovered the astonishing formula !!e^{ix} = \cos x + i \sin x!!,
which is known as Euler's
formula. A special case of this formula is the Euler
identity: !!e^{i\pi} + 1 = 0!!.

I never really understood what was going on there until last year,
when I read the utterly brilliant book Visual Complex
Analysis, by Tristan Needham. This was certainly the best math
book I read in all of 2006, and probably the best one I've read in the
past five years. (Many thanks to Dan Schmidt for recommending it.)

The brief explanation is something like this: the exponential
function !!e^{ct}!!
is exactly the function that satisfies the differential
equation !!df/dt = cf(t)!!. That is,
it is the function that describes the motion of a particle whose
velocity is proportional to its position at all times.

Imagine a particle moving on the real line. If its velocity is
proportional to its position, it will speed away from the origin at an
exponentially increasing rate. Or, if the proportionality constant is
negative, it will rapidly approach the origin, getting closer (but
never quite reaching it) at an exponentially decreasing rate.

Now, suppose we consider a particle moving on the complex plane
instead of on the real line, again with velocity proportional to
position. If the proportionality constant is real, the particle will
speed away from the origin (or towards it, if the constant is
negative), as before. But what if the proportionality constant is
imaginary?

A proportionality constant of i means that the velocity of the
particle is at right angles to the position, because multiplication by
i in the complex plane corresponds to a counterclockwise
rotation by 90°, as always. In this case, the path of the
particle is a circle, and so its position as a function of t is
described by something like cos t + i sin t. But
this function must satisfy the differential equation also, with
c = i, and we have Euler's formula.

Another famous and important formula named after Euler is also called
Euler's formula, and states that for any simply-connected
polyhedron with F faces, E edges, and V vertices,
F - E + V = 2.
For example, the cube has 6 faces, 12 edges, and 8 vertices, and
indeed 6 - 12 + 8 = 2.
The formula also holds for all
planar graphs and is the fundamental result of planar graph theory.

Spheres in this case behave like planes, and graphs that cover spheres
also satisfy F - E + V = 2. One then wonders
whether the theorem holds for more complex surfaces, such as tori;
this is equivalent to asking about polyhedra that have a single hole.
In this case, the theorem is a little different, and the identity becomes F - E + V =
0.

It turns out that every surface S has a value χ(S),
called the Euler characteristic, such that graphs on the
surface all satisfy F - E + V =
χ(S).

Euler also discovered that the sum of the first n terms of the
harmonic series,
1 + 1/2 + 1/3 + ... + 1/n, is approximately log n. We
might like to say that it becomes arbitrarily close to log n,
as so many things do, but it does not. It is always a bit larger than
log n, and you cannot make it as close as you want. The more
terms you take, the closer the sum gets to log n + γ,
where γ is approximately 0.577216. This γ is
Euler's constant:

!!\gamma = \lim_{n \to \infty} \left( 1 + \frac12 + \frac13 + \cdots + \frac1n - \log n \right)!!

This is one of those numbers that shows up all over the place, and is
easy to calculate, but is
a big fat mystery. Is it rational? Everyone would be shocked if it
were, but nobody knows for sure.
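As for "easy to calculate", here is a small Perl sketch that takes the definition literally. It is the naive approach, not a serious way to compute γ to many digits, and the sub name is made up.

```perl
use strict;
use warnings;

# Difference between the sum of the first n terms of the harmonic
# series and log n (natural logarithm).
sub harmonic_minus_log {
    my ($n) = @_;
    my $sum = 0;
    $sum += 1 / $_ for 1 .. $n;
    return $sum - log($n);
}

printf "%8d  %.6f\n", $_, harmonic_minus_log($_) for 10, 1_000, 100_000;
```

The printed differences shrink toward 0.577216 from above as n grows, but only slowly; the excess is roughly 1/(2n).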

The Euler totient function φ(x) counts the number of
integers less than x that have no divisors in common with
x. It is of tremendous importance in combinatorics and number
theory. One of the most fundamental and astonishing facts about the
totient function is Euler's theorem:
!!a^{\varphi(n)} - 1!!
is a multiple of n whenever a and n have no divisors in
common. For example, since φ(9) = 6, !!a^6 - 1!! is a
multiple of
9, except when a is divisible by 3:

!!1^6 - 1 = 0 \cdot 9!!.

!!2^6 - 1 = 7 \cdot 9!!.

!!4^6 - 1 = 455 \cdot 9!!.

!!5^6 - 1 = 1736 \cdot 9!!.

!!7^6 - 1 = 13072 \cdot 9!!.
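These are easy to check by brute force. A minimal Perl sketch, with made-up sub names and a deliberately naive totient that simply counts:

```perl
use strict;
use warnings;

# Greatest common divisor, by Euclid's algorithm.
sub gcd {
    my ($m, $n) = @_;
    ($m, $n) = ($n, $m % $n) while $n;
    return $m;
}

# Count the integers in 1..n that share no divisor (other than 1) with n.
sub totient {
    my ($n) = @_;
    return scalar grep { gcd($_, $n) == 1 } 1 .. $n;
}

my $n   = 9;
my $phi = totient($n);
for my $base (grep { gcd($_, $n) == 1 } 1 .. $n - 1) {
    my $value = $base ** $phi - 1;
    printf "%d^%d - 1 = %d = %d * %d\n",
        $base, $phi, $value, $value / $n, $n;
}
```

This prints one line for each base below 9 that is not divisible by 3, including 8, which the list above omits.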

Euler's solution in 1736 of the "bridges of Königsberg" problem is
often said to have begun the study of topology. It is also the source
of the term "Eulerian path".

Shell-less piping in Perl

Unfortunately, there is no easy way to avoid the shell when running a
command that is attached to the parent process via a pipe. Perl
provides open "| command arg arg arg...", which is what I
used, and which is analogous to [system STRING], involving
the shell. But it provides nothing analogous to [system
ARGLIST], which avoids the shell. If it did, then I probably
would have used it, writing something like this:

open M, "|", $MAILER, "-fnobody\@plover.com", $addre;

and the whole problem would have been avoided.

Several people wrote to point out that, as of Perl 5.8.0, Perl
does provide this, with a syntax almost identical to what I
proposed:

open M, "|-", $MAILER, "-fnobody\@plover.com", $addre;

Why didn't I use this? The program was written in late 2002, and Perl
5.8.0 was released in July 2002, so I expect it's just that I wasn't
familiar with the new feature yet. Why didn't I mention it in the
original article? As I said, I just got back from Asia, and I am
still terribly jetlagged.

(Jet lag when travelling with a toddler is worse than normal jet lag,
because nobody else can get over the jet lag until the toddler
does.)

Jeff Weisberg also pointed out that even prior to 5.8.0, you can
write:

open(F, "|-") || exec("program", "arg", "arg", "arg");

Why didn't I use this construction? I have run out of excuses.
Perhaps I was jetlagged in 2002 also.
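Here is a self-contained sketch of both shell-less constructions, assuming a Unix-like system. To keep it runnable anywhere Perl is installed, the child process is Perl itself (via $^X) standing in for the mailer; the hostile "address" and the child's one-line program are invented for the demonstration. The child exits successfully only if it received exactly one command-line argument.

```perl
use strict;
use warnings;

# A hostile "address" with an embedded header line and shell metacharacters.
my $hostile = "someone\@example.com\r\nbcc: victim\@example.com; echo oops";

# Stand-in for the mailer: report the argument count and the number of
# input lines, and exit 0 only if there was exactly one argument.
my @child = (
    $^X, "-e",
    'my @in = <STDIN>;'
  . 'print scalar(@ARGV), " argument(s), ", scalar(@in), " line(s) of input\n";'
  . 'exit(@ARGV == 1 ? 0 : 1);',
);

# 1. List-form piped open (Perl 5.8.0 and later): no shell is involved,
#    and each list element reaches the child as one argument.
open(my $pipe, "|-", @child, $hostile) or die "cannot start child: $!";
print {$pipe} "message body\n";
close($pipe) or die "child failed: status $?";

# 2. The older idiom: a forking open, then exec with a list in the child.
my $pid = open(my $old_pipe, "|-");
die "cannot fork: $!" unless defined $pid;
if ($pid == 0) {
    # In the child, STDIN is the read end of the pipe.
    exec(@child, $hostile) or die "cannot exec: $!";
}
print {$old_pipe} "message body\n";
close($old_pipe) or die "child failed: status $?";
```

Unlike the terse `open(F, "|-") || exec(...)` form, the second version checks for a failed fork explicitly, so that the parent does not fall through to the exec by mistake.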

RFC 822

John Berthels wrote to point out that my proposed fix, which rejects
all inputs containing spaces, also rejects some RFC822-valid
addresses. Someone whose address was actually something like
"Mark Dominus"@example.com would be unable to use the web form to
subscribe to the mailing list.

Quite so. Such addresses are extremely rare, and people who use them
are expected to figure out how to subscribe by email, rather than
using the web form.
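A sketch of the kind of check under discussion. The helper name is hypothetical and this is not necessarily the exact test the form uses; it rejects any address containing whitespace or control characters, which stops embedded header lines but also turns away the quoted address above.

```perl
use strict;
use warnings;

# Hypothetical helper: accept an address only if it is non-empty and
# contains no whitespace or control characters (CR, LF, NUL included).
sub plausible_address {
    my ($address) = @_;
    return 0 unless defined($address) && length($address);
    return $address =~ /[\s\x00-\x1f\x7f]/ ? 0 : 1;
}

for my $candidate ('someone@example.com',
                   '"Mark Dominus"@example.com',
                   "a\@example.com\r\nbcc: b\@example.com") {
    my $verdict = plausible_address($candidate) ? "accept" : "reject";
    # Show unprintable characters as hex escapes.
    (my $shown = $candidate) =~ s/([^\x20-\x7e])/sprintf "\\x%02x", ord $1/ge;
    print "$verdict: $shown\n";
}
```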

qmail

Nobody has expressed confusion on this point, but I want to explicitly
state that, in my opinion, the security problem I described was
entirely my fault, and was not due to any deficiency in the
qmail mail system, or in its qmail-inject or
qmail-queue components.

Moreover, since I have previously been paid to give classes at large
conferences on how to avoid exactly this sort of problem, I
deserve whatever scorn and ridicule comes my way because of this.

I was away in Asia, and when I got back I noticed some oddities in my
mail logs. Specifically, Yahoo! was rejecting
mail.plover.com's outgoing email. In the course of
investigating the mail logs, I discovered the reason why:
mail.plover.com had been relaying a ton of outgoing spam.

It took me a little while to track down the problem. It was a mailing
list subscription form on perl.plover.com:

your address
perl-qotw
perl-qotw-discuss
perl-qotw-discuss-digest

The form took the input email address and used it to manufacture an email
message, requesting that that address be subscribed to the indicated
lists:

The message was delivered to the list management software, which
interpreted it as a request to subscribe, and generated an appropriate
confirmation reply. In theory, this doesn't open any new security
holes, because a malicious remote user could also forge an identical
message to the list management software without using the form.

The problem is the interpolated $addre variable. The value of
this variable is essentially the address from the form. Interpolating
user input into a string like this is always fraught with peril.
Daniel J. Bernstein has one of the most succinct
explanations of this that I have ever seen:

The essence of user interfaces is parsing: converting an unstructured
sequence of commands, in a format usually determined more by
psychology than by solid engineering, into structured data.

When another programmer wants to talk to a user interface, he has to
quote: convert his structured data into an unstructured sequence of
commands that the parser will, he hopes, convert back into the
original structured data.

This situation is a recipe for disaster. The parser often has bugs: it
fails to handle some inputs according to the documented interface. The
quoter often has bugs: it produces outputs that do not have the right
meaning. Only on rare joyous occasions does it happen that the parser
and the quoter both misinterpret the interface in the same way.

When the original data is controlled by a malicious user, many of
these bugs translate into security holes.

In this case, I interpolated user data without quoting, and suffered
the consequences.

(Yes, my system was used to send out penis enlargement spam. Oh, the
embarrassment.)

The address contained many lines of data, separated by CRNL, and a
complete message header. Interpolated into the subscription message,
the bcc: line caused the qmail-inject user
interface program to add all the "bcc" addresses to the outbound
recipient list.

Several thoughts occur to me about this.

User interfaces and programmatic interfaces

The problem would probably not have occurred had I used the
qmail-queue program, which provides a programmatic interface,
rather than qmail-inject, which provides a user interface. I
originally selected qmail-inject for convenience: it
automatically generates Date and Message-ID fields,
for example. The qmail-queue program does not try to parse
the recipient information from the message header; it takes recipient
information in an out-of-band channel.

Perl piped open is deficient

Perl's system and exec functions have two modes.
One mode looks like this:

system "command arg arg arg...";

If the argument string contains shell metacharacters or certain other
constructions, Perl uses the shell to execute the command; otherwise
it forks and execs the command directly. The shell is the cause of
all sorts of parsing-quoting problems, and is best avoided in programs
like this one. But Perl provides an alternative:

system "command", "arg", "arg", "arg"...;

Here Perl never uses the shell; it always forks and execs the command
directly. Thus, system "cat *" prints the contents of all
the files in the current working directory, but system "cat",
"*" prints only the contents of the file named "*", if
there is one.

qmail-inject has an option to take the envelope information from an out-of-band
channel: you can supply it in the command-line arguments. I did not
use this option in the original program, because I did not want to
pass the user input through the Unix shell, which is what Perl's
open FH, "| command args..." construction would have required.

Unfortunately, there is no easy way to avoid the shell when running a command
that is attached to the parent process via a pipe. Perl provides
open "| command arg arg arg...", which is what I used, and
which is analogous to the first construction, involving the shell.
But it provides nothing analogous to the second construction, which
avoids the shell. If it did, then I probably would have used it,
writing something like this:

open M, "|", $MAILER, "-fnobody\@plover.com", $addre;

and the whole problem would have been avoided.

A better choice would have been to set up the pipe myself and use
Perl's exec function to execute qmail-inject, which bypasses the
shell. The qmail-inject program would always have received exactly one
recipient address argument. In the event of an attack like the one
above, it would have tried to send mail to
into9507@plover.com^M^JContent-Transfer-Encoding:..., which
would simply have bounced.
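
The claim that the whole attack string would arrive as one argument can be checked in any language that can start a process without a shell. The following is a minimal Python sketch, not the Perl program discussed here; the helper name argv_seen_by_child and the bcc address are made up for illustration.

```python
import json
import subprocess
import sys

def argv_seen_by_child(*args):
    """Start a child process directly (no shell) and report the arguments it received."""
    # The child simply echoes its own argument list back as JSON.
    child = "import json, sys; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", child, *args],  # a list of arguments: no shell parsing
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

# Hypothetical hostile input: an address followed by an injected header line.
hostile = "into9507@plover.com\r\nbcc: victim@example.com"
print(argv_seen_by_child("-fnobody@plover.com", hostile))
```

The child sees exactly two arguments, and the second still contains the embedded line break; nothing has been split on whitespace or interpreted as a header.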

Why didn't I do this? Pure laziness.

qmail-queue more vulnerable than qmail-inject in this instance

Rather strangely, a partial attack is feasible with qmail-queue, even
though it provides a (mostly) non-parsing interface. The addresses to
qmail-queue are supplied to it on file descriptor 1 in the form:

If my program were to talk to qmail-queue instead of to qmail-inject, the program would have
contained code that looked like this:

print QMAIL_QUEUE_ENVELOPE "T$addre\0";

qmail-queue parses only to the extent of dividing up its input at the ^@
characters. But even this little bit of parsing is a problem.
By supplying an appropriately-formed address string, a malicious user
could still have forced my program to send mail to many addresses.

But still the recipient addresses would have been out of the content
of the message. If the malicious user is unable to affect the content
of the message body, the program is not useful for spamming.

But using qmail-queue, my program would have had to generate the To
field itself, and so it would have had to insert the user-supplied
address into the content of the message. This would have opened the
whole can of worms again.

My program attacked specifically

I think some human put real time into attacking this particular
program. There are bots that scour the web for email submission
forms, and then try to send spam through them. Those bots don't
successfully attack this program, because the recipient address is
hard-wired. Also, the program refuses to send email unless at least
one of the checkboxes is checked, and form-spam bots don't typically
check boxes. Someone had to try some experiments to get the input
format just so. I have logs of the experiments.

A couple of days after the exploit was discovered, a botnet started
using it to send spam; 42 different IP addresses sent requests. I
fixed the problem last night around 22:30; there were about 320 more
requests, and by 09:00 this morning the attempts to send spam
stopped.

Perl's "taint" feature would not have prevented this

Perl offers a feature that is designed specifically for detecting and
preventing exactly this sort of problem. It tracks which data are
possibly under control of a malicious user, and whether they are used
in unsafe operations. Unsafe operations include most file and process
operations.

One of my first thoughts was that I should have enabled the tainting
feature to begin with. However, it would not have helped in this case.
The user-supplied address would have been flagged as
"tainted" and so untrustworthy, and by extension the email message string
into which it was interpolated would have been tainted too. But Perl does
not consider writing a tainted string to a pipe to be an "unsafe"
operation, and so would not have signalled a failure.

Addresses are not allowed to contain whitespace, except leading or
trailing whitespace, which is ignored. Since whitespace inside an
address is unlikely to be an innocent mistake, the program waits
before responding, to slow down the attacker.

Summary

[ Note: because this article is in the oops
section of my blog, I intend that you understand it as a
description of a mistake that I have made. ]

Abhijit Menon-Sen wrote to me to ask for advice in finding the
smallest triangular number that has at least 500 divisors. (That is,
he wants the smallest n such that both n = (k² +
k)/2 for some integer k and also ν(n) ≥
500, where ν(n) is the number of integers that divide
n.) He said in his note that he believed that brute-force
search would take too long, and asked how I might trim down the
search.

The first thing that occurred to me was that ν is a multiplicative
function, which means that ν(ab) =
ν(a)ν(b) whenever a and b are
relatively prime. Since n and n-1 are relatively prime,
we have that ν(n(n-1)) =
ν(n)·ν(n-1), and so if T is triangular,
it should be easy to calculate ν(T). In particular, either
n is even, and ν(T) = ν(n/2)·ν(n-1),
or n is odd, and ν(T) =
ν(n)·ν((n-1)/2).

So I wrote a program to run through all possible values of n,
calculating a table of ν(n), and then the corresponding
ν(n(n-1)/2), and then stopping when it found one with sufficiently
large ν.

There may be some clever way to quickly calculate ν(n) in
general, but I don't know it. But if you have the prime factorization
of n, it's easy: if n =
p₁^a₁ · p₂^a₂ · ...
then ν(n) = (a₁ + 1)(a₂
+ 1)... . This is a consequence of the multiplicativity of ν and
the fact that ν(p^n) is clearly
n+1. Since I expected that n wouldn't get too big, I
opted to factor n and to calculate ν from the prime
factorization:

This found the answer, 76,576,500, in about a minute and a half.
(76,576,500 = 1 + 2 + ... + 12,375, and has 576 factors.) I sent this
off to Abhijit.
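
A minimal Python sketch of this approach (not the original program; the function names are made up for illustration): factor by trial division, multiply the (exponent + 1) terms together, and split n(n-1)/2 into its two relatively prime halves.

```python
def num_divisors(n):
    """Count the divisors of n (n >= 1) from its prime factorization."""
    count = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        count *= exponent + 1      # a prime power p^a contributes a + 1
        p += 1
    if n > 1:                      # one leftover prime factor, to the first power
        count *= 2
    return count

def first_triangular_with_divisors(at_least):
    """Return (T, nu(T)) for the first triangular T = n(n-1)/2 with nu(T) >= at_least."""
    n = 2
    while True:
        if n % 2 == 0:
            nu = num_divisors(n // 2) * num_divisors(n - 1)
        else:
            nu = num_divisors(n) * num_divisors((n - 1) // 2)
        if nu >= at_least:
            return n * (n - 1) // 2, nu
        n += 1

print(first_triangular_with_divisors(500))   # (76576500, 576)
```

The two factors passed to num_divisors are always relatively prime, which is what lets the sketch multiply their divisor counts.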

I was rather pleased with myself, so I went onto IRC to boast about my
cleverness. I posed the problem, and rather than torment everyone
there with a detailed description of the mathematics, I just said
that I had come up with some advice about how to approach the problem
that turned out to be good advice.

A few minutes later one of the gentlemen on IRC, who goes by "jeek",
(real name T.J. Eckman) asked me if 76,576,500 was the right answer.
I said that I thought it was and asked how he'd found it. I was
really interested, because I was sure that jeek had no idea that ν
was multiplicative or any of that other stuff. Indeed, his answer was
that he used the simplest possible brute force search. Here's jeek's
program:

In this program, the variable $y holds the current triangular
number. To calculate ν(y), this program just counts
$z from 1 up to √y, incrementing a counter every
time it discovers that z is a divisor of y. If the
counter exceeds 499, the program prints y and stops. This takes
about four and a half minutes.
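
For comparison, a minimal Python sketch of the brute-force idea as described above. It is not jeek's program; the names are illustrative, and a perfect-square case is handled explicitly so that the count is exact.

```python
import math

def first_triangular_brute(at_least):
    """Return the first triangular number with at least `at_least` divisors."""
    k, y = 1, 1                      # y is the k-th triangular number
    while True:
        r = 0
        for z in range(1, math.isqrt(y) + 1):
            if y % z == 0:
                # z and y/z are both divisors; count them once if they coincide
                r += 1 if z * z == y else 2
        if r >= at_least:
            return y
        k += 1
        y += k

print(first_triangular_brute(6))     # 28
```

Calling it with 500 performs the same search as in the text, but in Python it takes noticeably longer than the small example shown.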

It takes three times as long, but uses only one-third the code.
Beginners may not see this as a win, but it is a huge win. It
is a lot easier to reduce run time than it is to reduce code size. A
program one-third the size of another is almost always better—a
lot better.

In this case, we can trim up some obvious inefficiencies and make the
program even smaller. For example, the test here can be omitted:

if (($y/$z) != ($z)) { $r++; }

It can yield false only if
y is the square of z. But y is triangular, and
triangular numbers are never square. And we can optimize away the
repeated square root in the loop test, and use a cheaper and simpler
$y % $z == 0 divisibility test in place of the complicated one.

The program is now one-fifth the size of mine and runs in 75
seconds. That is, it is now smaller and faster than mine.

This shows that jeek's approach was the right one and mine was wrong,
wrong, wrong. Simple programs are a lot easier to speed up than
complicated ones. And I haven't even considered the cost of the time I
wasted writing my complicated program when I could have written jeek's
six-liner that does the same thing.

So! I had to write back to my friend to tell him that my good advice
was actually bad advice.

[ Addendum 20070405: Robert Munro pointed out an error in the final
version of the program, which I have since corrected. ]

[ Addendum 20070405: I said that triangular numbers are never square,
which is manifestly false, since 1 is both, as is 36. I should have
remembered this, since some time in the past three years I
investigated this exact question and found the answer. But it hardly
affects the program. The only way it could cause a failure is if
there were a perfect square triangular number with exactly 499
factors; such a number would be erroneously reported as having 500
factors instead. But the program reports a number with 576 factors
instead. Conceivably this could actually be a misreported perfect
square with only 575 factors, but regardless, the reported number is
correct. ]

Symmetric functions
I used to teach math at the Johns Hopkins CTY program, which is a
well-regarded summer math camp. Kids would show up and finish a year
(or more) of high-school math in three weeks. We'd certify them by
giving them standardized tests, which might carry some weight with
their school. But before they were allowed to take the standardized
test, they had to pass a much more difficult and comprehensive exam
that we'd made up ourselves.

The most difficult question on the Algebra III exam presented the
examinee with some intractable third degree polynomial (say,
x³ + 4x² - 2x + 6) and asked for the sum
of the cubes of its roots.

You might like to match your wits against the Algebra III students
before reading the solution below.

In the three summers I taught, only about two students were able to
solve this problem, which is rather tricky. Usually they would start
by trying to find the roots. This is doomed, because the
Algebra III course only covers how to find the roots when they are
rational, and the roots here are totally bizarre.

Even clever students didn't solve the problem, which required several
inspired tactics. First you must decide to let the roots be p,
q, and r, and, using Descartes' theorem, say that

x³ + bx² + cx + d = (x - p)(x - q)(x - r)

This isn't a hard thing to do, and a lot of the kids probably did try it,
but it's not immediately clear what the point is, or that it will get
you anywhere useful, so I think a lot of them never took it any farther.

But expanding the right-hand side of the equation above yields:

x³ + bx² + cx + d = x³ - (p + q + r)x² + (pq + pr + qr)x - pqr

And so, equating coefficients, you have:

b = -(p + q + r)
c = pq + pr + qr
d = -pqr

Quite a few people did get to this point, but didn't know what to do
next. Getting the solution requires either a bunch of patient
tinkering or a happy inspiration, and either way it involves a large
amount of accurate algebraic manipulation. You need to realize
that you can get the p³ terms by cubing b. But even if
you have that happy idea, the result is:

-b³ = p³ + q³ + r³ + 3p²q + 3p²r + 3q²r + 3pq² + 3pr² + 3qr² + 6pqr

And you now need to figure out how to get rid of the unwanted terms. The
6pqr term is not hard to eliminate, since it is
just -6d, and if you notice this, it will probably inspire you
to try combinations of the others. In fact, the answer is:

p³ + q³ + r³ = -b³ + 3bc - 3d
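
One way to organize the elimination, written out as a sketch. The symbol S, for the sum of the six mixed terms, is introduced here only for convenience:

$$
\begin{aligned}
S &= p^2q + p^2r + q^2p + q^2r + r^2p + r^2q \\
-b^3 &= (p+q+r)^3 = (p^3 + q^3 + r^3) + 3S + 6pqr \\
-bc &= (p+q+r)(pq+pr+qr) = S + 3pqr \\
S &= -bc - 3pqr = -bc + 3d \\
p^3 + q^3 + r^3 &= -b^3 - 3S - 6pqr = -b^3 - 3(-bc + 3d) + 6d = -b^3 + 3bc - 3d
\end{aligned}
$$

The product bc is the useful one because it contains every mixed term exactly once, plus a multiple of pqr that d can cancel.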

So for the original polynomial,
x³ + 4x² - 2x + 6,
we know that the sum of the cubes of the roots is
-4³ + 3·4·(-2) - 3·6 =
-64 - 24 - 18 = -106, and we calculated it without any idea what the
roots actually were.
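
The formula is easy to spot-check by going the other way: pick roots, build the coefficients from them, and compare. A minimal Python sketch, with function names invented for the purpose:

```python
def sum_of_cubes_of_roots(b, c, d):
    """Sum of the cubes of the roots of x^3 + b x^2 + c x + d."""
    return -b**3 + 3*b*c - 3*d

def monic_cubic_from_roots(p, q, r):
    """Coefficients (b, c, d) of (x - p)(x - q)(x - r)."""
    return (-(p + q + r), p*q + p*r + q*r, -p*q*r)

print(sum_of_cubes_of_roots(4, -2, 6))                          # -106
print(sum_of_cubes_of_roots(*monic_cubic_from_roots(1, 2, 3)))  # 36 = 1 + 8 + 27
```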

This was a lot of algebra III, but once you have seen this example,
it's not hard to solve a lot of similar problems. For instance, what
is the sum of the squares of the roots of
x² + bx + c? Well, proceeding as before, we let
the roots be p and q, so
x² + bx + c = (x - p)(x -
q) = x² - (p + q)x + pq, so
that b = -(p + q) and c = pq. Then
b² = p² + 2pq + q², and
b² - 2c = p² + q².

In general, if F is any symmetric function of the roots of a
polynomial, then F can be calculated from the
coefficients of the polynomial without too much difficulty.

Anyway, I was tinkering around with this at breakfast a couple of days
ago, and I got to thinking about b² - 2c = p² +
q². If roots p and q are both integers, then
b² - 2c is the sum of two squares. (The
sum-of-two-squares theorem is one of my favorites.) And the roots are
integers only when the discriminant of the original polynomial is
itself a square. But the discriminant in this case is b² -
4c. So we have the somewhat odd-seeming statement that when
b² - 4c is a square, then b² - 2c is a
sum of two squares.

I found this surprising because it seemed so underconstrained: it says
that you can add some random even number to a fairly large class of
squares and the result must be a sum of two squares, even if the even
number you added wasn't a square itself. But after I tried a few
examples to convince myself I hadn't made a mistake, I was sure there
had to be a very simple, direct way to get to the same place.

It took some fiddling, but eventually I did find it. Say that
b² - 4c = a². Then b and a must have
the same parity, so p = (b + a)/2 is an integer,
and we can write b = p + q and a =
p - q where p and q are both integers.

Then c = (b² - a²)/4
is just pq, and b² - 2c =
p² + q².

So that's where that comes from.

It seems like there ought to be an interesting relationship between
the symmetric functions of roots of a polynomial and their expression
in terms of the coefficients of the polynomial. The symmetric
functions of degree N are all linear combinations of a finite
set of symmetric functions. For example, any second-degree symmetric
function of two variables has the form
a(p² +
q²) +
2bpq. We can denote these basic symmetric functions of
two variables as F_{i,j}(p,
q) = Σ p^i q^j. Then we have identities like
(F_{1,0})² =
F_{2,0} + F_{1,1}
and
(F_{1,0})³ =
F_{3,0} + 3F_{2,1}.

How big is a five-gallon jug?
Office water coolers in the United States commonly take five-gallon
jugs of water. You are probably familiar with these jugs, but here is
a picture of a jug, to refresh your memory. A random graduate student
has been provided for scale:

Here's today's riddle: Can you estimate the volume of the jug in
cubic feet? "Estimate" means by eyeballing it, not by calculating,
measuring,
consulting reference works, etc. But feel free to look at an actual
jug if you have one handy.

Once you've settled on your estimate, compare it with the correct
answer, below.

Hard to believe, isn't it? ("Strange but true.") I took one of these
jugs around my office last year, asking everyone to guess how big it
was; nobody came close. People typically guessed that it was about
three times as big as it actually is.

This puzzle totally does not work anywhere except in the United
States. The corresponding puzzle for the rest of the world is "Here
is a twenty-liter jug. Can you guess the volume of the jug in
liters?" I suppose this is an argument in favor of the metric
system.

Your age as a fraction
Little kids often report their ages as "two and a half" or sometimes
even "three and three quarters". These evaluations are usually based
on whole months: if you were born on April 2, 1969, then on October
2, 1971 you start reporting your age as "two and a half", and, if you
choose to report your age as "three and three quarters", you
conventionally may begin on January 2, 1973.

However, these reports are not quite accurate. On January 2, 1973,
exactly 3 years and 9 months from your birthday, you would be 1,371
days old, or 3 years plus 275 days. 275/365 = 0.7534. On January 1,
you were only 3 + 274/365 years old, which is 3.7507 years, and so
January 1 is the day on which you should have been allowed to start
reporting your age as "three and three quarters". This slippage
between days and months occurs in the other direction as well, so
there may be kids wandering around declaring themselves as "three and
a half" a full day before they actually reach that age.

Clearly this is one of the major problems facing our society,
so I wanted to make up a table showing, for each number of days
d from 1 to 365, what is the simplest fraction
a/b such that when it is d days after your
birthday, you are (some whole number and) a/b years.
That is, I wanted a/b such that d/365 ≤
a/b < (d+1)/365.

Then, by consulting the table each day, anyone could find out what new
fraction they might have qualified for, and, if they preferred the new
fraction to the old, they might start reporting their age with that
fraction.

There is a well-developed branch of mathematics that deals with this
problem. To find simple fractions that approximate any given rational
number, or lie in any range, we first expand the bounds of the range
in continued fraction form. For example, suppose it has been 208 days
since your birthday. Then today your age will range from y
plus 208/365 years up to y plus 209/365 years.

Then we expand 208/365 and 209/365 as continued fractions:

208/365 = [0; 1, 1, 3, 12, 1, 3]
209/365 = [0; 1, 1, 2, 1, 16, 1, 2]

Where [0; 1, 1, 3, 12, 1, 3] is an abbreviation for the
typographically horrendous expression:

$$0 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{3 + \cfrac{1}{12 + \cfrac{1}{1 + \cfrac{1}{3}}}}}}$$

Then you need to find a continued fraction that lies numerically in
between these two but is as short as possible. (Shortness of
continued fractions corresponds directly to simplicity of the rational
numbers they represent.) To do this, take the common initial segment,
which is [0; 1, 1], and then apply an appropriate rule for the next
place, which depends on whether the numbers in the next place differ
by 1 or by more than 1, whether the first difference occurs in an even
position or an odd one, mumble mumble mumble; in this case the rules
say we should append 3. The result is [0; 1, 1, 3], or, in
conventional notation, 4/7.
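
The expansions above, and the value of a short continued fraction, can be checked mechanically. This is a minimal Python sketch using the standard fractions module; the function names are made up for illustration.

```python
from fractions import Fraction

def cf_expand(num, den):
    """Continued fraction terms of num/den, by the Euclidean algorithm."""
    terms = []
    while den:
        quotient, remainder = divmod(num, den)
        terms.append(quotient)
        num, den = den, remainder
    return terms

def cf_value(terms):
    """Exact value of the continued fraction [t0; t1, t2, ...]."""
    value = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1 / value        # work outward from the innermost term
    return value

print(cf_expand(208, 365))     # [0, 1, 1, 3, 12, 1, 3]
print(cf_expand(209, 365))     # [0, 1, 1, 2, 1, 16, 1, 2]
print(cf_value([0, 1, 1, 3]))  # 4/7
```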

Then I came to my senses, and realized there was a better way, at
least for the cases I wanted to calculate. Given d, we want to
find the simplest fraction a/b such that d/365
≤ a/b < (d+1)/365. Equivalently, we want
the smallest integer b such that there is some integer a
with db/365 ≤ a < (d+1)b/365. But
b must be in the range (2 .. 365), so we can easily calculate
this just by trying every possible value of b, from 2 on up:

The fussing with ceil() in the main test is to make the
ranges open on the upper end: 2/5 is not in the range
[3/10, 4/10), but it is in the range
[4/10, 5/10). Then we can embed this in a simple report-printing
program:

The simplest fractions that are missing from this table are 1/5, which
is in the [2/10, 3/10) range and is beaten out by 1/4, and 3/5, which is
in the [6/10, 7/10) range and is beaten out by 2/3.
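
A minimal Python sketch of the try-every-denominator search, with tenths as the example. This is not the original program: the names are illustrative, and the ceiling is computed in integer arithmetic rather than with a ceil() call.

```python
def simplest_fraction(d, n=365):
    """Return (a, b) with the smallest b >= 2 such that d/n <= a/b < (d+1)/n.

    Intended for 1 <= d <= n - 1.
    """
    for b in range(2, n + 1):
        a = -(-d * b // n)            # ceil(d*b/n) without floating point
        if a * n < (d + 1) * b:       # strict inequality: range is open on top
            return a, b
    return None

for d in range(1, 10):
    a, b = simplest_fraction(d, 10)
    print(f"[{d}/10, {d + 1}/10): {a}/{b}")
```

With n = 10 neither 1/5 nor 3/5 appears in the output, and simplest_fraction(208) gives (4, 7), agreeing with the continued fraction calculation.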

This works fine, and it is a heck of a lot simpler than all the
continued fraction stuff. The more so because the continued fraction
library is written in C.

For the application at hand, an alternative algorithm is to go through
all fractions, starting with the simplest, placing each one into the
appropriate d/365 slot, unless that slot is already filled by a
simpler fraction:

(It might also be that you could optimize the algorithms to go faster:
you can skip the body of the loop in the slot-filling algorithm
whenever $n and $d have a common factor, which means
you are executing the body only n log n times. But
testing for common factors takes time too...)
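
A minimal Python sketch of the slot-filling alternative, without the common-factor shortcut. The names are illustrative and are not the $n and $d of the original.

```python
def fill_slots(n=365):
    """Map each slot d to the simplest (a, b) with d/n <= a/b < (d+1)/n."""
    slots = {}
    for b in range(2, n + 1):             # smaller denominators come first
        for a in range(1, b):
            d = a * n // b                # the slot that a/b falls into
            slots.setdefault(d, (a, b))   # keep whatever simpler fraction got there first
    return slots

print(fill_slots(10))
```

An unreduced fraction such as 2/4 always lands in a slot that its reduced form has already claimed, so skipping it changes the running time but not the result.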

I was going to paste in a bunch of tabulations, but once again I
remembered that it makes more sense to just let you run the program
for yourself. Here is a form that will generate the table for all the
fractions 1/N .. (N-1)/N; use N=365 to generate a
table of year fractions for common years, and N=366 to
generate the table for leap years:

Here's a program that will take your birthday and calculate your age
in fractional years. Put in your birthday in ISO standard format: 2
April, 1969 is 19690402.

The subject of really narrow buildings came up on Reddit last week,
and my post about the "Spite House" was well-received. Since pictures
of it seem to be hard to come by, I scanned the pictures from
New York's Architectural Holdouts by Andrew Alpern and
Seymour Durst.

The book is worth checking out, particularly if you are familiar with
New York. The canonical architectural holdout occurs when a developer
is trying to assemble a large parcel of land for a big building, and a
little old lady refuses to sell her home. The book is full of
astonishing pictures: skyscrapers built with holdout buildings
embedded inside them and with holdout buildings wedged underneath
them. Skyscrapers built in the shape of the letter E (with the
holdouts between the prongs), the letter C (with the holdout in the
cup), and the letter Y (with the holdout in the fork).

Photo credit: Jerry Callen

When
Henry Siegel, a New York store owner, got news in 1898 that Macy's was
going to build a gigantic new flagship store on Herald Square, he
bought the corner lot for $375,000 to screw over his competitors. The
Herald Square Macy's still has a notch cut out of its corner; see the
picture at right. The
Macy's store on Queens Boulevard is in the shape of a perfect circle,
except for the little bit cut out of one side where the proverbial old
lady (this time named Mary Sendek) refused to sell a 7×15-foot
back corner of her lot for $200,000 because she wanted her dog to have
a place to play. (Here's
a satellite view of the building. The notch is clearly visible at the
northwest corner, facing 55th Avenue.)

But
anyway, the Spite House. The story, as told by Alpern and Durst, is
that around 1882, Patrick McQuade wanted to build some houses on 82nd
Street at Lexington Avenue. The adjoining parcel of land, around the
corner on Lexington, was owned by Joseph Richardson, shown at left.
If McQuade could acquire this parcel, he would be able to extend his
building all the way to Lexington Avenue, and put windows on that side
of the building. No problem: the parcel was a strip of land 102 feet
long and five feet wide along Lexington, useless for any other
purpose. Surely Richardson would sell.

McQuade offered $1,000, but Richardson demanded $5,000. Unwilling to
pay, McQuade started building his houses anyway, complete with windows
looking out on Richardson's five-foot-wide strip, which was
unbuildable. Or so he thought.

Richardson built a building five
feet wide and 102 feet long, blocking McQuade's Lexington Avenue
windows.

The building soon became known as the "Spite House". The photograph
above was taken around 1895. Lexington Avenue is torn up for
maintenance in this picture.

Richardson took advantage of a clause in the building codes that
allowed him to build bay window extensions in his building. This
allowed him to extend its maximum width 2'3" beyond the boundary of
the lot. (Alpern and Durst say "In those days, such encroachments on
the public sidewalks were not prohibited.") The rooms of the Spite
House were in these bay window extensions, connected by extremely
narrow hallways:

As you can see, the Spite House was divided into two dwellings, each
with a separate entrance, four floors, and two rooms on each floor.
The rooms were 7'3" wide and were connected by hallways 3'4" wide.

After construction was completed, Richardson moved into the Spite
House and lived there until he died in 1897. The pictures below and
at left are from that time.

The edge-on photograph below, showing the Spite House's 3'4" frontage
on 82nd Street, was taken in 1912.

The Spite House was demolished in 1915.

Picture credits

All other pictures and photographs are in the public domain. I took
them from pages 122–124 of the book New York's Architectural
Holdouts, by Alpern and Durst. The original sources, as given
by Alpern and Durst, are as follows:

Bernoulli processes
A family has four children. Assume that the sexes of the four
children are independent, and that boys and girls are equiprobable.
What's the most likely distribution of boys and girls?

Well, it depends how you count. Are there three possibilities or
five?

Three categories (the green division):

All four the same
Three the same, one different
Two-and-two

Five categories (the pink division):

Four boys, no girls
Three boys, one girl
Two boys, two girls
One boy, three girls
No boys, four girls

If we group outcomes into five categories, as in the pink division
on the right, the most likely distribution is two-and-two, as you
would probably guess:

Boys  Girls  Probability
0     4      0.0625
1     3      0.25
2     2      0.375
3     1      0.25
4     0      0.0625

This distribution is depicted in the graph at right.
Individually, (3, 1) and (1, 3) are less likely than (2, 2). But
"three-and-one" includes both (1, 3) and (3, 1), whereas "two-and-two"
includes only (2, 2). So if you group outcomes into three categories,
as in the green division above left, "three-and-one" comes out more frequent
overall than "two-and-two":

One sex  The other  Total probability
4        0          0.125
3        1          0.5
2        2          0.375

It makes a difference whether you specify the sexes in the
distribution. If a "distribution" is a thing like "b of the
children are boys and g are girls", then the most frequent
distribution is (2, 2). But if a distribution is "x of one sex
and y of the other", then the most frequent distribution is [3, 1],
where I've used square brackets to show that the order is not
important. [3, 1] is the same as [1, 3].

This is true in general. Suppose someone has 1,000 kids. What's the
most likely distribution of sexes? It's 500 boys and 500 girls, which
I've been writing (500, 500). This is more likely than either (499,
501) or (501, 499). But if you consider "Equal numbers" versus
"501-to-499", which I've been writing as [500, 500] and [501, 499],
then [501, 499] wins:

Boys  Girls  Probability
501   499    0.02517
500   500    0.02522
499   501    0.02517

One sex  The other  Total probability
501      499        0.05035
500      500        0.02522

For odd numbers of kids, this anomaly doesn't occur, because there's no
symmetric value like [500, 500] to get shorted.
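
The two ways of counting are easy to compare numerically. A minimal Python sketch, assuming independent and equiprobable sexes as in the text; the function names are illustrative.

```python
from math import comb

def ordered_prob(boys, girls):
    """Probability of exactly this many boys and this many girls."""
    n = boys + girls
    return comb(n, boys) / 2**n

def unordered_prob(x, y):
    """Probability of x of one sex and y of the other, whichever is which."""
    p = ordered_prob(x, y)
    return p if x == y else 2 * p    # an uneven split can occur in two ways

print(ordered_prob(2, 2), unordered_prob(3, 1))   # 0.375 0.5
print(ordered_prob(500, 500) > ordered_prob(501, 499))
print(unordered_prob(501, 499) > unordered_prob(500, 500))
```

Both comparisons print True: the even split wins when the sexes are named and loses when they are not.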

Distribution    Number of hands  Frequency
[4, 4, 3, 2]    10810800         0.16109347
[5, 4, 3, 1]    8648640          0.12887478
[5, 3, 3, 2]    8648640          0.12887478
[5, 4, 2, 2]    6486480          0.09665608
[4, 3, 3, 3]    4804800          0.07159710
[6, 4, 2, 1]    4324320          0.06443739
[6, 3, 2, 2]    4324320          0.06443739
[6, 3, 3, 1]    2882880          0.04295826
[5, 5, 2, 1]    2594592          0.03866243
[7, 3, 2, 1]    2471040          0.03682137
[4, 4, 4, 1]    1801800          0.02684891
[6, 4, 3, 0]    1441440          0.02147913
[5, 4, 4, 0]    1081080          0.01610935
[6, 5, 2, 0]    864864           0.01288748
[6, 5, 1, 1]    864864           0.01288748
[5, 5, 3, 0]    864864           0.01288748
[7, 4, 2, 0]    617760           0.00920534
[7, 4, 1, 1]    617760           0.00920534
[7, 2, 2, 2]    617760           0.00920534
[8, 2, 2, 1]    463320           0.00690401
[7, 3, 3, 0]    411840           0.00613689
[8, 3, 2, 0]    308880           0.00460267
[8, 3, 1, 1]    308880           0.00460267
[7, 5, 1, 0]    247104           0.00368214
[8, 4, 1, 0]    154440           0.00230134
[6, 6, 1, 0]    144144           0.00214791
[9, 2, 1, 1]    102960           0.00153422
[9, 3, 1, 0]    68640            0.00102282
[9, 2, 2, 0]    51480            0.00076711
[10, 2, 1, 0]   20592            0.00030684
[7, 6, 0, 0]    20592            0.00030684
[8, 5, 0, 0]    15444            0.00023013
[9, 4, 0, 0]    8580             0.00012785
[10, 1, 1, 1]   6864             0.00010228
[10, 3, 0, 0]   3432             0.00005114
[11, 1, 1, 0]   1872             0.00002789
[11, 2, 0, 0]   936              0.00001395
[12, 1, 0, 0]   156              0.00000232
[13, 0, 0, 0]   4                0.00000006

Similar behavior appears in related problems. What's the most likely
distribution of suits in a bridge hand? People often guess (4, 3, 3,
3), and this is indeed the most likely distribution of
particular suits. That is, if you consider distributions of
the form "a hearts, b spades, c diamonds, and
d clubs", then (4, 3, 3, 3) gives the most likely distribution.
(The distributions (3, 4, 3, 3), (3, 3, 4, 3), and (3, 3, 3, 4) are of
course equally frequent.) But if distributions have the form
"a cards of one suit, b of another, c of another,
and d of the fourth"—which is what is usually meant by a
suit distribution in a bridge hand—then [4, 4, 3, 2] is the most likely
distribution, and [4, 3, 3, 3] is in fifth place.

This same issue arose in my
recent article on Yahtzee roll probabilities. There we had six
"suits", which represented the six possible rolls of a die, and I
asked how frequent each distribution of "suits" was when five dice
were rolled. For distribution [p₁,
p₂, ...], we let nᵢ be the
number of p's that are equal to i. Then the expression
for probability of the distribution has a factor of ∏ nᵢ! in the denominator, with the result that
distributions with a lot of equal-sized parts tend to appear less
frequently than you might otherwise expect.

I'm not sure how I got so deep into this end of the subject, since I
didn't really want to compare complex distributions to each other so
much as to compare simple distributions under different conditions. I
had originally planned to discuss the World Series, which is a
best-four-of-seven series of baseball games that we play here in the
U.S. and sometimes in that other country to the north. Sometimes one
team wins four games in a row ("sweeps"); other times the Series runs
the full seven games.

You might expect that even splits would tend to occur when the two
teams playing were evenly matched, but that when one team was much
better than the other, the outcome would be more likely to be a sweep.
Indeed, this is generally so. The chart below graphs the possible
outcomes. The x-axis represents the probability of the
Philadelphia Phillies winning any individual game. The y-axis
is the probability that the Phillies win the entire series (red line),
which in turn is the sum of four possible events: the Phillies win in
4 games (green), in 5 games (dark blue), in 6 games (light blue), or
in 7 games (magenta). The probabilities of the Nameless Opponents
winning are not shown, because they are exactly the opposite. (That
is, you just flip the whole chart horizontally.)

Clearly, the Phillies have a greater-than-even chance of winning the
Series if and only if they have a greater-than-even chance of winning
each game. If they are playing a better team, they are likely to
lose, but if they do win they are most likely to do so in 6 or 7
games. A sweep is the most likely outcome only if the Opponents are
seriously overmatched, and have a less than 25% chance of winning each
game. (The lines for the 4-a outcome and the 4-b
outcome cross at 1 - (p_a /
p_b)^(1/(b-a)), where
p_i is 1, 4, 10, 20 for i = 0, 1, 2, 3.)
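
The curves and their crossing points can be reproduced with a few lines. This is a minimal Python sketch assuming independent games with a fixed per-game probability; the function names are illustrative. The coefficients 1, 4, 10, 20 are the binomial coefficients C(3+a, a).

```python
from math import comb

def win_in(p, a):
    """Probability of winning the series 4 games to a (a = 0..3),
    for a team that wins each game with probability p."""
    return comb(3 + a, a) * p**4 * (1 - p)**a

def crossing(a, b):
    """Per-game win probability at which the 4-a and 4-b curves cross."""
    return 1 - (comb(3 + a, a) / comb(3 + b, b)) ** (1 / (b - a))

print(crossing(0, 1))                          # 0.75
print(sum(win_in(0.5, a) for a in range(4)))   # 0.5
```

The first crossing is at 0.75, which is the "less than 25%" threshold for the Opponents mentioned above.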

If we consider just the first four games of the World Series, there
are five possible outcomes, ranging from a Phillies sweep, through a
two-and-two split, to an Opponents sweep. Let p be the
probability of the Phillies winning any single game.
As p increases, so does the likelihood of a Phillies sweep.
The chart below plots the likelihood of each of the five possible
outcomes, for various values of p, charted here on the
horizontal axis:

The leftmost red curve is the probability of an Opponents sweep; the
red curve on the right is the probability of a Phillies sweep. The
green curves are the probabilities of 3-1 outcomes favoring the
Opponents and the Phillies, respectively, with the Phillies on the
right as before. The middle curve, in dark blue, is the probability
of a 2-2 split.

When is the 2-2 split the most likely outcome? Only when the Phillies
and the Opponents are approximately evenly matched, with neither team
more than 60% likely to win any game.

But just as with the sexes of the four kids, we get a different result
if we consider the outcomes that don't distinguish the teams. For the
first four games of the World Series, there are only three outcomes: a
sweep (which we've been writing [4, 0]), a [3, 1] split, and a [2, 2]
split:

Here the green lines in the earlier chart have merged into a single
outcome; similarly the red lines have merged. As you can see from the
new chart, there is no pair of teams for which a [2, 2] split
predominates; the even split is buried. When one team is grossly
overmatched, winning less than about 19% of its games, a sweep is the
most likely outcome; otherwise, a [3, 1] split is most likely.
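The two ways of counting can be compared with a few lines of Python. This is only a sketch under the same assumption as before (independent games, fixed p); `first_games`, `merged` and `most_likely` are names invented here.

```python
from math import comb

def first_games(p, games=4):
    """Probability that the Phillies win exactly w of the first `games` games."""
    return {w: comb(games, w) * p**w * (1 - p)**(games - w)
            for w in range(games + 1)}

def merged(p, games=4):
    """Same distribution, but without distinguishing the teams.

    Keys are (larger win total, smaller win total), e.g. (3, 1).
    """
    out = {}
    for w, prob in first_games(p, games).items():
        key = (max(w, games - w), min(w, games - w))
        out[key] = out.get(key, 0.0) + prob
    return out

def most_likely(dist):
    """Key with the largest probability."""
    return max(dist, key=dist.get)

for p in (0.15, 0.25, 0.5, 0.65):
    print(p, most_likely(first_games(p)), most_likely(merged(p)))
```

Scanning p from 0 to 1 in small steps with `merged` never turns up (2, 2) as the winner, which is the "buried" even split described above.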

Here are the corresponding charts for series of various lengths.

[Table of charts: one row for each series length from 2 to 10 games, with one column for the chart that distinguishes the teams and one for the chart that doesn't.]

I have no particular conclusion to announce about this; I just thought
that the charts looked cool.

Coming later, maybe: reasoning backwards: if the Phillies sweep the
World Series, what can we conclude about the likelihood that they are
a much better team than the Opponents? (My suspicion is that you can
conclude a lot more by looking at the runs scored and runs allowed
totals.)

(Incidentally, baseball players get a share of the ticket money for
World Series games, but only for the first four games. Otherwise,
they could have an incentive to prolong the series by playing less
well than they could, which is counter to the ideals of sport. I find
this sort of rule, which is designed to prevent conflicts of interest,
deeply satisfying.)

Here's one interesting fact: it's quite easy to calculate the number
of partitions of N. Let P(n, k) be the
number of partitions of n into parts that are at least
k. Then it's easy to see that:

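$$P(n, k) = 1 +
\sum_{i=k}^{n-1} P(n-i, i) \qquad (k \le n)$$

Here i is the smallest part, and the 1 counts the partition whose
only part is n itself.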

And there are simple boundary conditions: P(n, n)
= 1; P(n, k) = 0 when k > n, and
so forth. And P(n), the number of partitions of
n into parts of any size, is just P(n, 1). So a
program to calculate P(n) is very simple:
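A minimal sketch of such a program, in Python (the language and the memoization are choices made for this example). It classifies each partition by its smallest part i, so that what remains is a partition of n-i into parts of at least i.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def P(n, k=1):
    """Number of partitions of n into parts that are each at least k."""
    if k > n:
        return 0
    # One partition consists of n alone; every other partition has a
    # smallest part i < n, leaving n - i to split into parts of at least i.
    return 1 + sum(P(n - i, i) for i in range(k, n))

print([P(n) for n in range(1, 11)])   # [1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```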

I had a funny conversation once with someone who ought to have known
better: I remarked that it was easy to calculate P(n),
and this person disagreed with me, asking why Rademacher's
closed-form expression for P(n) had been such a
breakthrough. But the two properties are independent; the same is
true for lots of stuff. Just because you can calculate something
doesn't mean you understand it. Calculating ζ(2) is quick and
easy, but it was a major breakthrough when Euler discovered that it
was equal to π^2/6. Calculating ζ(3) is even
quicker and easier, but nobody has any idea what the value
represents.

Similarly, P(n) is easy to calculate, but harder to
understand. Ramanujan observed, and proved, that
P(5k+4) is always a multiple of 5, which had somehow
escaped everyone's notice until then. And there are a couple of other
similar identities which were proved later: P(7k+5)
is always a multiple of 7; P(11k+6) is always a
multiple of 11. Based on that information, any idiot could conjecture
that P(13k+7) would always be a multiple of 13; this
conjecture is wrong. (P(7) = 15.)

Anyway, all that is just leading up to the real point of this note,
which is that I was tabulating the number of partitions of n
into exactly k parts, which is also quite easy. Let's call this
Q(n, k). And I discovered that Q(13, 4)
= Q(13, 5). There are 18 ways to divide a pile of 13 beans
into 4 piles, and also 18 ways to divide the beans into 5 piles.

1 1 1 10

1 1 2 9

1 1 3 8

1 1 4 7

1 1 5 6

1 2 2 8

1 2 3 7

1 2 4 6

1 2 5 5

1 3 3 6

1 3 4 5

1 4 4 4

2 2 2 7

2 2 3 6

2 2 4 5

2 3 3 5

2 3 4 4

3 3 3 4

1 1 1 1 9

1 1 1 2 8

1 1 1 3 7

1 1 1 4 6

1 1 1 5 5

1 1 2 2 7

1 1 2 3 6

1 1 2 4 5

1 1 3 3 5

1 1 3 4 4

1 2 2 2 6

1 2 2 3 5

1 2 2 4 4

1 2 3 3 4

1 3 3 3 3

2 2 2 2 5

2 2 2 3 4

2 2 3 3 3
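The tabulation of Q(n, k) can be sketched as follows. This is an illustrative Python version, not the program used for the lists above; it uses the standard recurrence Q(n, k) = Q(n-1, k-1) + Q(n-k, k), and `partitions_exact` is a made-up helper that lists the partitions themselves.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def Q(n, k):
    """Number of partitions of n into exactly k positive parts."""
    if k == 0:
        return 1 if n == 0 else 0
    if n < k:
        return 0
    # Either some part equals 1 (remove it), or every part is at least 2
    # (subtract 1 from each of the k parts).
    return Q(n - 1, k - 1) + Q(n - k, k)

def partitions_exact(n, k, smallest=1):
    """Yield the partitions of n into exactly k parts, in nondecreasing order."""
    if k == 1:
        if n >= smallest:
            yield (n,)
        return
    for first in range(smallest, n // k + 1):
        for rest in partitions_exact(n - first, k - 1, first):
            yield (first,) + rest

print([Q(13, k) for k in range(1, 14)])
print(list(partitions_exact(8, 3)))
```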

The question I'm trying to resolve: is this just a coincidence? Or is
there something in the structure of the partitions that would lead us
to suspect that Q(13, 4) = Q(13, 5) even if we didn't
know the value of either one?

So far, I haven't turned anything up; it seems to be a coincidence. A
simpler problem of the same type is that
Q(8, 3)
= Q(8, 4); that seems to be a coincidence too:

1 1 6

1 2 5

1 3 4

2 2 4

2 3 3

1 1 1 5

1 1 2 4

1 1 3 3

1 2 2 3

2 2 2 2

Looking at this, one can see all sorts of fun correspondences. But on
closer inspection, they turn out to be illusory. For example, any
partition into 4 parts can be turned into a partition into 3 parts by
taking the smallest of the 4 parts, dividing it up into 1's, and
distributing the extra 1's to the largest parts. But there's no
reason why that should always yield different outputs for different
inputs, and, indeed, it doesn't.

"Go ahead, throw your vote away!"
I noticed this back in November right after the election, when I was
reading the election returns in the newspaper. There were four
candidates for the office of U.S. Senator in Nevada. One of these was
Brendan Trainor, running for the Libertarian party.

Trainor received a total of 5,269 votes, or 0.90% of votes cast.

A fifth choice, "None of these candidates", was available. This
choice received 8,232 votes, or 1.41%.

Another candidate, David Schumann, representing the Independent
American Party, was also defeated by "None of these candidates".

I'm not sure what conclusion to draw from this. I am normally
sympathetic to the attempts of independent candidates and small
parties to run for office, and I frequently vote for them. But when
your candidate fails to beat out "None of the above", all I can think
is that you must be doing something terribly wrong.

Someone must be referring people to these incorrect URLs, and it is
presumably me. The HTML version of the blog looked okay, so I checked
the RSS and Atom files, and found that, indeed, they were malformed.
Instead of <img src="foo.gif" alt="$TeX$">, they
contained codes for <img src="foo.gif$TeX$">.

I tracked down and fixed the problem. Usually when I get a bug like
this, I ask myself what I could learn from it. This one is unusual.
I can't think of much. Here's the bug.

The <img> element is generated by a function
called imglink. The arguments to imglink are the
filename that contains the image (for use in the SRC attribute) and
the text for the ALT attribute. The ALT text is optional. If it is
omitted, the function tries to locate the TeX source code and fetch
it. If this attempt fails, it continues anyway, and omits the ALT
attribute. Then it generates and returns the HTML:

This function is called from several places in the plugin. Sometimes
the TeX source code is available at the place from which the call
comes, and the code has return imglink($file, $tex);
sometimes it isn't and the code has
return imglink($file) and hopes that the imglink
function can retrieve the TeX.

One such place is the branch that handles generation of tags for every
type of output except HTML. When generating the HTML output, the
plugin actually tries to run TeX and generate the resulting image
file. For other types of output, it assumes that the image file is
already prepared, and just calls imglink to refer to an image
that it presumes already exists:

return imglink($file, $tex) unless $blosxom::flavour eq "html";

The bug was that I had written this instead:

return imglink($file. $tex) unless $blosxom::flavour eq "html";

The . here is a string concatenation operator.
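The effect is easy to reproduce. The following is a rough Python analogue, not the plugin's actual Perl: `find_tex_source` is a hypothetical stand-in for the TeX lookup, and the exact HTML emitted is an assumption based on the tags quoted earlier.

```python
def find_tex_source(filename):
    """Hypothetical stand-in for the plugin's TeX lookup; here it always fails."""
    return None

def imglink(filename, alt=None):
    """Build an <img> tag; the ALT text is optional."""
    if alt is None:
        alt = find_tex_source(filename)      # may fail and return None
    if alt is None:
        return '<img src="%s">' % filename   # recover silently, no ALT
    return '<img src="%s" alt="%s">' % (filename, alt)

img_file, tex = "foo.gif", "$TeX$"
print(imglink(img_file, tex))    # intended call: two arguments
print(imglink(img_file + tex))   # the typo: one concatenated argument
```

Both calls are perfectly legal, which is why nothing complained: the second one simply asks for an image named `foo.gif$TeX$` with no ALT text.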

It's a bit surprising that I don't make more errors like this than I
do. I am a very inaccurate typist.

Stronger type checking would not have saved me here. Both arguments
are strings, concatenation of strings is perfectly well-defined, and
the imglink function was designed and implemented to accept
either one or two arguments.

The function did note the omission of the $tex argument,
attempted to locate the TeX source code for the bizarrely-named file,
and failed, but I had opted to have it recover and continue silently.
I still think that was the right design. But I need to think about
that some more.

The only lesson I have been able to extract from this so far is that I
need a way of previewing the RSS and Atom outputs before publishing
them. I do preview the HTML output, but in this case it was
perfectly correct.

It's equivalent to the following simple algebraic proof: if a/b
is the "simplest" integer ratio equal to √2 then consider
(2b-a)/(a-b), which a little manipulation shows is also
equal to √2 but has smaller numerator and denominator,
contradiction.
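The "little manipulation" can be spelled out as follows; this is just a worked version of the step, using nothing but a/b = √2:

$$
{2b-a \over a-b} = {2 - a/b \over a/b - 1} = {2-\sqrt2 \over \sqrt2-1}
= {\sqrt2(\sqrt2-1) \over \sqrt2-1} = \sqrt2
$$

Since 1 < √2 < 2 we have b < a < 2b, so 0 < a-b < b and 0 < 2b-a < a: the new numerator and denominator are positive and smaller than the old ones.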

According to Cut-the-knot,
the proof was anticipated in 1892 by A. P. Kiselev and appeared on
page 121 of his book Geometry.

A polynomial trivium
A couple of months ago I calculated the following polynomial—I
forget why—and wrote
it on my whiteboard. I want to erase the whiteboard, so I'm recording
the polynomial here instead.

A new proof that the square root of 2 is irrational
Last week I ran into this totally brilliant proof that √2 is
irrational. The proof was discovered by Tom M. Apostol, and was
published as
"Irrationality of the Square Root of Two - A Geometric Proof"
in the American Mathematical Monthly, November
2000, pp. 841–842.

In short, if √2 were rational, we could construct an isosceles right
triangle with integer sides. Given one such triangle, it is possible
to construct another that is smaller. Repeating the construction, we
could construct arbitrarily small integer triangles. But this is
impossible since there is a lower limit on how small a triangle can be
and still have integer sides. Therefore no such triangle could exist
in the first place, and √2 is irrational.

In hideous detail: Suppose that √2 is rational. Then by scaling up the
isosceles right triangle with sides 1, 1, and √2 appropriately, we
obtain the smallest possible isosceles right triangle whose sides are
all integers. (If √2 = a/b, where a/b is
in lowest terms, then the desired triangle has legs with length
b and hypotenuse a.) This is ΔOAB in the
diagram below:

By hypothesis, OA, OB, and AB are all integers.

Now construct arc BC, whose center is at A. AC
and AB are radii of the same circle, so AC =
AB,
and thus AC is an integer.
Since OC = OA - CA, OC is also an integer.

Let CD be the perpendicular to OA at point C.
Then ΔOCD is also an isosceles right triangle, so
OC = CD, and CD is an integer.
CD and BD are tangents to the same
arc from the same point D, so CD = BD, and BD is
an integer. Since OB and BD are both integers, so is
OD.

Since OC, CD, and OD are all integers, ΔOCD is another
isosceles right triangle with integer sides, which contradicts the
assumption that OAB was the smallest such.

The thing I find amazing about this proof is not just how simple it
is, but how strongly geometric. The Greeks proved that √2 was
irrational a long time ago, with an argument that was essentially
arithmetical. The Greeks being who they were, their essentially
arithmetical argument was phrased in terms of geometry, with all the
numbers and arithmetic represented by operations on line segments.
The Tom Apostol proof is much more in the style of the Greeks than is
the one that the Greeks actually found!

ALT attributes in formula image elements
I have a Blosxom plugin that recognizes
<formula>...</formula> elements in my blog article files,
interprets the contents as TeX, converts the results to a gif file,
and then replaces the whole thing with an inline image tag to inline
the gif file.

Today I fixed the plugin to leave the original TeX source code in the
ALT attribute of the IMG tag. I should have done this in the first
place.

If any people with vision impairments read my blog and have
suggestions about how I could make it more accessible, I would be very
grateful to hear them.

Yahtzee probability
In the game of Yahtzee, the players roll five dice and try to generate
various combinations, such as five of a kind, or full house (a
simultaneous pair and a three of a kind.) A fun problem is to
calculate the probabilities of getting these patterns. In Yahtzee,
players get to re-roll any or all of the dice, twice, so the probabilities
depend in part on the re-rolling strategy you choose. But the first
step in computing the probabilities is to calculate the chance of
getting each pattern in a single roll of all five dice.

A related problem is to calculate the probability of certain poker
hands. Early in the history of poker, rules varied about whether a
straight beat a flush; players weren't sure which was more common.
Eventually it was established that straights were more common than
flushes. This problem is complicated by the fact that the deck
contains a finite number of each card. With cards, drawing a 6
reduces the likelihood of drawing another 6; this is not true when you
roll a 6 at dice.

With three dice, it's quite easy to calculate the likelihood of
rolling various patterns:

Pattern   Probability
A A A       6 / 216
A A B      90 / 216
A B C     120 / 216

A high school student would have no trouble with this. For pattern
AAA, there are clearly only six possibilities. For pattern AAB, there
are 6 choices for what A represents, times 5 choices for what B
represents, times 3 choices for which die is B; this makes 90. For
pattern ABC, there are 6 choices for what A represents times 5 choices
for what B represents times 4 choices for what C represents; this
makes 120. Then you check by adding up 6+90+120 to make sure you get
6^3 = 216.

It is perhaps a bit surprising that the majority of rolls of three
dice have all three dice different. Then again, maybe not. In
elementary school I was able to amaze some of my classmates by
demonstrating that I could flip three coins and get a two-and-one
pattern most of the time. Anyway, it should be clear that as the
number of dice increases, the chance of them all showing different
numbers decreases, until it hits 0 for more than 6 dice.

The three-die case is unusually simple.
Let's try four dice:

Pattern    Probability
A A A A      6 / 1296
A A A B    120 / 1296
A A B B     90 / 1296
A A B C    720 / 1296
A B C D    360 / 1296

There are obviously 6 ways to throw the pattern AAAA. For pattern AAAB
there are 6 choices for A × 5 choices for B × 4 choices
for which die is the B = 120. So far this is no different from the
three-die case. But AABB has an added complication, so let's analyze
AAAA and AAAB a little more carefully.

First, we count the number of ways of assigning numbers of pips on the
dice to symbols A, B, and so on. Then we count the number of ways of
assigning the symbols to actual dice. The total is the product of
these. For AAAA there are 6 ways of assigning some number of pips to
A, and then one way of assigning A's to all four dice. For AAAB there
are 6×5 ways of assigning pips to symbols A and B, and then four
ways of assigning A's and B's to the dice, namely AAAB, AABA, ABAA,
and BAAA. With that in mind, let's look at AABB and AABC.

For AABB, there are 6 choices for A and 5 for B, as before. And there
are !!4\choose2!! = 6 choices for which
dice are A and which are B. This would give 6·5·6 = 180
total. But of the 6 assignments of A's and B's to the dice, half are
redundant. Assignments AABB and BBAA, for example, are completely
equivalent. Taking A=2 B=4 with pattern AABB yields the
same die roll as A=4 B=2 with pattern BBAA. So we have double-counted
everything, and the actual total is only 90, not 180.

Similarly, for AABC, we get 6 choices for A × 5 choices for B
× 4 choices for C = 120. And then there seem to be 12 ways of
assigning dice to symbols:

AABC    AACB
ABAC    ACAB
ABCA    ACBA
BAAC    CAAB
BACA    CABA
BCAA    CBAA

But no, actually there are only 6, because B and C are entirely
equivalent, and so the patterns in the left column cover all the
situations covered by the ones in the right column. The total is not
120×12 but only 120×6 = 720.

Then similarly for ABCD we have 6×5×4×3 = 360 ways
of assigning pips to the symbols, and 24 ways of assigning the symbols
to the dice, but all 24 ways are equivalent, so it's really only 1 way
of assigning the symbols to the dice, and the total is 360.

The check step asks if 6 + 120 + 90 + 720 + 360 = 6^4 = 1296,
which it does, so that is all right.
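For small numbers of dice the counts can also be confirmed by brute force. Here is a short Python sketch that simply enumerates every roll; representing a pattern as a tuple of multiplicities is a convention chosen for this example.

```python
from collections import Counter
from itertools import product

def pattern_counts(num_dice, sides=6):
    """Count rolls by pattern, where a pattern is the sorted tuple of
    multiplicities: (2, 1, 1) stands for AABC, (2, 2) for AABB, and so on."""
    counts = Counter()
    for roll in product(range(1, sides + 1), repeat=num_dice):
        pattern = tuple(sorted(Counter(roll).values(), reverse=True))
        counts[pattern] += 1
    return counts

for n in (3, 4):
    counts = pattern_counts(n)
    assert sum(counts.values()) == 6 ** n   # the same check step as in the text
    for pattern, count in sorted(counts.items()):
        print(n, pattern, count)
```

This does not scale (it visits all 6^N rolls), but it is a useful independent check on the counting arguments.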

Before tackling five dice, let's try to generalize. Suppose that we
have N dice and the pattern has k ≤ N distinct
symbols which occur (respectively) p_1,
p_2, ... p_k times each.

There are !!{6\choose k}k!!!
ways to assign the pips to the symbols. (Note for non-mathematicians:
when k > 6, !!{6\choose
k}!! is zero.)

Then there are !!N\choose p_1 p_2
\ldots p_k!! ways to assign the symbols to the dice, where
!!N\choose p_1 p_2 \ldots p_k!! denotes the so-called multinomial
coefficient, equal to !!{N!\over
p_1!p_2!\ldots p_k!}!!.

But some of those p_i might be equal, as with
AABB, where p_1 = p_2 = 2, or with AABC, where
p_2 = p_3 = 1. In such cases some of the
assignments are redundant.

So rather than dealing with the p_i directly,
it's convenient to aggregate them into groups of equal numbers. Let's
say that n_i counts the number of p's
that are equal to i. Then instead of having
p_i = (3, 1, 1, 1, 1) for AAABCDE, we have
n_i = (4, 0, 1) because there are 4 symbols
that appear once, none that appear twice, and one ("A") that appears
three times.

We can re-express the multinomial coefficient
in terms of the n_i:

$$N!\over
{1!}^{n_1}{2!}^{n_2}\ldots{N!}^{n_N}$$

And the reduced contribution from equivalent patterns is easy to
express too; we need to divide by !!\prod
{n_i}!!!. So we can write the total as:

$${6\choose k}k!\cdot{N!\over \prod {i!}^{n_i}{n_i}!}$$

Note that k, the number of distinct
symbols, is merely the sum of the n_i.

To get the probability, we just divide by 6^N.
Let's see how that pans out for the Yahtzee example, which is the N=5 case:

Pattern      n_1  n_2  n_3  n_4  n_5   Probability
A A A A A                         1       6 / 7776
A A A A B     1              1          150 / 7776
A A A B B          1    1               300 / 7776
A A A B C     2         1              1200 / 7776
A A B B C     1    2                   1800 / 7776
A A B C D     3    1                   3600 / 7776
A B C D E     5                         720 / 7776

6 + 150 + 300 + 1,200 + 1,800 + 3,600 + 720 = 7,776, so this checks
out.
The table is actually not quite right for Yahtzee, which also
recognizes "large straight" (12345 or 23456) and "small straight"
(1234X, 2345X, or 3456X.) I will continue to disregard this.
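The counting recipe above translates almost directly into code. What follows is a Python sketch rather than the program that produced the table; `pattern_ways` and its string representation of patterns are assumptions made for the example.

```python
from collections import Counter
from math import comb, factorial

def pattern_ways(pattern, sides=6):
    """Number of ways to roll a pattern such as "AABBC" with len(pattern) dice."""
    p = list(Counter(pattern).values())   # p_i: how often each symbol occurs
    n = Counter(p)                        # n_i: how many symbols occur i times
    N, k = len(pattern), len(p)
    pips = comb(sides, k) * factorial(k)  # assign pips to the k symbols
    key = factorial(N)                    # the multinomial coefficient ...
    for i, n_i in n.items():
        key //= factorial(i) ** n_i       # ... expressed in terms of the n_i
        key //= factorial(n_i)            # discount equivalent symbols
    return pips * key

yahtzee = ["AAAAA", "AAAAB", "AAABB", "AAABC", "AABBC", "AABCD", "ABCDE"]
for pattern in yahtzee:
    print(pattern, pattern_ways(pattern), "/", 6 ** 5)
```

Because `comb(sides, k)` is zero when k exceeds the number of sides, impossible patterns come out as 0 without any special handling.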

The most common Yahtzee throw is one pair, by a large margin. (Any
Yahtzee player could have told you that.) And here's a curiosity: a
full house (AAABB), which scores 25 points, occurs twice as often as
four of a kind (AAAAB), which scores at most 29 points and usually less.

which has a very similar key item. The major difference is that
instead of !!{i!}^{n_i}!!
we have !!i^{p_i}!!. The common term
arises because both formulas are intimately concerned with the
partition structure of the things being counted. I should really go
back and reread the stuff in Concrete Mathematics
about the Stirling numbers of the first kind, which count the number
of partitions of various sizes, but maybe that's a project for next
week.

Anyway, I digress. We can generalize the formula above to work for S-sided dice;
this is a simple matter of replacing the 6 with an S. We
don't even need to recalculate the n_i.
And since the key factor of !!{N!\over \prod {i!}^{n_i}{n_i}!}!! does not involve S, we can easily
precalculate it for some pattern and then plug it into the rest of the
formula to get the likelihood of rolling that pattern with different
kinds of dice. For example, consider the two-pairs pattern AABBC.
This pattern has n_1 = 1,
n_2 = 2, so the key factor comes out to be 15.
Plugging this into the rest of the formula, we see that the probability of
rolling AABBC with five S-sided dice is
!!90 {S \choose 3} S^{-5}!!.
Here is a tabulation:

# of sides   Chance of rolling AABBC
      3          37.03704 %
      4          35.15625 %
      5          28.80000 %
      6          23.14815 %
      7          18.74219 %
      8          15.38086 %
      9          12.80293 %
     10          10.80000 %
     20           3.20625 %
     50           0.56448 %
    100           0.14553 %

As S increases, the probability falls off rapidly to zero, as
you would expect, since the chance of rolling even one pair on a set
of million-sided dice is quite slim.

The graph is quite typical, and each pattern has its own favorite kind
of dice. Here's the corresponding graph and table for rolling the
AABBCDEF pattern on eight dice:

# of sides   Chance of rolling AABBCDEF
      6           9.00206 %
      7          18.35970 %
      8          25.23422 %
      9          29.50469 %
     10          31.75200 %
     11          32.58759 %
     12          32.49180 %
     13          31.80697 %
     14          30.76684 %
     15          29.52744 %
     16          28.19136 %
     17          26.82506 %
     18          25.47084 %
     19          24.15487 %
     20          22.89262 %
     30          13.68370 %
     40           8.85564 %
     50           6.15085 %
    100           1.80238 %

As you can see, there is a sharp peak around S=11; you are more
likely to roll two pair with eight 11-sided dice than you are with
eight of any other sort of dice. Now if your boss catches you reading
this article at work, you'll be prepared with an unassailable business
justification for your behavior.
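Both tabulations can be regenerated with a few lines of Python. This is a sketch under the assumption that all S faces are equally likely; `pattern_probability` is a name invented for the example.

```python
from collections import Counter
from math import comb, factorial

def pattern_probability(pattern, sides):
    """Chance of rolling `pattern` (e.g. "AABBC") with len(pattern) dice
    that each have `sides` equally likely faces."""
    p = list(Counter(pattern).values())
    N, k = len(pattern), len(p)
    key = factorial(N)
    for i, n_i in Counter(p).items():
        key //= factorial(i) ** n_i * factorial(n_i)
    # key does not depend on the number of sides; only this last step does.
    return comb(sides, k) * factorial(k) * key / sides ** N

for s in (3, 6, 10, 100):
    print(s, "%.5f%%" % (100 * pattern_probability("AABBC", s)))

best = max(range(6, 101), key=lambda s: pattern_probability("AABBCDEF", s))
print("AABBCDEF is most likely with", best, "sided dice")
```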

Returning to the discussion of poker hands, we might ask what the
ranking of poker hands would be, on the planet where a poker hand
contains six cards instead of five. Does four of a kind beat three
pair? Using the methods in this article, we can get a quick
approximation. It will be something like this:

Two trips (AAABBB)

Overfull house (AAAABB)

Three pair

Four of a kind

Full house (AAABBC)

Three of a kind

Two pair

One pair

No pair

We'll need to calculate the values for straight and flush separately;
they will be considerably rarer than in five-card poker.

I was going to end the article with tabulations of the number of
different ways to roll each possible pattern, and the probabilities of
getting them, but then I came to my senses. Instead of my running the
program and pasting in the voluminous output, why not just let you run
the program yourself, if you care to see the answers?

There's a line in one of William Gibson's short stories about how
some situations call for a subtle and high-tech approach, and others
call for a sawed-off shotgun. I think my success as a programmer,
insofar as I have any, comes from knowing when to deploy each kind of
approach.

This was generated by a small computer program. I learned a long time
ago that although it is tempting to hack up something like this by
hand, you should usually write a computer program to do it instead.
It takes a little extra time up front, and that time is almost always
amply paid back when you inevitably decide that that table should have
three columns instead of two, or the lines should alternate light and
dark gray, or that you forgot to align the right-hand column on the
decimal points, or whatever, and then all you have to do is change two
lines of code and rerun the program, instead of hand-editing all 34
lines of the output and screwing up two of them and hand-editing them
again. And again. And again.

When I was making up the seating chart for my wedding, I used this
approach. I wrote a raw data file, and then a Perl program to read
the data file and generate LaTeX output. The whole thing was driven by
make. I felt like a bit of an ass as I wrote the program,
wondering if I wasn't indulging in an excessive use of technology, and
whether I was really going to run the program more than once or
twice. How often does the seating chart need to change, anyway?

Gentle readers, that seating chart changed approximately one million and
six times.

The Nth main division of the table at left contains one line for
every partition of the integer N. The right-hand entry in each
line (say 144) is calculated by a function permcount, which
takes the left-hand entry (say [5, 1]) as input. The
permcount function in turn calls upon fact to
calculate factorials and choose to calculate binomial
coefficients.

But how is the left-hand column generated? In my book, I spent quite a lot of
time discussing generation of partitions of an integer, as an example
of iterator techniques. Some of these techniques are very clever and
highly scalable. Which of these clever partition-generating
techniques did I use to generate the left-hand column of the table?

Why, none of them, of course! The left-hand column is hard-wired into
the program:

I guessed that it would take a lot longer to write code to generate
partitions, or even to find it already written and use it, than it
would just to generate the partitions out of my head and type them in.
This guess was correct.
The only thing wrong with my approach is that it doesn't scale. But
it doesn't need to scale.

Cycle classes of permutations
I've always had trouble sleeping. In high school I would pass the
time at night by doing math. Math is a good activity for insomniacs:
It's quiet and doesn't require special equipment.

Today's article is about another entertainment I've been using lately
in meetings: count the number of permutations in each cycle class.

In case you have forgotten, here is a brief summary: a permutation is
a one-to-one mapping from a set onto itself. A cycle of a permutation is a subset
of the set for which the elements fall into a single orbit. For
example, the permutation:

$$
\pmatrix{1&2&3&4&5&6&7&8\cr
1&4&2&8&5&7&6&3\cr}$$

can be represented by the following diagram:

And, since it contains four cycles (the closed loops), it is the product
of the four cycles (1), (2 4 8 3), (5), and (6 7).
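Finding the cycles mechanically is a matter of following each element around its orbit. A small Python sketch, with the permutation stored as a dictionary (a representation chosen for this example):

```python
def cycles(perm):
    """Decompose a permutation, given as a dict {x: image of x}, into cycles."""
    seen = set()
    result = []
    for start in perm:
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        x = perm[start]
        while x != start:          # follow the orbit until it closes up
            cycle.append(x)
            seen.add(x)
            x = perm[x]
        result.append(tuple(cycle))
    return result

perm = dict(zip(range(1, 9), [1, 4, 2, 8, 5, 7, 6, 3]))
print(cycles(perm))                            # [(1,), (2, 4, 8, 3), (5,), (6, 7)]
print(sorted(len(c) for c in cycles(perm)))    # [1, 1, 2, 4]
```

The sorted list of cycle lengths is all that survives once the numeric labels are filed off.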

We can sort the permutations into cycle classes by saying that
two permutations are in the same cycle class if the lengths of the
cycles are all the same. This effectively files the numeric labels
off the points in the diagrams. So, for example, the permutations of
{1,2,3} fall into the three following cycle classes:

Cycle lengths   Permutations            How many?
1 1 1           ()                      1
2 1             (1 2), (1 3), (2 3)     3
3               (1 2 3), (1 3 2)        2

Here's the corresponding table for permutations of {1,2,3,4}:

Cycle lengths   Permutations                                    How many?
1 1 1 1         ()                                              1
2 1 1           (1 2), (1 3), (1 4), (2 3), (2 4), (3 4)        6
2 2             (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)              3
3 1             (1 2 3), (1 2 4), (1 3 2), (1 3 4),             8
                (1 4 2), (1 4 3), (2 3 4), (2 4 3)
4               (1 2 3 4), (1 2 4 3), (1 3 2 4),                6
                (1 3 4 2), (1 4 2 3), (1 4 3 2)

Counting up the number of permutations in each cycle class and coming
up with a theorem about it was a good way to kill an hour or two of
meeting time. It has a built-in check, which is that the total counts
of all the cycle classes for permutations of N things had
better add up to N!, or else you know you have made a
mistake.

It is not too hard a problem, and would probably only take
fifteen or twenty minutes outside of a meeting, but this is
exactly what makes it a good problem for meetings, where you can give
the problem only partial and intermittent attention. Now that I have
a simple formula, the enumeration of cycle classes loses all its
entertainment value. That's the way the cookie crumbles.

Here's the formula. Suppose we want to know how many permutations of
{1,...,n} are in the cycle class C.
C is a partition of the number n, which is to say it's a
multiset of positive integers whose sum is n. If C
contains p_1 1's, p_2 2's, and so
forth, then the number of permutations in cycle class C is:

$$
N(C) = {n! \over {\prod i^{p_i}{p_i}!}}
$$

This can be proved by a fairly simple counting argument, plus a bit of
algebraic tinkering. Note that if
any of the p_i is 0, we can disregard it,
since it will contribute a factor of i^0·0! =
1 in the denominator.

For example, how many permutations of {1,2,3,4,5} have one 3-cycle and
one 2-cycle? The cycle class is therefore {3,2}, and all the
p_i are 0 except for p_2 =
p_3 = 1. The formula then gives 5! in the numerator
and factors 2 and 3 in the denominator, for a total of 120/6 = 20.
And in fact this is right. (It's equal to !!2{5\choose3}!!: choose three of the five
elements to form the 3-cycle, and then the other two go into the
2-cycle. Then there are two possible orders for the elements of the
3-cycle.)

How many permutations of {1,2,3,4,5} have one 2-cycle and three
1-cycles? Here we have p_1 = 3, p_2
= 1, and the other p_i are 0. Then the
formula gives 120 in the numerator and factors of 6 and
2 in the denominator, for a total of 10.
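The formula is short enough to code up directly. This Python sketch is one way to do it and is not necessarily how any earlier program computed these counts; representing the cycle class as a list of cycle lengths is an assumption of the example.

```python
from collections import Counter
from math import factorial

def permcount(cycle_class):
    """Number of permutations whose cycle lengths are given by cycle_class,
    a list such as [3, 2] or [5, 1] that is a partition of n."""
    n = sum(cycle_class)
    denominator = 1
    for i, p_i in Counter(cycle_class).items():   # p_i cycles of length i
        denominator *= i ** p_i * factorial(p_i)
    return factorial(n) // denominator

print(permcount([3, 2]))         # 20
print(permcount([2, 1, 1, 1]))   # 10
```

Lengths that do not occur never appear in the `Counter`, which matches the remark that a p_i of 0 can be disregarded.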

Here are the breakdowns of the number of permutations in each cycle
class for various n:

n   Cycle class     Permutations
1   1                 1

2   1 1               1
    2                 1

3   1 1 1             1
    1 2               3
    3                 2

4   1 1 1 1           1
    1 1 2             6
    2 2               3
    3 1               8
    4                 6

5   1 1 1 1 1         1
    2 1 1 1          10
    2 2 1            15
    3 1 1            20
    3 2              20
    4 1              30
    5                24

6   1 1 1 1 1 1       1
    2 1 1 1 1        15
    2 2 1 1          45
    2 2 2            15
    3 1 1 1          40
    3 2 1           120
    3 3              40
    4 1 1            90
    4 2              90
    5 1             144
    6               120

I find it a bit surprising that the most common cycle structure for
permutations of 6 elements is to have one element map to itself and
the others in one big 5-cycle. But on the other hand, there's a
well-known theorem that the average permutation has exactly one fixed
point, and so perhaps I shouldn't be surprised that the most likely
cycle structure also has exactly one fixed point.

Incidentally, the thing about the average permutation having exactly
one fixed point is quite easy to prove. Consider a permutation of
N things. Each of the N things is left fixed by exactly
(N-1)! of the permutations. So the total number of fixed points in
all the permutations is N!, and we are done.

A similar but slightly more contorted analysis reveals that the
average number of 2-cycles per permutation is 1/2, the average number
of 3-cycles is 1/3, and so forth. Thus the average number of total
cycles per permutation is !!\sum_{i=1}^n{1\over i} = H_n!!. For
example, for n=4, examination of the table above shows that
there is 1 permutation with 4 independent cycles (the identity
permutation), 6 with 3 cycles, 11 with 2 cycles, and 6 with 1 cycle,
for an average of (4+18+22+6)/24 = 50/24 = 1 + 1/2 + 1/3 + 1/4.
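For small n the claim about the average can be confirmed by brute force. A Python sketch, using exact fractions so that the comparison with H_n is not clouded by rounding (the function names are invented for the example):

```python
from fractions import Fraction
from itertools import permutations

def cycle_count(images):
    """Number of cycles of the permutation i -> images[i] of range(len(images))."""
    seen = [False] * len(images)
    count = 0
    for start in range(len(images)):
        if not seen[start]:
            count += 1
            x = start
            while not seen[x]:     # walk around the cycle containing start
                seen[x] = True
                x = images[x]
    return count

def average_cycles(n):
    """Exact average number of cycles over all n! permutations."""
    perms = list(permutations(range(n)))
    return Fraction(sum(cycle_count(p) for p in perms), len(perms))

def harmonic(n):
    """The harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

for n in range(1, 7):
    print(n, average_cycles(n), harmonic(n))
```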

Many answers are possible. The point of this note is to refute one
particular common answer, which is that the whole thing is just
meaningless.

This view is espoused by many people who, it seems, ought to know
better. There are two problems with this view.

The first problem is that it involves a theory of meaning that appears to
have nothing whatsoever to do with pragmatics. You can certainly
say that something is meaningless, but that doesn't make it so.
I can claim all I want to that "jqgc ihzu kenwgeihjmbyfvnlufoxvjc sndaye"
is a meaningful utterance, but that does not avail me much, since
nobody can understand it. And conversely, I can say as loudly and as
often as I want to that the utterance "Snow is white" is meaningless,
but that doesn't make it so; the utterance still means that snow is
white, at least to some people in some contexts.

Similarly, asserting that the sentences are meaningless is all very
well, but the evidence is against this assertion. The meaning of the
utterance "sentence 2 is false" seems quite plain, and so does the
meaning of the utterance "sentence 1 is false". A theory of meaning
in which these simple and plain-seeming sentences are actually
meaningless would seem to be at odds with the evidence: People do
believe they understand them, do ascribe meaning to them, and, for
the most part, agree on what the meaning is. Saying that "snow is
white" is meaningless, contrary to the fact that many people agree
that it means that snow is white, is foolish; saying that the example
sentences above are meaningless is similarly foolish.

I have heard people argue that although the sentences are individually
meaningful, they are meaningless in conjunction. This position is
even more problematic. Let us refer to a person who holds this
position as P.
Suppose
sentence 1 is presented to you in isolation. You think you understand
its meaning, and since P agrees that it is meaningful, he
presumably would agree that you do. But then, a week later, someone
presents you with sentence 2; according to P's theory, sentence
1 now becomes meaningless. It was meaningful on February 1, but not
on February 8, even though the speaker and the listener both think it
is meaningful and both have the same idea of what it means. But
according to P, as of midnight on February 8, they are suddenly
mistaken.

The second problem with the notion that the sentences are meaningless
comes when you ask what makes them meaningless, and how one can
distinguish meaningful sentences from sentences like these that are
apparently meaningful but (according to the theory) actually
meaningless.

The answer is usually something along the lines that sentences that
contain self-reference are meaningless. This answer is totally
inadequate, as has been demonstrated many times by many people,
notably W.V.O. Quine. In the example above, the self-reference
objection is refuted simply by observing that neither sentence is
self-referent. One might try to construct an argument about
reference loops, or something of the sort, but none of this will
avail, because of Quine's example:

"is false when appended to a quoted version of itself."
is false when appended to a quoted version of itself.

This is a perfectly well-formed, grammatical sentence (of the form
"x is false when appended to a quoted version of itself".) It
is not immediately self-referent, and there is no "reference loop"; it
merely describes the result of a certain operation. In this way, it
is analogous to sentences like this one:

"snow is white" is false when you change "is" to "is not".

Or similarly:

If a sentence is false, then its negation is true.

Nevertheless, Quine's sentence is an antinomy of the same sort as the
example sentences at the top of the article.
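
The "certain operation" that Quine's sentence describes is purely mechanical, and a few lines of Python may make that concrete. This is only an illustrative sketch; the function name is invented for the example.

```python
def append_to_own_quotation(phrase: str) -> str:
    """Return the phrase appended to a quoted version of itself."""
    return f'"{phrase}" {phrase}'


PHRASE = "is false when appended to a quoted version of itself."
quine_sentence = append_to_own_quotation(PHRASE)
print(quine_sentence)
```

The string produced by the operation is the very sentence that asserts the result of the operation is false, even though nothing in it says "this sentence".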

But all of this is peripheral to the main problem with the argument
that sentences that contain self-reference are meaningless. The main
problem with this argument is that it cannot be true. The
sentence "sentences that contain self-reference are meaningless" is
itself a sentence, and therefore refers to itself, and is therefore
meaningless under its own theory. If the assertion is true, then the
sentence asserting it is meaningless under the assertion itself; the
theory deconstructs itself. So anyone espousing this theory has
clearly not thought through the consequences. (Graham Priest says that
people advancing this theory are subject to a devastating ad
hominem attack. He doesn't give it specifically, but many such
come to mind.)

In fact, the self-reference-implies-meaninglessness theory obliterates
not only itself, but almost all useful statements of logic. Consider
for example "The negation of a true sentence is false and the negation
of a false sentence is true." This sentence, or a variation of it, is
probably found in every logic textbook ever written. Such a sentence
refers to itself, and so, in the
self-reference-implies-meaninglessness theory, is meaningless. So too
with most of the other substantive assertions of our logic textbooks,
which are principally composed of such self-referent sentences about
properties of sentences; so much for logic.

The problems with ascribing meaninglessness to self-referent sentences
run deeper still. If a sentence is meaningless, it cannot be
self-referent, because, being meaningless, it cannot refer to anything
at all. Is "jqgc ihzu kenwgeihjmbyfvnlufoxvjc sndaye" self-referent?
No, because it is meaningless. In order to conclude that it was
self-referent, we would have to understand it well enough to ascribe a
meaning to it, and this would prove that it was meaningful.

So the position that the example sentences 1 and 2 are "meaningless"
has no logical or pragmatic validity at all; it is totally
indefensible. It is the philosophical equivalent of putting one's
fingers in one's ears and shouting "LA LA LA I CAN'T HEAR YOU!"

Fondue
Lorrie and I had fondue for dinner two nights ago. To make cheese
fondue, you melt a lot of Swiss cheese into a cup of dry white wine,
then serve hot and dunk chunks of bread into the melted cheese with
long forks.

Lorrie was in charge of buying the ingredients. I did not read the
label on the wine before I opened and tasted it, and so was startled
to discover that it was a Riesling, which is very much not the dry
wine that is traditional. Riesling is a very sweet and fruity wine.

I asked Lorrie how she chose the wine, and she said she had gotten
Riesling because she prefers sweet wines. I remarked that dry wines
are traditional for fondue. But it was what we had, and I made the
fondue with it. Anyway, as Lorrie pointed out, fondue is often
flavored with a dash of kirsch, which is a cherry liqueur, and not at
all dry. I never have kirsch in the house, and usually use port or
sherry instead. Since we were using Riesling, I left that stuff
out.

The fondue was really outstanding, easily the most delicious fondue
I've ever made. Using Riesling totally changed the character of the
dish. The Riesling gave it a very rich and complex flavor. I'm going
to use Riesling in the future too. Give it a try.

Recipe

Rub the inside of a heavy saucepan with a cut garlic clove. Heat 1
cup Riesling over medium heat in the saucepan. When the surface of
the wine is covered with fine bubbles, add 1 tablespoon corn starch
and stir until dissolved. Reduce heat and slowly add 3/4 lb grated
emmenthaler and 3/4 lb grated gruyere cheeses, stirring constantly
until completely melted.

Transfer to a caquelon (fondue pot) and serve with chunks of crusty
French bread and crisp apples.

Mnemonics
A while back I recounted the joke about the plover's egg: A teenage
girl, upon hearing that the human testicle is the size of a plover's
egg, remarks "Oh, so that's how big a plover's egg is." I believe this
was considered risqué in 1974, when it was current. But today
I was reminded of it in a rather different context.

The Wikipedia article about the number e mentions a very silly
mnemonic for remembering the digits of e: "2.7-Andrew
Jackson-Andrew Jackson-Isosceles Right Triangle". Apparently, Andrew
Jackson was elected President in 1828. When I saw this, my immediate
thought was "that's great; from now on I'll always remember when
Andrew Jackson was elected President."

In high school, I had a math teacher who pointed out that a mnemonic
for the numerical value of √3 was to recall that George
Washington was born in the year 1732. And indeed, since that day I
have never forgotten that Washington was born in 1732.
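
For anyone who wants to check that the mnemonics really do produce the right digits, here is a small sketch. The variable names are made up for the example, and the comparison is only as precise as a double-precision float.

```python
import math

# "2.7 - Andrew Jackson - Andrew Jackson - isosceles right triangle"
# (Jackson: 1828, twice; the triangle's angles: 45, 90, 45)
e_from_mnemonic = "2.7" + "1828" + "1828" + "45" + "90" + "45"

# George Washington's birth year supplies the first digits of sqrt(3).
sqrt3_from_mnemonic = "1." + "732"

print(e_from_mnemonic, math.e)
print(sqrt3_from_mnemonic, math.sqrt(3))
```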

Software archaeology
For appropriate values of "everyone", everyone knows that Unix files
do not record any sort of "creation time". A fairly frequently asked
question in Unix programming forums, and other related forums, such as
Perl programming forums, is how to get the creation date of a file;
the answer is that you cannot do that because it is not there.

This lack is exacerbated by several unfortunate facts: creation times
are available on Windows systems; the Unix inode contains three
timestamps, one of which is called the "ctime", and the "c" is
suggestive of the wrong thing; Perl's built-in stat function
overloads the return value to return the Windows creation time in the
same position (on Windows) as it returns the ctime (on Unix).
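
To see what the ctime actually tracks on a modern Unix system, one can change a file's permissions and compare timestamps before and after. The following is a minimal sketch, assuming a Unix-like system and a filesystem with timestamps finer than about a second; the function name chmod_effect is invented for the example.

```python
import os
import stat
import tempfile
import time


def chmod_effect(delay: float = 1.1) -> tuple[bool, bool]:
    """Return (mtime_changed, ctime_changed) after a chmod on a scratch file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        before = os.stat(path)
        time.sleep(delay)             # let the clock move past the timestamp granularity
        os.chmod(path, stat.S_IRUSR)  # changes the inode, not the file's data
        after = os.stat(path)
        return (after.st_mtime_ns != before.st_mtime_ns,
                after.st_ctime_ns != before.st_ctime_ns)
    finally:
        os.remove(path)


mtime_changed, ctime_changed = chmod_effect()
print("mtime changed:", mtime_changed, "ctime changed:", ctime_changed)
```

On Unix the chmod should leave the mtime alone and update the ctime, which is why "change time" is the better reading of the "c". On Windows, st_ctime reports the creation time instead, so the same script would tell a different story.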

So we see questions like this one, which appeared this week on the
Philadelphia Linux Users' Group mailing list:

How does one check and change ctime?

And when questioned as to why he or she wanted to do this, this person
replied:

We are looking to change the creation time. From what I understand,
ctime is the closest thing to creation time.

There is something about this reply that irritates me, but I'm not
quite sure what it is. Several responses come to mind: "Close" is
not sufficient in system programming; the ctime is not "close" to a
creation time, in any sense; before you go trying to change the thing,
you ought to do a minimal amount of research to find out what it is.
It is a perfect example of the Wrong Question, on the same order as
that poor slob all those years ago who wanted to know how to tell if a
file was a hard link or a soft link.

But anyway, that got me thinking about ctimes in general, and I did
some research into the history and semantics of the thing, and made
some rather surprising discoveries.

IV. IMPLEMENTATION OF THE FILE SYSTEM

... The entry found thereby (the file's i-node) contains the
description of the file:

...
time of creation, last use, and last modification

An error? I don't think so. Here is corroborating evidence, the
stat man page from the first edition of Unix, from 1971:

NAME stat -- get file status
SYNOPSIS sys stat; name; buf / stat = 18.
DESCRIPTION name points to a null-terminated string naming a file; buf is the
address of a 34(10) byte buffer into which information is placed
concerning the file. It is unnecessary to have any permissions at all
with respect to the file, but all directories leading to the file
must be readable.
After stat, buf has the following format:
buf, +1 i-number
+2, +3 flags (see below)
+4 number of links
+5 user ID of owner
+6,+7 size in bytes
+8,+9 first indirect block or contents block
...
+22,+23 eighth indirect block or contents block
+24,+25,+26,+27 creation time
+28,+29, +30,+31 modification time
+32,+33 unused
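
The layout above is simple enough to decode mechanically. Here is a sketch of how one might pick apart such a 34-byte buffer with Python's struct module. The field names are my own, and the byte order inside the four-byte times (two little-endian words, high word first) is an assumption based on PDP-11 convention, not something the man page states.

```python
import struct

# "<" means no padding and little-endian 16-bit words.
# Layout: i-number, flags, links (byte), uid (byte), size,
# eight block words, two words per time, one unused word.
V1_STAT = struct.Struct("<HHBBH8H2H2HH")


def parse_v1_stat(buf: bytes) -> dict:
    """Split a 34-byte first-edition stat buffer into named fields."""
    f = V1_STAT.unpack(buf)
    return {
        "inumber": f[0],
        "flags": f[1],
        "nlinks": f[2],
        "uid": f[3],
        "size": f[4],
        "blocks": list(f[5:13]),
        # Assumed word order: high word first, then low word.
        "creation_time": (f[13] << 16) | f[14],
        "modification_time": (f[15] << 16) | f[16],
    }


print(V1_STAT.size)  # the format describes a 34-byte buffer
```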

When did the ctime change from being called a "creation time" to a
"change time"? Did the semantics change too, or was the "creation
time" description a misnomer? If I can't find out, I might write to
Ritchie to ask. But this is, of course, a last resort.

In the meantime, I do have the source code for the fifth edition
kernel, but it appears that, around that time (1975 or so), there was
no creation time. At least, I can't find one.

The inode operations inside the kernel are defined to operate on struct
inodes:

The i_lastr field is what we would now call the atime. (I
suppose it stands for "last read".) The mtime and ctime are not
there, because they are not stored in the in-memory copy of the inode.
They are fetched directly from the disk when needed.

We can see an example of this in the stat1 function, which is
the backend for the stat and fstat system calls:

ub is the user buffer into which the stat data will be
deposited. ip is the inode structure from which most of
this data will be copied. The
suword utility copies a two-byte unsigned integer ("short
unsigned word") from source to destination. This is done starting at
the i_dev field (line 9), which effectively skips the two
earlier fields, i_flag and i_count, which are
internal kernel matters that are none of the user's business.

14 words are copied from the inode structure starting from this
position, including the device and i-number fields, the mode, the link
count, and so on, up through the addresses of the data or indirect
blocks. (In modern Unixes, the stat call omits these addresses.)
Then four words are copied out of the cp buffer, which has been
read from the inode actually on the disk; these eight bytes are at
position 24 in the inode, and ought to contain the mtime and the
ctime. The question is, which is which? This simple question turns
out to have a surprisingly complicated answer.

When an inode is modified, the IUPD flag is
set in the i_flag member. For example, here is
chmod, which modifies the inode but not the underlying data.
On a modern unix system, we would expect this to update the ctime, but
not the mtime. Let's see what it does in version 5:

Line 10 is the important one; it sets the mode on the in-memory copy
of the inode to the argument supplied by the user. Then line 11 sets
the IUPD flag to indicate that the inode has been modified.
Line 12 calls iput, whose principal job is to maintain the
kernel's internal reference count of the number of file descriptors
that are attached to this inode. When this number reaches zero, the
inode is written back to disk, and discarded from the kernel's open
file table. The iupdat function, called from iput,
is the one that actually writes the modified inode back to the
disk:

What is going on here? p is the in-memory copy of the inode
we want to update. It is immediately copied into a register, and
called by the alias rp thereafter. tm is the time
that the kernel should write into the mtime field of the inode.
Usually this is the current time, but the smdate system call
("set modified date") supplies it from the user instead.

Lines 16–17 copy the mode, link count, uid, gid, "size", and
"addr" fields from the in-memory copy of the inode into the block
buffer that will be written back to the disk. Lines 18–22
update the atime if the IACC flag is set, or skip it if not.
Then, if the IUPD flag is set, lines 24–25 write the
tm value into the next slot in the buffer, where the mtime is
stored. The bwrite call on line 27 commits the data to the
disk; this results in a call into the appropriate device driver
code.

There is no sign of updating the ctime field, but recall that we
started this search by looking at what the chmod call does;
it sets IUPD, which eventually results in the updating of the
mtime field. So the mtime field is not really an mtime field as we
now know it; it is doing the job that is now done by the ctime field.
And in fact, the dump command predicates its decision about
whether to dump a file on the contents of the mtime field. Which is
really the ctime field. So functionally, dump is doing the
same thing it does now.
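
The logic described above can be modeled in a few lines. This is a toy Python model, not the kernel code: the class, the flag values, and the disk_atime and disk_mtime attribute names are all stand-ins chosen for the illustration.

```python
IUPD = 0o2  # "inode was modified" (value is illustrative)
IACC = 0o4  # "inode was accessed" (value is illustrative)


class Inode:
    """Toy stand-in for an in-memory inode plus its on-disk timestamps."""

    def __init__(self, mode: int, now: int):
        self.mode = mode
        self.flag = 0
        self.disk_atime = now
        self.disk_mtime = now


def chmod(ip: Inode, mode: int) -> None:
    ip.mode = mode
    ip.flag |= IUPD  # a metadata-only change still sets IUPD


def iupdat(ip: Inode, tm: int, now: int) -> None:
    """Write the inode back, stamping times according to the flags."""
    if ip.flag & IACC:
        ip.disk_atime = now
    if ip.flag & IUPD:
        ip.disk_mtime = tm  # the only "change" timestamp there is
    ip.flag &= ~(IACC | IUPD)


inode = Inode(mode=0o644, now=100)
chmod(inode, 0o600)
iupdat(inode, tm=200, now=200)
print(inode.disk_atime, inode.disk_mtime)
```

In this model a chmod alone moves the "mtime" forward, which is the behavior we now associate with the ctime.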

It's possible that I missed it, but I cannot find the advertised
creation time anywhere. The logical place to look is in the
maknode function, which allocates new inodes. The
maknode function calls ialloc to get an unused inode
from the device, and this initializes its mode (as specified by the
user), its link count (to 1), and its uid and gid (to the current
process's uid and gid). It does not set a creation time. The
ialloc function is fairly complicated, but as far as I can
tell it is not setting any creation time either.

Working it from the other end, asking who might look at the
ctime field, we have the find command, which has a
-mtime option, but no -ctime option. The
dump command, as noted before, uses the mtime. Several
commands perform stat calls and declare structs to hold the
result. For example, pr, which prints files with nice
pagination, declares a struct inode, which is the inode as
returned by stat, as opposed to the inode as used internally
by the kernel—what we would call a struct stat now.
There was no /usr/include in the fifth edition, so the
pr command contains its own declaration of the struct
inode. It looks like this:

struct inode {
int dev;
...
int atime[2];
int mtime[2];
};

No sign of the ctime, which would have been after the mtime
field. (Of course, it could be there anyway, unmentioned in the
declaration, since it is last.) And similarly, the ls command
has:

A couple of commands have extremely misleading declarations. Here's
the struct inode from the prof command, which prints
profiling reports:

struct inode {
int idev;
...
int ctime[2];
int mtime[2];
int fill;
};

The atime field has erroneously been called ctime here, but
it seems that since prof does not use the atime, nobody
noticed the bug. And there's a mystery fill field at the
end, as if prof is expecting one more field, but doesn't know
what it will be for. The declaration of ibuf in the
ln command has similar oddities.

So the creation time advertised by the CACM paper (1974)
and the version 1 manual (1971) seems to have disappeared by the time
of version 5 (1975), if indeed it ever existed.

But there was some schizophrenia in the version 5 system about whether
there was a third date in addition to the atime and the mtime. The
stat call copied it into the stat buffer, and some commands
assumed that it would be there, although they weren't sure what it
would be for, and none of them seems to look at it. It's quite possible
that there was at one time a creation date, which had been eliminated
by the time of the fifth edition, leaving behind the vestigial remains
we saw in commands like ln and prof and in the code
of the stat1 function.

Functionally, the version 5 mtime is actually what
we would now call the ctime: it is updated by operations like
chmod that in modern Unix will update the ctime but not the
mtime. A quick scan of the Lions Book suggests that it was the same
way in version 6 as well. I imagine that the ctime-mtime distinction
arose in version 7, because that was the last version before the
BSD/AT&T fork, and nearly everything common to those two great
branches of the Unix tree was in version 7.

Oh, what the hell; I have the version 7 source code; I may as well
look at it. Yes, by this time the /usr/include/sys/stat.h
file had been invented, and does indeed include all three times in the
struct stat. So the mtime (as we now know it) appears to
have been introduced in v7.

One sometimes hears that early Unix had atime and mtime, and that
ctime was introduced later. But actually, it appears that early Unix
had atime and ctime, and it was the mtime that was introduced later.
The confusion arises because in those days the ctime was called
"mtime".

Addendum: It occurs to me now that the version 5 mtime is not
precisely like the modern ctime, because it can be set via the
smdate call, which is analogous to the modern utime
call. The modern ctime cannot be set at all.
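
The asymmetry is easy to observe on a current system. The sketch below uses Python's os.utime to push a file's mtime back to the epoch and then looks at the ctime; the helper name backdate is invented for the example.

```python
import os
import tempfile
import time


def backdate(path: str, when: float) -> os.stat_result:
    """Set atime and mtime to `when`; the kernel stamps the ctime itself."""
    os.utime(path, (when, when))
    return os.stat(path)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    st = backdate(path, 0.0)  # 1970-01-01, the Unix epoch
    mtime_is_epoch = st.st_mtime == 0.0
    ctime_is_recent = abs(time.time() - st.st_ctime) < 3600
finally:
    os.remove(path)

print("mtime at epoch:", mtime_is_epoch, "ctime recent:", ctime_is_recent)
```

The mtime obediently goes wherever it is told, while the ctime stays close to the present.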

(Minor trivium: line 22 of iupdat is ip1 =+ 2. In
modern C, we would write ip1 += 2. The =+ and
=- operators had turned out to be a mistake, because people
would write i=-1, intending i = -1, but the compiler
would understand it as i =- 1, producing subtle bugs. The
spellings of the operators were changed to avoid these bugs. The
change from =+ to += was complete by the time
K&R first edition was published in 1978: K&R mentions the
old-style operators and says that they are obsolete. In spite
of this, the Sun compiler I used in 1987 would still produce a warning
for i=-1, despite interpreting it as i = -1. I
believe this was because it was PCC-derived, and all PCC compilers
emitted this warning.
In the fifth edition code, we can see the obsolete form still in use.)

(Totally peripheral addendum: Google search for
dmr puts Dennis M. Ritchie in fourth position, not
the first. Is this grave insult to our community to be tolerated? I
think not! It must be avenged! With fire and steel!)

[ Addendum 20070127: Unix source code prior to the fifth edition is
lost. The manuals for the third and fourth editions are
available from the Unix Heritage
Society. The manual for the third edition (February 1973) mentions the
creation time, but by the fourth edition (November 1973) the
stat(2) man page no longer mentions a creation time. In
v4, the two dates in the stat structure are called actime
(modern atime) and modtime (modern mtime/ctime). ]

Environmental manipulations
Unix is full of little utility programs that run some other program in
a slightly modified environment. For example, the nohup
command:

SYNOPSIS

nohup COMMAND [ARG]...

DESCRIPTION

Run COMMAND, ignoring hangup signals.

The nohup command basically does signal(SIGHUP, SIG_IGN)
before calling execvp(COMMAND, ARGV) to execute the
command.
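
The same two steps can be written in Python almost word for word. This is a sketch of the core idea only; the real nohup also deals with redirecting output, which is omitted here.

```python
import os
import signal


def nohup(argv: list[str]) -> None:
    """Ignore SIGHUP, then replace this process with the command."""
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
    # Signals set to "ignore" stay ignored across exec,
    # so the new program inherits the setting.
    os.execvp(argv[0], argv)
```

A call such as nohup(["sleep", "600"]) never returns, because the calling process becomes the sleep command.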

Similarly, there is a chroot command, run as chroot
new-root-directory command args..., which
runs the specified command with its default root inode set to
somewhere else. And there is a nice command, run as nice
nice-value-adjustment command args..., which
runs the specified command with its "nice" value changed. And there
is an env environment-settings command args..., which runs the specified command with new
variables installed into the environment. The standard sudo
command could also be considered to be of this type.

I have also found it useful to write trivial commands called
indir, which runs a command after chdir-ing to a new
directory, and stopafter, which runs a command after setting
the alarm timer to a specified amount, and, just today,
with-umask, which runs a command after setting the umask to a
particular value.

I could probably have avoided indir and with-umask.
Instead of indir DIR COMMAND, I could use sh -c 'cd DIR;
exec COMMAND', for example. But indir avoids an extra
layer of horrible shell quotes, which can be convenient.

Today it occurred to me to wonder if this proliferation of commands
was really the best way to solve the problem. The sh -c
'...' method solves it partly, for those parts of the process
user area that have corresponding shell builtin commands. This includes
the working directory, umask, and environment variables, but not the
signal table, the alarm timer, or the root directory.

There is no standardized interface to all of these things at any
level. At the system call level, the working directory is changed by
the chdir system call, the root directory by chroot,
the alarm timer by alarm, the signal table by a bunch of
OS-dependent nonsense like signal or sigaction, the
nice value by setpriority, environment variables by a
potentially complex bunch of memory manipulation and pointer banging,
and so on.

Since there's no single interface for controlling all these things, we
might get a win by making an abstraction layer for dealing with them.
One place to put this abstraction layer is at the system level, where it
might look something like this:

This has several drawbacks. One is that it requires kernel hacking.
A subitem of this is that it will never become widespread, and that if
you can't (or don't want to) replace your kernel, it cannot be made to
work for you. Another is that it does not work for the environment
variables, which are not really administered by the kernel. Another
is that it does not fully solve the original problem, which is to
obviate the plethora of nice, nohup, sudo,
and env commands. You would still have to write a command
to replace them. I had thought of another drawback, but forgot it while I
was writing the last two sentences.

You can also put the abstraction layer at the C library level. This
has fewer drawbacks. It no longer requires kernel hacking, and can
provide a method for modifying the environment. But you still need to
write the command that uses the library.

We may as well put the abstraction layer at the Unix command level.
This means writing a command in some language, like Perl or C, which
offers a shell-level interface to manipulating the process
environment, perhaps something like this:

One question to ask is whether something like this already exists.
Another is, if not, whether it's because there's some reason why it's
a bad idea, or because there's a simpler solution, or just because
nobody has done it yet.
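
To make the idea a little more concrete, here is one possible shape for such an interface, written as a Python function that a thin command-line wrapper could call. Everything about it is hypothetical: the name run_with and its keyword arguments are invented for this sketch, and it covers only some of the process attributes discussed above (no chroot, for instance).

```python
import os
import signal
import subprocess


def run_with(argv, *, cwd=None, umask=None, nice=None,
             alarm=None, ignore=(), env=None) -> int:
    """Run argv in a child whose process state has been adjusted first."""

    def adjust():
        # Runs in the child, after the fork and before the exec.
        if umask is not None:
            os.umask(umask)
        if nice is not None:
            os.nice(nice)
        if alarm is not None:
            signal.alarm(alarm)
        for sig in ignore:
            signal.signal(sig, signal.SIG_IGN)

    # Extra environment variables are layered over the current ones.
    child_env = {**os.environ, **(env or {})}
    result = subprocess.run(argv, cwd=cwd, env=child_env, preexec_fn=adjust)
    return result.returncode
```

With something like this, run_with(["make"], cwd="/tmp/build", umask=0o027) would stand in for an indir and a with-umask at once, and ignore=[signal.SIGHUP] would do the job of nohup.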

Length of baseball games
In an earlier
article, I asserted that the average length of a baseball game was
very close to 9 innings. This is a good rule of thumb, but it is also
something of a coincidence, and might not be true in every year.

The canonical game, of course, lasts 9 innings. However, if the score
is tied at the end of 9 innings, the game can, and often does, run
longer, because the game is extended to the end of the first complete
inning in which one team is ahead. So some games run longer than 9
innings: games of 10 and 11 innings are quite common, and the
major-league record is 26.

Counterbalancing this effect, however, are two factors. Most important is
that when the home team is ahead after the first half of the ninth
inning, the second half is not played, since it would be a waste of
time. So nearly half of all games are only 8 1/2 innings long. This
depresses the average considerably. Together with the games that are
stopped early on account of rain or other environmental conditions,
the contribution from the extra-inning tie games is almost exactly
cancelled out, and the average ends up close to 9.
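
The cancellation is just a weighted average. The sketch below shows the arithmetic with made-up proportions; the shares are invented purely for illustration and are not real statistics, and rain-shortened games are left out.

```python
# Hypothetical shares of games by length in innings. These numbers are
# invented to show the arithmetic; they are not measured data.
shares = {
    8.5: 0.46,   # home team ahead after the top of the ninth
    9.0: 0.45,   # full nine innings
    11.0: 0.09,  # extra-inning games, lumped together at 11 innings
}

average = sum(length * share for length, share in shares.items())
print(round(average, 2))
```

Even with a sizable block of games at 8 1/2 innings, a modest number of extra-inning games is enough to pull the average back toward 9.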

In need of some bathroom reading last week, I grabbed my paperback
copy of Thomas Hobbes' Leviathan, which is always a fun read.
The thing that always strikes me about Leviathan is that almost
every sentence makes me nod my head and mutter "that is
so true," and then want to get in an argument with someone in
which I have the opportunity to quote that sentence to refute them.
That may sound like a lot to do on every sentence, but the sentences
in Leviathan are really long.

Here's a random example:

And as in arithmetic unpractised men must, and professors themselves
may often, err, and cast up false; so also in any other subject of
reasoning, the ablest, most attentive, and most practised men may
deceive themselves, and infer false conclusions; not but that reason
itself is always right reason, as well as arithmetic is a certain and
infallible art: but no one man's reason, nor the reason of any one
number of men, makes the certainty; no more than an account is
therefore well cast up because a great many men have unanimously
approved it. And therefore, as when there is a controversy in an
account, the parties must by their own accord set up for right reason
the reason of some arbitrator, or judge, to whose sentence they will
both stand, or their controversy must either come to blows, or be
undecided, for want of a right reason constituted by Nature; so is it
also in all debates of what kind soever: and when men that think
themselves wiser than all others clamour and demand right reason for
judge, yet seek no more but that things should be determined by no
other men's reason but their own, it is as intolerable in the society
of men, as it is in play after trump is turned to use for trump on
every occasion that suit whereof they have most in their hand. For
they do nothing else, that will have every of their passions, as it
comes to bear sway in them, to be taken for right reason, and that in
their own controversies: bewraying their want of right reason by the
claim they lay to it.

Gosh, that is so true. Leviathan is of course
available online at many locations; here
is one such.

Anyway, somewhere in the process of all this I learned that Hobbes had
some mathematical works, and spent a little time hunting them down.
The Penn library has links to online versions of some, so I got to
read a little with hardly any investment of effort. One that
particularly grabbed my attention was "Three papers presented to the
Royal Society against Dr. Wallis".

Wallis was a noted mathematician of the 17th century, a contemporary
of Isaac Newton, and a contributor to the early development of the
calculus. These days he is probably best known for the remarkable
formula:
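
Presumably the formula meant is Wallis's infinite product for π, which in modern notation reads:

```latex
\frac{\pi}{2}
  = \prod_{n=1}^{\infty} \frac{(2n)^2}{(2n-1)(2n+1)}
  = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdots
```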

So I was reading this Hobbes argument against Wallis, and I hardly got
through the first page, because it was so astounding. I will let
Hobbes speak for himself:

The Theoreme.

The four sides of a Square, being divided into any
number of equal parts, for example into 10; and straight lines being
drawn through opposite points, which will divide the Square into 100
lesser Squares; The received Opinion, and which Dr. Wallis commonly
useth, is, that the root of those 100, namely 10, is the side of the
whole Square.

The Confutation.

The Root 10 is a number of those Squares,
whereof the whole containeth 100, whereof one Square is an Unitie;
therefore, the Root 10, is 10 Squares: Therefore the root of 100 Squares is 10 Squares, and
not the side of any Square; because the side of a Square is
not a Superfices, but a Line.

Hobbes says, in short, that the square root of 100 squares is not 10
unit lengths, but 10 squares. That is his whole argument.

Hobbes, of course, is totally wrong here. He's so totally wrong that
it might seem hard to believe that he even put such a totally wrong
notion into print. One wants to imagine that maybe we have
misunderstood Hobbes here, that he meant something other than what he
said. But no, he is perfectly lucid as always. That is a drawback of
being such an extremely clear writer: when you screw up, you cannot
hide in obscurity.

I picture the members of the Royal Society squirming in their seats as
Hobbes presents this "confutation" of Wallis. There is a reason why
John Wallis is a noted mathematician of the 17th century, and Hobbes
is not a noted mathematician at all. Oh well!

Wallis presented a rebuttal sometime later, which I was not going to
mention, since I think everyone will agree that Hobbes is totally
wrong. But it was such a cogent rebuttal that I wanted to quote a bit
from it:

Like as 10 dozen is the root, not of 100 dozen, but of 100 dozen
dozen. ... But, says he, the root of 100 soldiers, is 10 soldiers.
Answer: No such matter, for 100 soldiers is not the product of 10
soldiers into 10 soldiers, but of 10 soldiers into the number 10: And
therefore neither 10, nor 10 soldiers, is the root of it.
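
Wallis's point is what we would now call dimensional analysis: taking a square root halves the power of the unit as well as reducing the number. A small Python sketch of the bookkeeping follows; the Quantity class is invented for the illustration.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Quantity:
    """A number of things, where `power` counts factors of the unit."""
    value: float
    unit: str
    power: Fraction

    def sqrt(self) -> "Quantity":
        # The root of the number, and half the power of the unit.
        return Quantity(math.sqrt(self.value), self.unit, self.power / 2)


squares = Quantity(100, "side", Fraction(2))       # 100 unit squares
dozens = Quantity(100, "dozen", Fraction(2))       # 100 dozen dozen
soldiers = Quantity(100, "soldier", Fraction(1))   # 100 soldiers

for q in (squares, dozens, soldiers):
    print(q.sqrt())
```

The root of 100 squares comes out with power 1, a length, and the root of 100 dozen dozen is 10 dozen. The root of 100 soldiers has power 1/2, which is neither 10 nor 10 soldiers, just as Wallis says.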

In the course of preparing my sample curve diagrams, one of which is
shown at right, I ran into several related bugs in the way that arrays
were being handled. What I really wanted to do was to define a
labeled_curve object, something like this:

That is, it is just like an ordinary curve, except that it also has a
"spot" at each control point. A "spot" is a graphic element that
marks the control point, probably with a small circle or something of
the sort:

A spot is like a point, and so it has an x
and a y coordinate. But it also has a circle, circ,
which is centered at this location. (circ.c is the center of
the circle.)

When I first tried this, it didn't work because linogram
didn't understand that a labeled_curve with N = 4
control points would also have four instances of circ, four
of circ.c, four of circ.c.x, and so on. It did
understand that the labeled curve would have four instances of
s, but the multiplicity wasn't being propagated to the
subobjects of s.

I fixed this up in pretty short order.

But the same bug persisted for circ.r, and this is not
so easy to fix. The difference is that while circ.c is a
full subobject, subject to equation solving, and expected to be
unknown, circ.r is a parameter, which must be specified in
advance.

N, the number of spots and control points, is another such
parameter. So there's a first pass through the object hierarchy to
collect the parameters, and then a later pass figures out the
subobjects. You can't figure out the subobjects without the
parameters, because until you know the value of parameters like
N, you don't know how many subobjects there are in arrays like
s[N].

For subobjects like s[N].circ.c.x, there is no issue. The
program gathers up the parameters, including N, and then
figures out the subobjects, including s[0].circ.c.x and so
on. But s[0].circ.r is a parameter, and I can't say that
its value will be postponed until after the values of the parameters
are collected. I need to know the values of the parameters before I
can figure out what the parameters are.

This is not a show-stopper. I can think of at least three ways
forward. For example, the program could do a separate pass for
param index parameters, resolving those first. Or I could do
a more sophisticated dependency analysis on the parameter values; a
lot of the code for this is already around, to handle things like
param number a = b*2, b=4, c=a+b+3, d=c*5+b. But I need to
mull over the right way to proceed.
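
For the dependency-analysis option, the usual technique is a topological sort of the parameters by what their defining expressions mention. The following Python sketch shows the idea on the example above. It is not linogram's code: solve_params is a hypothetical helper, and it leans on Python's own expression parser rather than linogram's.

```python
import ast
from graphlib import TopologicalSorter


def solve_params(defs: dict[str, str]) -> dict[str, float]:
    """Evaluate parameter definitions in dependency order."""
    trees = {name: ast.parse(expr, mode="eval") for name, expr in defs.items()}
    # Map each parameter to the set of parameters its expression mentions.
    deps = {
        name: {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
        for name, tree in trees.items()
    }
    values: dict[str, float] = {}
    # static_order() yields dependencies first; it raises
    # graphlib.CycleError if the definitions are circular.
    for name in TopologicalSorter(deps).static_order():
        code = compile(trees[name], "<param>", "eval")
        values[name] = eval(code, {"__builtins__": {}}, values)
    return values


print(solve_params({"a": "b*2", "b": "4", "c": "a+b+3", "d": "c*5+b"}))
```

Here b is resolved first, then a, c, and d. A circular set of definitions is reported as an error instead of being resolved, which is one way of turning such cases into a run-time failure.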

Here the program needs to know the value of N in order to
decide how many snarks are in a boojum. But the number N itself is
determined by examining the p parameter in snark 2, which
itself will not exist if N is less than 3. Should this sort of
nonsense be allowed? I'm not sure yet.

When you invent a new kind of program, there is an interesting
tradeoff between what you want to allow, what you actually do allow,
and what you know how to implement. I definitely want to allow the
labeled_curve thing. But I'm quite willing to let the
snark-boojum example turn into some sort of run-time failure.

Recent Linogram development update
Lately most of my spare time (and some not-spare time) has been going
to linogram. I've been posting updates pretty regularly at
the main linogram page. But I don't know if
anyone ever looks at that page. That got me thinking that it was not
convenient to use, even for people who are interested in
linogram, and that maybe I should have an RSS/Atom feed for
that page so that people who are interested do not have to keep
checking back.

Then I said "duh", because I already have a syndication feed for this page, so why not just
post the stuff here?

So that is what I will do. I am about to copy a bunch of stuff from
that page to this one, backdating it to match when I posted it.

Another Linogram success story
I've been saying for a while that a well-designed system surprises
even the designer with its power and expressiveness. Every time
linogram surprises me, I feel a flush of satisfaction because
this is evidence that I designed it well. I'm beginning to think that
linogram may be the best single piece of design I've
ever done.

Here was today's surprise. For a long time, my demo diagram has been
a rough rendering of one of the figures from Higher-Order
Perl:

I wanted component k in the middle of the diagram to be a
curved line, but since I didn't have curved lines yet, I used two
straight lines instead, as shown below:

As of today, I have working curves, so I went to replace k with
a curved line instead. I went into the demo.lino file, which
I wrote a couple of years ago, to see
what changes would be required. The definition of k was much
more complicated than I remembered. Here is the relevant extract:

So I had defined a thing called a bentline, which is a line
with a slight angle in it. Or more precisely, it's two
approximately-vertical lines joined end-to-end. It has three
important reference points: start, which is the top point,
end, the bottom point, which is directly under the top
point, and center, halfway in between, but displaced
leftward by depth.

I now needed to replace this with a curved line. This meant
removing all the references to start, end,
upper and so forth, since curves don't have any of those
things. A significant rewrite, in other words.

But then I had a happy thought. I added the following definition to
the file:

A bentline_curved is now the same as a bentline, but
with an extra curved line, called c, which has three control
points, defined to be identical with start, center,
and end. These three points inherit all the same constraints
as before, and so are constrained in the same way and positioned in
the same way. But instead of drawing the two lines, the
bentline_curved draws only the curve.

I then replaced:

bentline k;

with:

bentline_curved k;

and recompiled the diagram. The result is below:

This diagram is identical, except that arc k has changed from a
bent line to a curve. Compare:

To make this change, I didn't have to edit or understand the
definition of bentline, except to understand a bit about its
interface: start, end, and center. I could
build a new definition atop it that allowed the rest of the program to
use it in exactly the same way, although it was drawn in a completely
different way.

I didn't foresee this when I designed the linogram language.
Sometimes when you try a new kind of program for the first time, you
keep getting unpleasant surprises. You find things you realize you
didn't think through, or that have unexpected consequences, or
features that turn out not to be as powerful as you need, or that mesh
badly with other features. Then you have to go back and revisit your
design, fix problems, try to patch up mismatches, and so forth. In
contrast, a pleasant surprise like the one in this article is exactly
the opposite sort of situation, and makes me really happy.

Linogram development: 20070120 Update
The array feature is working, pending some bug fixes. I have not yet
found all the bugs, I think. But the feature has definitely moved
from the does-not-work-at-all phase into the mostly-works phase. That
is, I am spending most of my time tracking down bugs, rather than
writing large amounts of code. The test suite is expanding rapidly.

The regular polygons are working pretty well, and the curves are
working pretty well. Here are some simple examples:

One interesting design problem turned up that I had not foreseen. I
had planned for the curve object to be specified by 2 or more
control points. (The control points are marked by little circles in
the demo pictures above.) The first and last control points would be
endpoints, and the curve would start at point 0, then head toward
point 1, veer off toward point 2, then veer off toward point 3, etc.,
until it finally ended at point N. You can see this in the
pictures.

This is like the behavior of pic, which has good-looking
curves. You don't want to require that the curve pass through
all the control points, because that does not give it enough freedom
to be curvy. And this behavior is easy to get just by using a
degree-N Bézier curve, which was what I planned to
do.

However, PostScript surprised me. I had thought that it had
degree-N Bézier curves, but it does not. It has only
degree-3 ("cubic") Bézier curves. So then I was left with the
puzzle of how to use PostScript's Bézier curves to get what I
wanted. Or should I just change the definition of curve in
linogram to be more like what PostScript wanted? Well, I
didn't want to do that, because linogram is supposed to be
generic, not a front-end to PostScript. Or, at least, not a front-end
only to PostScript.

I did figure out a compromise. The curves generated by the PostScript
drawer are made of PostScript's piecewise-cubic curves, but, as you
can see from the demo pictures, they still have the behavior I want.
The four control points in the small demos above actually turn into
two PostScript cubic Bézier curves, with a total of seven control points.
If you give linogram the points A, B, C, and D, the PostScript engine
draws two cubic Bézier curves, with control points {A, B, B, (B + C)/2}
and {(B + C)/2, C, C, D}, respectively. Maybe I'll write a blog article
about why I chose to do it this way.
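
The four-point case described above can be sketched in a few lines of
Python. This is only an illustration, not linogram's actual code: the
function names split_into_cubics and cubic_point are made up here, and
points are assumed to be plain (x, y) tuples.

```python
def split_into_cubics(a, b, c, d):
    """Turn four control points A, B, C, D into two cubic Bezier segments.

    Follows the scheme in the text: {A, B, B, (B+C)/2} and
    {(B+C)/2, C, C, D}. Points are assumed to be (x, y) tuples.
    """
    mid = ((b[0] + c[0]) / 2, (b[1] + c[1]) / 2)
    return [(a, b, b, mid), (mid, c, c, d)]


def cubic_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t, with 0 <= t <= 1."""
    u = 1 - t
    # Bernstein weights for a degree-3 curve.
    weights = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    points = (p0, p1, p2, p3)
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, points)) for axis in range(2)
    )
```

The two segments share the midpoint (B + C)/2, which is why there are
seven control points rather than eight. The first segment arrives at
that midpoint heading in the direction (B + C)/2 - B, and the second
leaves it heading in the direction C - (B + C)/2; both equal (C - B)/2,
so the join has no corner.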

One drawback of this approach is that the curves turn rather sharply
near the control points. I may tinker with the formula later to
smooth out the curves a bit, but I think for now this part is good
enough for beta testing.

This defines two polygons, t1 and t2, each with three sides. The three
vertices of t1 are specified explicitly. Triangle t2 is the same, but
with the vertices numbered differently: t2.v_0 = t1.v_2,
t2.v_1 = t1.v_0, and t2.v_2 = t1.v_1. Each of the triangles also
has three edges, defined implicitly by the definition in
polygon.lino:

Altogether, there are 38 values here: 2 coordinates for each of
three vertices of each of the two triangles makes 12; 2 coordinates
for each of two endpoints of each of three edges of each of the two
triangles is another 24, and the two N values themselves
makes a total of 12 + 24 + 2 = 38.

All of the equations are rather trivial. All the difficulty is in
generating the equations in the first place. The program must
recognize that the variable i in the polygon
definition is a dummy iterator variable, and that it is associated
with the parameter N in the polygon definition. It must
propagate the specification of N to the right place, and then
iterate the equations appropriately, producing something like:

e_0.end = v_{0+1}
e_1.end = v_{1+1}
e_2.end = v_{2+1}

Then it must fold the constants in the subscripts and apply the
appropriate overflow semantics—in this case, 2+1=0.
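
The iteration and folding steps can be mimicked in Python. This is a
rough sketch of the idea only, not linogram's implementation; the
function name end_constraints is invented for illustration, and the
equations are produced as plain strings in the subscript notation used
above.

```python
def end_constraints(n):
    """Expand e_i.end = v_{i+1} for i = 0 .. n-1.

    Returns two lists of strings: the equations as first iterated,
    and the same equations after the subscript constants are folded
    with wraparound, so that for n = 3 the subscript 2+1 becomes 0.
    """
    iterated = [f"e_{i}.end = v_{{{i}+1}}" for i in range(n)]
    folded = [f"e_{i}.end = v_{(i + 1) % n}" for i in range(n)]
    return iterated, folded
```

For a triangle, end_constraints(3) gives the three equations shown
above, followed by their folded forms, the last of which ties the end
of edge 2 back to vertex 0 and so closes the figure.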

Open figures still don't work properly. I don't think this will take
too long to fix.

The code is very messy. For example, all the Type classes
are in a file whose name is not Type.pm but
Chunk.pm. I plan to have a round of cleanup and consolidation
after the 2.0 release, which I hope will be soon.

R^3 is not a square
I haven't done a math article for a while. The most recent math
things I read were some papers on the following theorem: Obviously,
there is a topological space X such that X^3 = R^3, namely, X = R.
But is there a space X such that X^2 = R^3? ("=" here denotes
topological homeomorphism.)

It would be rather surprising if there were, since you could then
describe any point in space unambiguously by giving its two
coordinates from X. This would mean that in some sense, R^3
could be thought of as two-dimensional. You would expect that any
such X, if it existed at all, would have to be extremely
peculiar.

I had been wondering about this rather idly for many years, but last
week a gentleman on IRC mentioned to me that there had been a proof in
the American Mathematical Monthly a couple of years back that
there was in fact no such X. So I went and looked it up.

The paper was "Another Proof That R^3 Has No Square Root", Sam
B. Nadler, Jr., American Mathematical Monthly vol 111
June–July 2004, pp. 527–528. The proof there is
straightforward enough, analyzing the topological dimension of
X and arriving at a contradiction.

But the Nadler paper referenced an earlier paper which has a much
better proof. The proof in "R^3 Has No Root", Robbert Fokkink,
American Mathematical Monthly vol 109 March 2002, p. 285, is
shorter, simpler, and more general. Here it is.

An invertible linear map R^n → R^n can be understood to preserve or
reverse orientation, depending on whether its determinant is positive
or negative. This notion of orientation can be generalized to arbitrary
homeomorphisms, giving a "degree" deg(m) for every homeomorphism m,
which is +1 if it is orientation-preserving and -1 if it is
orientation-reversing. The generalization has all the properties that
one would hope for. In particular, it coincides with the corresponding
notions for linear maps and differentiable maps, and it is
multiplicative: deg(f∘g) = deg(f)·deg(g) for all homeomorphisms
f and g. In particular ("fact 1"), if h is any
homeomorphism whatever, then h∘h is an
orientation-preserving map.

Now, suppose that h : X^2 → R^3 is a homeomorphism.
Then X^4 is homeomorphic to R^6, and we can view
quadruples (a,b,c,d) of elements of X as equivalent to sextuples
(p,q,r,s,t,u) of elements of R.

Consider the map s on X^4 which takes
(a,b,c,d) → (d,a,b,c). Then s∘s is the map (a,b,c,d)
→ (c,d,a,b). By fact 1 above, s∘s must be an
orientation-preserving map.

But translated to the putatively homeomorphic space R^6, the
map (a,b,c,d) → (c,d,a,b) is just the linear map on R^6 that takes
(p,q,r,s,t,u) → (s,t,u,p,q,r). This map is
orientation-reversing, because its determinant is -1. This is a
contradiction. So X^4 must not be homeomorphic to R^6,
and X^2 therefore not homeomorphic to R^3.

The same proof goes through just fine to show that
R^{2n+1} = X^2 is false for all n,
and similarly for open subsets of R^{2n+1}.
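
The claim about the determinant is easy to check by machine. The
linear map in question just permutes the coordinates, so its
determinant is the sign of that permutation. Here is a small Python
sketch; the names permutation_sign and block_swap are invented for
this illustration and do not come from either paper.

```python
def permutation_sign(perm):
    """Sign of a permutation, where perm[i] is the image of index i.

    Each cycle of even length contributes a factor of -1.
    """
    seen = [False] * len(perm)
    sign = 1
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:
            sign = -sign
    return sign


def block_swap(k):
    """Permutation of 2k coordinates exchanging the first k with the last k."""
    return list(range(k, 2 * k)) + list(range(k))
```

block_swap(3) is the map (p,q,r,s,t,u) → (s,t,u,p,q,r), and its sign
is -1. More generally the block swap is a product of k transpositions,
so its sign is -1 exactly when k is odd, which is why the argument
works for R^{2n+1} and says nothing about even dimensions.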

The paper also refers to an earlier paper ("The cartesian product of a
certain nonmanifold and a line is E^4", R.H. Bing,
Annals of Mathematics series 2 vol 70 1959
pp. 399–412) which constructs an extremely pathological
space B, called the "dogbone space", not even a manifold, which
nevertheless has B × R = R^4. This is on my
desk, but I have not read this yet, and I may never.

State of the Blog 2006
This is the end of the first year of my blog. The dates on the early
articles say that I posted a few in 2005, but they are deceptive. I
didn't want a blog with only one post in it, so I posted a bunch of
stuff that I had already written, and backdated it to the dates on
which I had written it. The blog first appeared on 8 January, 2006,
and this was the date on which I wrote its first articles.

Output

Not counting this article, I posted 161 articles this year, totalling
about 172,000 words, which I think is not a bad output. About 1/4 of
this output was about mathematics.

I drew, generated, or appropriated about 300 pictures, diagrams, and
other illustrations, plus 66 mathematical formulas. This does not
count the 50 pictures of books that I included, but it does include
108 little colored squares for the article on the Pólya-Burnside
counting lemma.

Financial

I incurred the costs of Dreamhosting (see below). But these costs are
offset because I am also using the Dreamhost account as a remote backup for
files. So it has some non-blog value, and will also result in a tax
deduction.

None of the book links earned me any money from kickbacks. However,
the blog did generate some income. When Aaron Swartz struck oil, he
offered to give away money to web sites that needed it. Mine didn't
need it, but a little later he published a list of web sites he'd
given money to, and I decided that I was at least as deserving as some
of them. So I stuck a "donate" button on my blog and invited Aaron to
use it. He did. Thanks, Aaron!

I now invite you to use that button yourself. Here are two versions
that both do the same thing:

I could not decide whether to go with the cute and pathetic begging
approach (shown right) or the brusque and crass demanding approach
(left).

My MacArthur Fellowship check has apparently been held up in the
mail.

Popularity

The most popular article was certainly the one on Design Patterns of 1972.
I had been thinking this one over for years, and I was glad it
attracted as much attention as it did. Ralph Johnson (author of the
Design Patterns book) responded to it, and I learned that
Design Patterns is not the book that Johnson thought it
was. Gosh, I'm glad I didn't write that book.

System administration

I moved the blog twice. It originally resided on
www.plover.com, which is in my house. I had serious network
problems in July and August, Verizon's little annual gift to me. When
I realized that the blog was much more popular than I expected, and
that I wanted it to be reliably available, I moved it to
newbabe.pobox.com, which I'd had an account on for years but
had never really used. This account was withdrawn a few months later,
so I rented space at Dreamhost, called blog.plover.com, and
moved it there. I expect it will stay at Dreamhost for quite a
while.

Moving the blog has probably cost me a lot of readers. I know from
the logs that many of them have not moved from newbabe to Dreamhost.
Traffic on the new site just after the move was about 25% lower than
on the old site just before the move. Oh well.

If the blog hadn't moved so many times, it would be listed by
Technorati as one of the top ten blogs on math and science, and one of
the top few thousand overall. As it is, the incoming links (which are
what Technorati uses to judge blog importance) are scattered across
three different sites, so it appears to be three semi-popular blogs
rather than one very popular blog. This would bother me, if
Technorati rankings weren't so utterly meaningless.

Policy

I made a couple of vows when I started the blog. A number of years
ago on my use.perl.org journal I complained extensively about
some people I worked with. They deserved everything I said, but the
remarks caused me a lot of trouble and soured me on blogs for many
years. When I started this blog, I vowed that I wouldn't insult
anyone personally, unless perhaps they were already dead and couldn't
object. Some people have no trouble with this, but for someone like
me, who is a seething cauldron of bile, it required a conscious
effort.

I think I've upheld this vow pretty well, and although there have been
occasions on which I've called people knuckleheaded assholes, it has
always been either a large group (like Biblical literalists) or
people who were dead (like this pinhead) or
both.

Another vow I made was that I wasn't going to include any tedious
personal crap, like what music I was listening to, or whether the
grocery store was out of Count Chocula this week. I think I did okay
on that score. There are plenty of bloggers who will tell you about
the fight they had with their girlfriend last night, but very few that
will analyze abbreviations
in Medieval Latin. So I have the Medieval Latin abbreviations
audience pretty much to myself. I am a bit surprised at how
thoroughly I seem to have communicated my inner life, in spite of
having left out any mention of Count Chocula. This is a blog of what
I've thought, not what I've done.

What I didn't post this year

My blog directory contains 55 unpublished articles, totalling 39,500
words, in various states of incompleteness; compare this with the 161
articles I did complete.

The longest of these unpublished articles was written some time after
my article on the envelope
paradox hit the front page of Reddit. Most of the Reddit comments
were astoundingly obtuse. There were about nine responses of the type
"That's cute, but the fallacy is...", each one proposing a
different fallacy. All of the proposed fallacies were
completely wrong; most of them were obviously wrong. (There is no
fallacy; the argument is correct.) I decided against posting this
rebuttal article for several reasons:

It wouldn't have convinced anyone who wasn't already convinced,
and might have unconvinced someone who was. I can't lay out the
envelope paradox argument any more briefly or clearly than I did; all
I can do is make the explanation longer.

It came perilously close to violating the rule about insulting
people who are still alive. I'm not sure how the rule applies to
anonymous losers on Reddit, but it's probably better to err on the
side of caution. And I wasn't going to be able to write the article
without insulting them, because some of them were phenomenally
stupid.

I wasn't sure anyone but me would be interested in the details of
what a bunch of knuckleheaded lowlifes infest the Reddit comment
board. Many of you, for example, read Slashdot regularly, and see
dozens of much more ignorant and ill-considered comments every
day.

How much of a cretin would I have to be to get in an argument with
a bunch of anonymous knuckleheads on a computer bulletin board? It's
like trying to teach a pig to sing. Well, okay, I did get in an
argument with them over on Reddit; that was pretty stupid. And then I
did write the rebuttal article, which was at least as stupid, but which I
can at least ascribe to my seething cauldron of bile. But it's never
too late to stop acting stupid, and at least I stopped before I
cluttered up my beautiful blog with a four-thousand-word rebuttal.

So that was one long article that never made it;
had it been published, it would have been longer than any other except
the Z2 article.

The article about metric spaces
was supposed to be one of a three-part series, which I still hope to
finish eventually. I made several attempts to write another part in
this series, about the real numbers and why we have them at all. This
requires explanation, because the reals are mathematically and
philosophically quite artificial and problematic. (It took me a lot
of thought to convince myself that they were mathematically
inevitable, and that the aliens would have them too, but that is
another article.) The three or four drafts I wrote on this topic
total about 2,100 words, but I still haven't quite got it where I want
it, so it will have to wait.

I wrote 2,000 words about oddities in my brain, what it's good at and
what not, and put it on the shelf because I decided it was too
self-absorbed. I wrote a complete "frequently-asked questions" post
which answered the (single) question "Why don't you allow comments?"
and then suppressed it because I was afraid it was too self-absorbed.
Then I reread it a few months later and thought it was really funny,
and almost relented. Then I read it again the next month and decided
it was better to keep it suppressed. I'm not indecisive; I'm just
very deliberate.

I finished a 2,000-word article about how to derive the formulas for
least-squares linear regression and put it on the shelf because I
decided that it was boring. I finished a 1,300-word article about
quasiquotation in Lisp and put it on the shelf because I decided it
was boring. (Here's the payoff from the quasiquotation article: John
McCarthy, the inventor of Lisp, took both the concept and the name directly
from W.V.O. Quine, who invented it in 1940.)

Had I been writing this blog in 2005, there would have been a bunch of
articles about Sir Thomas Browne, but I was pretty much done with him
by the time I started the blog. (I'm sure I will return someday.)
There would have been a bunch of articles about John Wilkins's book on
the Philosophical Language, and some on his book about cryptography.
(The Philosophical Language crept in a bit anyway.) There would have
been an article about Charles Dickens's book Great
Expectations, which I finished reading about a year and a half
ago.

An article about A Christmas Carol is in the works, but I
seem to have missed the seasonal window on that one, so perhaps I'll
save it for next December. I wrote an article about how to calculate
the length of the day, and writing a computer program to tell time by
the old Greek system, which divides the daytime into twelve equal
hours and the nighttime into twelve equal hours, so that the night
hours are longer than the day ones in the winter, and shorter in the
summer. But I missed the target date (the solstice) for that one, so
it'll have to wait until at least the next solstice. I wrote part of
an article about Hangeul (the Korean alphabet), planning to publish it
on Hangeul Day (the Koreans have a national holiday celebrating
Hangeul) but I couldn't find the quotations I wanted from 1445, so I
put it on the shelf. This week I'm reading Gari Ledyard's doctoral
thesis, The Korean Language Reform of 1446, so I may
acquire more information about that and be able to finish the article.
(I highly recommend the Ledyard thing; it's really well-written.) I
recently wrote about 1,000 words about Vernor Vinge's new novel
Rainbows End, but that's not finished yet.

A followup to the article about why
you don't have one ear in the middle of your face is in progress.
It's delayed by two things: first, I made a giant mistake in the
original article, and I need to correct it, but that means I have to
figure out what the mistake is and how to correct it. And second, I
have to follow up on a number of fascinating references about
directional olfaction.

Sometimes these followups eventually arrive, as the one about ssh-agent
did, and sometimes they stall. A followup to my early article about the
nature of transparency, about the behavior of light, and the
misconception of "the speed of light in glass", ran out of steam after
a page when I realized that my understanding of light was so poor that
I would inevitably make several gross errors of fact if I finished
it.

I spent a lot of the summer reading books about inconsistent
mathematics, including Graham Priest's book In
Contradiction, but for some reason no blog articles came of it.
Well, not exactly. What has come out is an unfinished 1,230-word
article against the idea that mathematics is properly understood as
being about formal systems, an unfinished 1,320-word article about the
ubiquity of the Grelling-Nelson paradox, an unfinished 1,110-word
article about the "recursion theorem" of computer science, and an
unfinished 1,460-word article about paraconsistent logic and the liar
paradox.

I have an idea that I might inaugurate a new section of the blog,
called "junkheap", where unfinished articles would appear after aging
in the cellar for three years, regardless of what sort of crappy
condition they are in. Now that the blog is a year old, planning
something two years out doesn't seem too weird.

I also have an ideas file with a couple hundred notes for future
articles, in case I find myself with time to write but can't think of
a topic. Har.

I got a number of surprises when I started the blog.
One was that I wasn't really aware of LiveJournal, and its "friends"
pages. I found it really weird to see my equation-filled articles on
subvocal reading and Baroque scientific literature appearing on these
pages, sandwiched between posts about Count Chocula from people named
"Taldin the Blue Unicorn". Okay, whatever.

I was not expecting that so many of my articles would take the form
"ABCDEFG. But none of this is really germane to the real point
of this article, which is ... HIJKLMN." But the more articles
I write in this style, the more comfortable I am with it. Perhaps in
a hundred years graduate students will refer to an essay of this type,
with two loosely-coupled sections of approximately the same length,
linked by an apologetic phrase, as "Dominus-style".

Wrong wrong wrong!

I do not have a count of the number of mistakes and errors I made that
I corrected in later articles, although I wish I did. Nor do I have a
count of the number of mistakes that I did not correct.

However, I do know that the phrase "I don't know" (and variations,
like "did not know") appears 67 times, in 44 of the 161 articles. I
would like to think that this is one of the things that will set my
blog apart from others, and I hope to improve these numbers in the
coming years.

Thanks

Thanks to all my readers for their interest and close attention, and
for making my blog a speedy success.

A number of people missed an important piece of context: since the
article was filed in the 'oops' section of my
blog, it was intended as a description of a mistake I had made.
The mistake in this case being to work really hard on the first
solution I thought of, rather than to back up at early signs of
trouble, and scout around for a better and simpler solution. I need
to find a way to point out the "oops" label more clearly, and at the
top of the article instead of at the bottom.

Several people pointed out other good solutions to my problem. For
example, Adam Sampson and Robert Loomans pointed out that versions of
ssh-agent support a -a option, which orders the process to
use a particular path for its Unix domain socket, rather than making
up a path, as it does by default. You can then use something like
ssh-agent -a $HOME/.ssh/agent when you first start the agent,
and then you always know where to find the socket.

An even simpler solution is as follows: My principal difficulty was in
determining the correct value for the SSH_AGENT_PID variable.
But it turns out that I don't need this; it is only used for
ssh-agent -k, which kills the existing ssh-agent process.
For authentication, it is only necessary to have
SSH_AUTH_SOCK set. The appropriate value for this variable
is readily determined by scanning /tmp, as I noted in the
original article. Thanks to Aristotle Pagaltzis and Adam Turoff for
pointing this out.
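
Scanning /tmp for the socket might look something like the following
Python sketch. It assumes that the agent puts its socket at a path of
the form /tmp/ssh-XXXXXX/agent.NNN, which is what the OpenSSH versions
I have seen do by default but is not guaranteed everywhere. The
function name find_agent_sockets is made up for this example.

```python
import glob
import os
import stat


def find_agent_sockets(tmp_root="/tmp"):
    """Return likely ssh-agent socket paths under tmp_root, newest first.

    Assumes the conventional layout <tmp_root>/ssh-*/agent.*; only
    Unix domain sockets owned by the current user are reported.
    """
    found = []
    for path in glob.glob(os.path.join(tmp_root, "ssh-*", "agent.*")):
        try:
            info = os.stat(path)
        except OSError:
            continue  # vanished, or not readable by us
        if stat.S_ISSOCK(info.st_mode) and info.st_uid == os.getuid():
            found.append((info.st_mtime, path))
    found.sort(reverse=True)
    return [path for _, path in found]
```

A login shell could then set SSH_AUTH_SOCK to the first path returned.
A socket file can outlive its agent, so a careful version would also
check that something answers on the socket, for example by running
ssh-add -l with the variable set.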

Several people pointed me to the keychain
project. This program is a front-end to ssh-agent. It contains
functions to check for a running agent, and to start one if there is
none yet, and to save the environment settings to a file, as I did
manually in my article.

A number of people suggested that I should just run ssh-agent from my
X session manager. This suggests that they did not read the article
carefully; I already do this. Processes running on my home machine,
B, all inherit the ssh-agent settings from the session manager
process. The question is what to do when I remote login from a
different machine, say A, and want the login shell, which was
not started under X, to acquire the same settings.

Other machines trust B, but not A, so credential
forwarding is not the solution here either.

I had suggested that the kernel probably maintained no direct mapping
from the socket i-number to the filesystem path, and that obtaining
this information would require difficult grovelling of the kernel data
structures. But apparently to whatever extent that is true, it is
irrelevant, since the /proc/net/unix driver has already been
written to do it.

Saving the socket information in a file solves another problem I
had. Suppose I want some automated process, say the cron job that
makes my offsite network backups, to get access to SSH credentials. I
can store the credentials in an ssh-agent process, and save the
variable settings to a file. The backup process can then reinstate
the settings from the file, and will thenceforward have the
credentials for the remote login.
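
For a backup job written in something other than shell, reinstating the
settings means reading them back out of the file. Here is a Python
sketch, on the assumption that the file holds the Bourne-shell-style
output that ssh-agent prints when it starts (lines such as
"SSH_AUTH_SOCK=...; export SSH_AUTH_SOCK;"). The function name
parse_agent_settings is hypothetical.

```python
import re

# Matches lines such as: SSH_AUTH_SOCK=/tmp/ssh-abc/agent.123; export SSH_AUTH_SOCK;
_ASSIGNMENT = re.compile(r"^(SSH_[A-Z_]+)=([^;]*);")


def parse_agent_settings(text):
    """Extract SSH_* variable settings from saved ssh-agent output.

    Lines that are not assignments, such as the trailing
    'echo Agent pid ...;' line, are ignored.
    """
    settings = {}
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line.strip())
        if match:
            settings[match.group(1)] = match.group(2)
    return settings
```

The job would then copy the result into os.environ before launching
ssh or rsync. A shell script can skip all this and simply source the
file.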

Finally, I should add that since implementing this scheme for the
first time on 21 November, I have started exactly zero new ssh-agent
processes, so I consider it a rousing success.

Messages from the future
I read a pretty dumb article today about passwords that your future
self could use when communicating with you backwards in time, to
authenticate his identity to you. The idea was that you should make
up a password now and commit it to memory so that you can use it later
in case you need to communicate backwards in time.

This is completely unnecessary. You can wait until you have evidence
of messages from the future before you do this.

Here's what you should do. If someone contacts you, claiming to be
your future self, have them send you a copy of some document—the
Declaration of Independence, for example, or just a letter of
introduction from themselves to you, but really it doesn't need to be
more than about a hundred characters long—encrypted with a
one-time pad. The message, being encrypted, will appear to be complete
gibberish.

Then pull a coin out of your pocket and start flipping it. Use the
coin flips as the one-time pad to decrypt the message; record the pad
as you obtain it from the coin.

Don't do the decryption all at once.
Use several coins, in several different places, over a period of
several weeks.

Don't even use coins. Say to yourself one day, on a whim, "I think
I'll decrypt the next bit of the message by looking out the window and
counting red cars that go by. If an odd number of red cars go by in
the next minute, I'll take that as a head, and if an even number of
red cars go by, I'll take that as a tail." Go to the museum and use
their geiger counter for the next few bits. Use the stock market
listings for a few of the bits, and the results of the World Series
for a couple.

If the message is not actually from your future self, the coin flips
and other random bits you generate will not decrypt it, and you will
get complete gibberish.

But if the coin flips and other random bits miraculously turn out to
decrypt the message perfectly, you can be quite sure that you are dealing
with a person from the future, because nobody else could possibly have
predicted the random bits.
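
The mechanics here are nothing more than a bitwise exclusive-or. The
following Python sketch shows the two pieces, assuming the message and
the pad are both handled as bytes and that each coin flip is recorded
as a 0 or a 1. The function names xor_bytes and bits_to_bytes are
invented for the illustration.

```python
def bits_to_bytes(bits):
    """Pack a sequence of 0/1 coin flips into bytes, eight flips per byte."""
    bits = list(bits)
    if len(bits) % 8 != 0:
        raise ValueError("need a multiple of eight flips")
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for flip in bits[i:i + 8]:
            byte = (byte << 1) | (1 if flip else 0)
        packed.append(byte)
    return bytes(packed)


def xor_bytes(data, pad):
    """Combine data with a one-time pad; the same call encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))
```

Your future self computes xor_bytes(message, pad) using the pad that
you are going to generate; you compute xor_bytes(ciphertext, pad) as
the flips come in. With the right pad the plaintext comes back exactly,
and with any other pad you get noise. A hundred-character message
needs eight hundred flips.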

Now you need to make sure the person from the future is really you.
Make up a secret password. Encrypt the one-time pad with a
conventional secret-key method, using your secret password as the key.
Save the encrypted pad in several safe places, so that you can get it
later when you need it, and commit the password to memory. Destroy
the unencrypted version of the pad. (Or just memorize the pad. It's
not as hard as you think.)

Later, when the time comes to send a message into the past, go get the
pad from wherever you stashed it and decrypt it with the secret key
you committed to memory. This gives you a complete record of the
results of the coin flips and other events that the past-you used to
decrypt your message. You can then prepare your encrypted message
accordingly.