Dracon and Postel


There’s been a flurry of debate over in the PEAW mailing list about how to
deal with broken feeds.
Simultaneously, Aaron Swartz asserts Postel’s Law Has No Exceptions.
Herewith a bit of back-fill on the relevant history and tribal knowledge,
an excursus into Athenian jurisprudence, and opinions on what PEAW should
do.

Dracon (c.659-c.601 B.C.E.) introduced the first written
legislation to Athens. His code was consistent in that it decreed the death
penalty for crimes both low and high. Similarly, a conforming XML processor
must "not continue normal processing" once it detects a fatal error. Phrases
used to amplify this wording have included "halt and catch fire", "barf",
"flush the document down the toilet", and "penalize innocent
end-users".

The rest of that note provides a useful introduction to the issue.

[Excursus: I have since learned that the assertion above about
Dracon is pretty flimsy, since Dracon is only barely historical.
He and Solon feature as the Givers of Law in the traditions of Athenian
history, but only very small fragments survive (having to do with involuntary
homicide) which can plausibly be attributed to Dracon, and they’re not very
Draconian.]

As the note says, this issue provoked what is probably the single most
intense technical debate of my professional career.
Enthusiasts can relive it via several hundred emails to be found in the
April and May 1997 archives of what was then known as the W3C SGML Working
Group.
But don’t try to read it unless you have a couple of hours to spare.

Since I was arguably the leader of the “Draconian” forces in that
debate I’m hardly objective, but I think that XML’s cleanly-defined
error-handling has been a net positive.
Of course, we can never re-run history to find out for sure.

I will say, though, that it’s become awfully damn easy to test an
allegedly-XML document for well-formedness: try to open it in either IE or
Mozilla, and you’ll know right away.
Anyone who finds this too much effort deserves little sympathy.
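For those who would rather script the check than open a browser, here is a minimal sketch using Python’s standard-library parser. The function name `is_well_formed` and the two sample documents are made up for illustration; the only real API involved is `xml.etree.ElementTree`, which raises `ParseError` on input that is not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message).

    Hypothetical helper: it only checks well-formedness, not validity
    against any feed schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser stops at the first fatal error and reports where it was.
        return False, str(err)
    return True, None


good = "<feed><entry><title>Hello</title></entry></feed>"
bad = "<feed><entry><title>Hello</entry></feed>"  # mismatched end-tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser does exactly what the Draconian rule asks: it refuses to hand back a tree for the second document, and the message says where the trouble is.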

What Should PEAW Do? ·
The range of applications where PEAW will be put to work is
pretty wide, and different apps will have different error-handling
requirements.
If, for example, I’m reading one of my favorite blogs, and the aggregator
turfs an entry because the (required) <modified> is missing, I’m going to be
irritated.
On the other hand, when I’m reading a feed describing my credit-card
transactions, if a charge comes through without a date-stamp I want the
aggregator to scream loudly and let me know; something here is gravely amiss,
either with the credit card, the bank, or the software.

So if I were writing the spec, I’d do as XML does and divide the
errors into two classes, fatal and non-fatal.
I’d use SHOULD to encourage agents to report even non-fatal errors in the
interests of the system working better, and I’d allow
aggregators to turf entries on the basis of non-fatal errors, because this
will be a requirement in some applications.
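To make the two-class idea concrete, here is a rough sketch of how an aggregator might apply it. Everything in it is an assumption for illustration: the `FatalFeedError` class, the `process_feed` function, the `strict` flag, and a namespace-free feed whose entries are `<entry>` elements with a `<modified>` child. It is not taken from any PEAW draft.

```python
import xml.etree.ElementTree as ET


class FatalFeedError(Exception):
    """Hypothetical exception: the feed is not well-formed, so stop."""


def process_feed(xml_text, strict=False):
    """Return (accepted_entries, warnings) for a feed document.

    Fatal errors (not well-formed) raise FatalFeedError.
    Non-fatal errors (here: a missing <modified>) are always reported;
    with strict=True the offending entry is also dropped.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise FatalFeedError(str(err)) from err

    accepted, warnings = [], []
    for index, entry in enumerate(root.iter("entry")):
        problems = []
        if entry.find("modified") is None:
            problems.append(f"entry {index}: missing <modified>")
        warnings.extend(problems)  # report even when we keep the entry
        if problems and strict:
            continue  # e.g. a credit-card feed: turf the entry
        accepted.append(entry)
    return accepted, warnings


feed = (
    "<feed>"
    "<entry><title>a</title><modified>2003-01-01T00:00:00Z</modified></entry>"
    "<entry><title>b</title></entry>"
    "</feed>"
)
entries, notes = process_feed(feed)               # blog reader: keep both
picky, picky_notes = process_feed(feed, strict=True)  # bank feed: drop one
print(len(entries), len(picky), notes)
```

The lenient and strict callers see the same warnings; they differ only in whether the flawed entry survives, which is the choice left to each application.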

So What’s a Fatal Error? ·
This hasn’t been discussed that much, and it’s not obvious which,
if any, violations of the semantics or structure of a feed should constitute
fatal errors, i.e. those where the client software is required to
stop trying to work with the data.

However, I would absolutely require basic XML well-formedness.
Here’s why:

If you require well-formedness, you require basically sane Unicode
handling, which opens the gates of syndication to the vast majority of people
in the world who don’t live in ASCII.
I can’t emphasize this enough: if you can count on well-formed XML, you
are empowered to handle the languages of the world.
If you try to work around what look like illegal characters, you
guarantee huge amounts of irritation getting internationalized
later.

As regards everything but the content, it’s just not very hard
to create well-formed XML: escape < and &
and ' and " and > and you’re done.
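As a sketch of how little work that is, Python’s standard library already handles three of the five characters in `xml.sax.saxutils.escape`, and takes a dictionary for the other two. The wrapper name `xml_escape` and the sample title are invented for the example.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > itself; the two quote characters are passed
# in as extra replacements so the result is safe in attribute values too.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def xml_escape(text):
    """Hypothetical helper: escape the five special characters."""
    return escape(text, QUOTES)


title = 'Tom & Jerry say "x < y" and it\'s > nothing'
print(f"<title>{xml_escape(title)}</title>")
```

Because `escape()` replaces `&` before anything else, the entity references it produces for the other characters are not escaped a second time.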

Except for that Unicode stuff; you need to know what encoding your
data is in, so that for example when you see a Euro sign (€) you know
enough to emit &#x20ac;, not some Microsoft Code Page byte that’s
guaranteed not to work on lots of browsers.
This can be tricky.
But the alternative is, you’re a parochial bigot.
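Here is a small sketch of the Euro case, assuming the input really is Windows-1252 (where the Euro sign is the single byte 0x80) and that you want ASCII-only output. The function name `to_ascii_xml` is hypothetical. Python’s `xmlcharrefreplace` error handler emits decimal references, so the Euro comes out as `&#8364;`, which is the same character as `&#x20ac;`.

```python
def to_ascii_xml(raw_bytes, source_encoding):
    """Decode bytes from a known encoding and return ASCII-only text,
    with every non-ASCII character written as a numeric character
    reference. This handles encoding only; markup characters such as
    < and & still need escaping separately.
    """
    text = raw_bytes.decode(source_encoding)
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


raw = b"Price: \x80 10"  # 0x80 is the Euro sign in Windows-1252
print(to_ascii_xml(raw, "cp1252"))
```

The hard part is not the conversion but the first argument: if you guess the source encoding wrong, the decode step either fails or quietly produces the wrong characters.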

As regards your content, you need to know whether
or not you can guarantee that it’s well-formed.
If you can’t, PEAW provides the mode="escaped" hatch.

If your software can’t manage to escape five special characters and
fill in end-tags and quote attributes, it’s failing to meet such a very low
barrier to entry that it’s probably pretty lame anyhow.
And if developers are not willing to put in the effort to enable the non-white
people of the world to use their software, I don’t think PEAW should
condone or reward them.

The transition from RSS to PEAW is a line in the sand.
Granted, the RSS legacy necessarily required the use of liberal parsers,
but hey, that was then; we have better tools now.
I just find it really hard to believe that someone sitting down to write a
PEAW generator in A.D. 2003 can’t manage to generate well-formed
XML, with content-escaping if (sigh) necessary.