The TEI and Author/Editor

This information is derived from a post to TEI-L about the problems
involved in getting some parts of the TEI DTD to compile with
RulesBuilder, the DTD parser/compiler companion to SoftQuad's
Author/Editor product.

The message does not pretend to be anything other than an
informative summary of events: it is not intended to be regarded as
any kind of formal statement of how a solution ought or ought not to
be derived.

> I accidentally erased the message, but in the last couple of
> days someone wrote an aside about the horrors of using Rules Builder
> with the TEI DTD(s).

That was probably me, but there weren't any horrors, just a few
snags getting it to swallow stuff the right way round. Now that I have
found out how to do it (thanks to some swift help from SQ), it's
working fine. I was critical of RB doing things this way, but SQ's
support has been excellent and has fixed the problem.

> Please tell me more. I am about to embark on a medium-sized tagging
> project (a 45,000-word language corpus), and after getting lots of
> really slick adverts from the big names in the business (ArborText,
> Interleaf, etc) with Big Business-sized price tags, I had pretty
> much decided on getting an Author/Editor-Rules Builder bundle
> which carries a *substantial* academic discount. But now I'm not
> so certain. . .

No, as far as I can see A/E is quite capable of handling this.
I don't know how it performs on a file that large (if you really do
have it all as a single corpus file), as my experience of editing
files larger than 1Mb has been limited to A/E on a very small old PC,
which is not a fair basis for comparison with modern machines. (In
fact, A/E worked fine: it was the slowness of the PC and its lack of
memory and disk space which I had a problem with :-)

I got the DTD finished yesterday, and I promised to tell you
all what I had to do, so here goes:

I had tried to compile the DTD as it stands, using Rules
Builder for MS-Windoze. It didn't like it, which was why I posted my
original complaint. I'm not sure why SQ wrote it so that it won't
accept a prolog in the same fashion as other SGML software, but they
were quick to find me a solution once I had identified the
problem.

The file curia.entities contains extra character entities as well as
some renaming of elements. The project is adding a lot of
content-descriptive markup, so shortening the tagnames reduces file
size and makes the files more usable for people without graphical SGML
editors. Here's the file:

These last two lines were supposed to do the trick, but RB still
didn't like it, so I replaced the two lines with the entire contents
of tei2.dtd and suddenly it seemed to work.

Not quite: during compilation it claimed it couldn't find any of the
parameter entity files referenced in the above, even though they were
in the same directory as the DTD it was compiling. Fortunately it's
smart enough to put up a dialog box so you can tell it where they are
(rather than dying like so much software does). This is fixed by
editing the configuration file rb.ini and including a period and
semicolon (.;) before the relevant search paths, so that it searches
the current directory first, before trying elsewhere and failing.

Incidentally, that still doesn't fix the failure to find isolat1.ent
and isolat2.ent... I've obviously missed something, and I'm sure it
must be easy to fix.

At the end of compilation it complained fatally that I had some elements
referenced but not declared. Now, this is the source of some contention.
First, you have to be very clear that we are talking about elements
which your DTD mentions in a content model, but which are not declared
because they come from modules which are not loaded in your version of
the DTD (e.g. <camera>, because you might not be doing film stuff). We
are not talking about elements which are declared but simply not used
anywhere: RB already handles the latter via a checkbox on the BUILD
panel.

Referencing elements which are not declared is actually permissible SGML
(which surprised me when I first learned it). It is quite kosher: it's
just not handled by RB. In some people's view this is a bug. IMHO
it's probably politer to call it by the latest buzzword, a USI (Unexpected
System Inability :-)

The simple way round it is to add declarations for all the `missing'
elements, and set them either to EMPTY or ANY. This does have the
disadvantage that they will then appear in the `allowed elements'
menus when you do an `Insert Element' etc, but you can attach a comment
to them using the tag file mechanism that RB provides, so that users
know not to use them.
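
The stub declarations described above can be sketched in a few lines of
Python. This is only an illustration, not something RB provides: the
function name stub_declarations is hypothetical, the element name
`sound' is made up for the example (only <camera> is mentioned above),
and the tag minimization (`- O' for EMPTY, `- -' for ANY) is an
assumption that should be matched to whatever your own DTD uses.

```python
def stub_declarations(names, content="ANY"):
    """Return SGML element declarations that stub out undeclared elements.

    `content` must be "ANY" or "EMPTY", the two choices discussed above.
    """
    if content not in ("ANY", "EMPTY"):
        raise ValueError("content must be 'ANY' or 'EMPTY'")
    # An EMPTY element has no end-tag, so its end-tag minimization is 'O'.
    # Assumption: the DTD is compiled with OMITTAG minimization enabled.
    minimization = "- O" if content == "EMPTY" else "- -"
    lines = []
    for name in sorted(set(names)):  # de-duplicate and keep a stable order
        lines.append(f"<!ELEMENT {name} {minimization} {content}>")
    return "\n".join(lines)


# Illustrative element names; paste the output into the DTD before compiling.
print(stub_declarations(["camera", "sound"], content="EMPTY"))
```

The output is one declaration per name, ready to be pasted into the DTD
ahead of the next compilation.
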

RB doesn't produce a log file listing the undeclared elements, so you
have to copy them down from the error panel (which only mentions the
first few), add them to the DTD, re-compile, note down the next batch
mentioned, add them, re-compile, and repeat until you have them all.
Windoze doesn't let you clip text from error panels :-)
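
One way to shorten that loop is to scan the DTD yourself for names that
appear in content models but are never declared. What follows is a rough
sketch in Python, not a real SGML parser and nothing to do with RB
itself: the function name undeclared_elements is hypothetical, and it
assumes that parameter entity references have already been expanded by
hand or by another tool, that comments appear only as separate
<!-- ... --> declarations, and that element names are case-insensitive.

```python
import re

# One element declaration: <!ELEMENT name-or-name-group  rest-of-declaration>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\([^)]*\)|\S+)\s+(.*?)>", re.I | re.S)
COMMENT_DECL = re.compile(r"<!--.*?-->", re.S)
MINIMIZATION = re.compile(r"^[-Oo]\s+[-Oo]\s+")   # e.g. "- -" or "- O"
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")
DECLARED_CONTENT = {"EMPTY", "ANY", "CDATA", "RCDATA"}


def undeclared_elements(dtd_text):
    """List element names used in content models but never declared."""
    text = COMMENT_DECL.sub("", dtd_text)          # ignore commented-out markup
    declared, referenced = set(), set()
    for head, rest in ELEMENT_DECL.findall(text):
        declared.update(n.lower() for n in NAME.findall(head))
        model = MINIMIZATION.sub("", rest.strip())
        if model.upper() in DECLARED_CONTENT:
            continue                               # no content model to scan
        model = re.sub(r"#\w+", "", model)         # drop #PCDATA
        # Names inside inclusion/exclusion exceptions are picked up as well.
        referenced.update(n.lower() for n in NAME.findall(model))
    return sorted(referenced - declared)


# Tiny made-up fragment: <camera> is referenced but not declared.
sample = """
<!ELEMENT stage  - - (#PCDATA | camera | sound)*>
<!ELEMENT sound  - - (#PCDATA)>
"""
print(undeclared_elements(sample))
```

Run against the fully expanded DTD, this gives the whole list in one go
instead of a few names per compilation, though anything it reports
should still be checked against what RB actually complains about.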

So here's what I ended up with, taking it from the bit about referencing
the `master' tei2.dtd file which you recall I commented out:

It works just fine now I've got the knack. It sounds long-winded but
it's not, really, just a little different from other systems I've used.

> (BTW, I am primarily an OS/2 user, and although I have several
> Windows-based products, MS-published software is generally not
> welcome here. So the inexpensive MS Word SGML editor add-ons are
> not an option.)

I don't know if SQ do an OS/2 version of A/E. I haven't had a chance to test
SGML Author for Word or WordPerfect SGML Edition with the TEI. Anyone done
this yet?