Ostensible Mark-up Language

Copyright (c) 2001 by Rich Morin
published in Silicon Carny, February 2001

The Meta Project's file-tree browser is supposed
to recognize path names and supply descriptive information,
but in cases like /dev/*,
this can be a real challenge.
Using Perl and OML (an informal variant of XML), however,
Rich Morin has pieced together a solution,
and in this month's Silicon Carny,
he shares it with you.

As I mentioned last month, I'm working on a file-tree browser for the Meta Project. One of the
interesting sub-problems I've encountered has to do with characterizing
the device names in /dev. As mentioned in the demo, the
idea is that a user should be able to enter an arbitrary device name
(e.g., /dev/ersa0.1) and receive some useful information,
such as:

/dev/rwd0a is a member of the wd
(Generic WD100x/IDE disks) family. This device node has the attributes
"raw, unit 0, partition a". Related device nodes have names that match
the Perl regular expressions /^r?wd[0-9]+$/,
/^r?wd[0-9]+[a-z]$/, and
/^r?wd[0-9]+s[0-9]+.*$/.
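
As a quick check of the three patterns quoted above, here is a minimal
sketch in Python rather than Perl (these particular patterns are valid in
both). The helper name matching_patterns and the sample node names are
made up for illustration; they are not part of the Meta Project code.

```python
import re

# The three patterns from the sample output, as Python regular expressions.
WD_PATTERNS = [
    r"^r?wd[0-9]+$",           # base name plus unit number
    r"^r?wd[0-9]+[a-z]$",      # unit number followed by one letter
    r"^r?wd[0-9]+s[0-9]+.*$",  # unit number, "s", digits, then anything
]

def matching_patterns(node_name):
    """Return the patterns that match a node name (without the /dev/ prefix)."""
    return [p for p in WD_PATTERNS if re.search(p, node_name)]

# Illustrative node names only; "sa0" should match none of the wd patterns.
for name in ("wd0", "rwd0a", "wd0s1a", "sa0"):
    print(name, matching_patterns(name))
```

Each of the first three names matches exactly one pattern, which is why
the family needs all three to cover its related nodes.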

There are literally thousands of possible device names, so a brute-force
approach is out of the question. Even when the name space is folded down
by unit numbers and such, there are hundreds of device families (e.g.,
/dev/*sa*). Although many of these families have similar
name-formation rules, there are over a dozen sets of rules, all told.

My solution to this nightmare is based on three components: a set of
device family descriptions, a set of parsing macros, and some supporting
Perl code. Both the descriptions and the macros use XML syntax.

By matching each family's base name (e.g., sa) against the
name in question, I can find out if it has even a chance of matching the
desired name. Assuming that this initial test succeeds, I can use a
specific set of parsing macros to see if I really have a match.
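
The first-pass test can be sketched as a simple substring check. This is
an illustration in Python, not the column's Perl; the FAMILIES table and
the function name candidate_families are assumptions, and only the wd
description is taken from the sample output earlier in this column.

```python
# Hypothetical family table: base name -> short description.
# Only the "wd" description appears in the article; "sa" is a placeholder.
FAMILIES = {
    "wd": "Generic WD100x/IDE disks",
    "sa": "(placeholder description)",
}

def candidate_families(qname, families=FAMILIES):
    """Return the base names that occur in the last component of qname."""
    node = qname.rsplit("/", 1)[-1]
    return sorted(base for base in families if base in node)

print(candidate_families("/dev/rwd0a"))
print(candidate_families("/dev/ersa0.1"))
```

A substring hit only says that a family might apply. A short base name
can turn up inside an unrelated device name, which is why the second,
stricter parsing step is still needed.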

The parse entry looks pretty complex, but it's actually just a mnemonic
name for the parsing macro. Any unique text string would serve to
identify the macro, but this one gives a hint to the nature of the
required parsing. The rest of the description should be pretty
self-explanatory.

I should note, in passing, that the description above is written in
something I call Ostensible Mark-up Language (OML). That is, it looks
enough like XML to pass muster, but it doesn't have a style sheet or
other niceties. It may also contain things, such as Perl regular
expressions, that aren't really kosher by normal XML standards.
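
To give a feel for what such a description might look like, here is a
hypothetical one, loaded with Python's standard xml.etree.ElementTree
module as a stand-in for the Perl tools used in the real code. Only the
parse entry and its macro name come from this column; the other tag names
and the description text are invented for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical OML description of one device family.
# The tag names "family", "name", and "desc" are assumptions.
OML_TEXT = """
<family>
  <name>sa</name>
  <desc>placeholder description</desc>
  <parse>[EN]?R?BU.M</parse>
</family>
"""

def load_family(text):
    """Turn a one-level description into a plain dict of tag -> text."""
    root = ET.fromstring(text.strip())
    return {child.tag: (child.text or "").strip() for child in root}

family = load_family(OML_TEXT)
print(family["name"], "uses parsing macro", family["parse"])
```

A strict parser like this one will reject text containing a bare "<" or
"&", so any regular expression that uses those characters would have to
be escaped before it could live in a description.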

Parsing macros

Assuming that the entered device name contains the device family's base
name (sa), we look at the contents of the specified parsing
macro(s) (e.g., [EN]?R?BU.M):

If the entered device name matches the regular expression specified in
regexp, Perl will fill four numbered variables (e.g.,
$1) with captured substrings. We can then
interpret these substrings, based on the names in the
substr entry.
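
The capture-and-name step might look roughly like this. The sketch is in
Python rather than Perl, and everything in MACRO is an assumption: the
regular expression, the BASE placeholder, and the four substring names
are guesses at what a macro for [EN]?R?BU.M could contain, not the
actual macro.

```python
import re

# Hypothetical macro. BASE is a made-up placeholder for the family's
# base name; the four names in "substr" are invented for illustration.
MACRO = {
    "regexp": r"^([en]?)(r?)BASE([0-9]+)\.([0-9]+)$",
    "substr": ["prefix", "raw", "unit", "mode"],
}

def parse_name(node, base, macro=MACRO):
    """Match node against the macro; return named substrings or None."""
    pattern = macro["regexp"].replace("BASE", re.escape(base))
    m = re.match(pattern, node)
    if m is None:
        return None
    # Pair each captured group (Perl's $1, $2, ...) with its name.
    return dict(zip(macro["substr"], m.groups()))

print(parse_name("ersa0.1", "sa"))
print(parse_name("rwd0a", "wd"))
```

Keeping the names in a list alongside the regular expression means the
calling code never has to know what the third or fourth capture means
for a given family.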

Supporting Perl code

Fortunately, the really hard parts of the job are accomplished by some
handy Perl utility modules. For instance, the XML text is stored in a
tied hash, using BerkeleyDB::Btree. Parsing the XML into
Perl data structures is accomplished by XML::Simple;
printing the structures (for debugging) is accomplished by
Data::Dumper.

With these nasty parts under control, we only need to fiddle the
returned values into English. Here's a simplified version of the
relevant code. The device family name ($dname) gets plugged
into the regular expression ($regexp), and is then matched
against the query name ($qname). The resulting substrings
are folded into a parenthesized expression ($paren), which
is then tidied up into passable English format.
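
The steps just described can be sketched as follows. This is a Python
approximation, not the original Perl. WD_REGEXP, WD_NAMES, the BASE
placeholder, and the describe function are all assumptions made for the
sketch; only the wording of the result is modeled on the sample output
near the start of this column.

```python
import re

# Hypothetical macro contents for names like rwd0a.
WD_REGEXP = r"^(r?)BASE([0-9]+)([a-z])$"
WD_NAMES = ["raw", "unit", "partition"]

def describe(qname, dname, desc, regexp=WD_REGEXP, names=WD_NAMES):
    """Return an English description of qname, or None if it doesn't match."""
    node = qname.rsplit("/", 1)[-1]
    # Plug the family name into the regular expression, then match.
    pattern = regexp.replace("BASE", re.escape(dname))
    m = re.match(pattern, node)
    if m is None:
        return None
    # Fold the non-empty substrings into a parenthesized expression.
    pairs = [f"{n}={v}" for n, v in zip(names, m.groups()) if v]
    paren = "(" + ", ".join(pairs) + ")"
    # Tidy up: drop the parentheses, treat "raw" as a flag, and turn
    # the remaining "name=value" pairs into "name value".
    english = paren.strip("()")
    english = re.sub(r"\braw=r\b", "raw", english)
    english = english.replace("=", " ")
    return (f"{qname} is a member of the {dname} ({desc}) family. "
            f'This device node has the attributes "{english}".')

print(describe("/dev/rwd0a", "wd", "Generic WD100x/IDE disks"))
```

The tidy-up rules are deliberately crude string edits. They work because
the names and values coming out of the macro are short and predictable.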

I won't try to pretend this is elegant code, but it gets the job done
in a small and reasonably simple way. Part of the reason for this
brevity lies in Perl and its very handy modules.

Another part, however, comes from using XML as a tool to build a little
language. By creating XML-based parsing macros (complete with embedded
regular expressions), I was able to encode some fairly complex notions
in a very compact form.

I'm not sure what other applications could benefit from this approach,
but I think that it is one that will stay in my coding arsenal. Here's
hoping it will find a place in yours...

About the author

Rich Morin (rdm@cfcl.com) operates
Prime Time Freeware (www.ptf.com),
a publisher of books about Open Source software.
Rich lives in San Bruno, on the San Francisco peninsula.