Yes, graphs and trees is kind of a permathread. It seems to be both a
common problem and an intractable one, at least so far.
I spent much of my morning sorting out cross-reference issues within a
single huge document today, heard a lot of different takes on graph-tree
interactions at Extreme, and found lots of blogs talking about graphs
and trees (RDF and XML, wound around namespaces) this week on my return.
The graph/tree issue seems to come down to a pretty simple problem:
people generally prefer working with graphs, but linear forms (the
series in serialization) force us to use other approaches.
The ideal in telling a story or writing a document generally is a single
pass from start to finish, but forward and backward references are
ordinary in many kinds of prose. Thinking back on my own early writing
efforts, I created paragraphs full of loosely-connected sentences until
the essay form gave me some notion of how to structure them, and I
certainly backslide regularly. Hypertext certainly encourages a return
to graph forms, even as we encode the documents in trees.
Programmers work with graphs constantly, and circular references among
objects, while troublesome for garbage collectors, are pretty ordinary
in practice as developers strive to keep track of information from
different perspectives. Relational databases are all about the
connections between pieces of information, and while I've generally
avoided circular structures there, I'll confess that I've created them a
few times. (They're handy when I want to create a summary table through
a query but preserve the links to the original material.)
Extreme had a bunch of presentations on reconciling graphs and trees:
Browser Bookmark Manage with Topic Maps - Thomas Passin
Object Mapping for Markup Semantics - David Dubin*
Generalizing XPath for Directed Graphs - Steve Cassidy
Gliding Down From Graphs to Trees: An Attempt to Bottle Geometry and
Chemical Content - K. Shanthi and S.K. Venkatesan
RDF Twig: Accessing RDF Graphs in XSLT - Norman Walsh
Uniform Access to Infosets via Reflection - Henry Thompson and K. Ari
Krupnikov.
(* - I didn't get to see David Dubin's talk, so this inclusion is
hearsay.)
We heard a number of times about how graphs are like trees except that
nodes can have multiple parents, which lets all hell break loose I mean
is really convenient. Graphs tend to rest uncomfortably in trees, which
I think is the main problem with RDF/XML and its variety of options.
Using tree-based tools (XSLT is a good example) on graph-based content
(RDF is a good example) is a challenge. I gave it a shot [1], but more
thorough approaches like Norm's [2] which treat the graphs as graphs are
probably a better overall idea.
Henry Thompson's talk on Infoset via Reflection was fascinating to me,
though not because I'm fond of the PSVI. The especially interesting
bits had to do with the nature of schema information, which is -
surprise! - a graph. Making that system work pits developers against
the same kinds of challenges as making RDF, Topic Maps, or the
linguistic annotations that Steve Cassidy described work in XML. (Tom
Passin had it slightly easier, I think, as he was mostly moving from
trees to graphs, but the shift was still noticeable.)
Overall, my suspicion is that graph serialization is far more
challenging than tree serialization, as the worst complication in tree
serialization is the need to add (sometimes arbitrary) order to the
parts of the tree which aren't considered ordered. Graph serialization
offers far more choices about different ways of representing connections
as well as the general problem of unrolling circular connections into a
straight-line sequence of bits.
I don't think there's a simple answer here, but I suspect that in the
long term, graph work might be easier to do outside of tree contexts and
vice-versa. It's impressive to see people putting such effort into
reconciling the two contexts, and I suspect there's a lot of room for
small-scale mixing of graphs and trees (or using graphs to connect
trees, as in hypertext), but the tools and practices seem very
different.
(For example, Edd Dumbill describes "XML syntax both a blessing and a
curse" for RDF [3] and I find RDF's influence on XML, especially around
namespaces, similarly unhelpful. Topic Maps are a slightly easier
squaring of the circle at first, but run into similar problems. Objects
create similar problems, though they may not appear in simple cases.)
The sky isn't falling and there's no catastrophe in the offing, but I
suspect these problems will stay with us for a while to come.
[1] - http://simonstl.com/projects/foaf-xml/
[2] - http://rdftwig.sourceforge.net/
[3] - http://usefulinc.com/edd/blog/2003/8/8#13:13
--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org