At 10:00 AM -0400 4/14/04, Stephen D. Williams wrote:
>The fact is that creating, populating, and manipulating a data model
>has costs. This is true of DOM, SAX (where the data model is
>managed by the application), esXML (where the data model is also the
>'serialized' format so all costs are manipulation), and all other
>applications that involve internal and external data (Corba, DCOM,
>ONC-RPC, ASN.1/xER, etc.). It's not fair to ignore part of the
>processing cycle for a format (esXML) that trades some manipulation
>overhead for all parsing/serialization/object creation/object
>population overhead.
>
I consider creating and populating the data model to be part of
parsing if it's done from an event stream. For instance, the time to
build a DOM document object is significant. Sorry if that wasn't
clear. My point is that once the object exists in memory the
manipulations from that point until you start serializing are
irrelevant. In my tests with my model, parsing/object creation is
about 2/3 of the time, serialization is about 1/3, and manipulation
is unmeasurable. Various optimizations adjust the absolute numbers,
but the 2-1-0 ratio seems pretty consistent. Possibly other formats
have different ratios. However, given that real-world programs read
data from input streams and write it to output streams rather than
byte arrays as benchmarks do, it doesn't seem credible that
in-memory XML operations like add and remove are worth optimizing.
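
For anyone who wants to check the split on their own data, here is a minimal sketch of timing the three phases separately. It assumes Python's standard xml.etree.ElementTree rather than the model used in my tests, and the names build_document and time_phases are hypothetical helpers invented for this illustration. The absolute numbers and the ratio will vary with the parser, the document, and the I/O source, so treat it as a starting point, not a benchmark.

```python
import time
import xml.etree.ElementTree as ET


def build_document(n):
    # Hypothetical test input: n sibling <item> elements under one root.
    items = "".join(f"<item id='{i}'>value {i}</item>" for i in range(n))
    return f"<root>{items}</root>".encode("utf-8")


def time_phases(data, edits=5):
    # Phase 1: parse the bytes and build the in-memory tree.
    t0 = time.perf_counter()
    root = ET.fromstring(data)
    t1 = time.perf_counter()

    # Phase 2: manipulate a handful of elements in memory.
    for child in list(root)[:edits]:
        child.set("touched", "yes")
    t2 = time.perf_counter()

    # Phase 3: serialize the whole tree back to bytes.
    out = ET.tostring(root, encoding="utf-8")
    t3 = time.perf_counter()

    timings = {"parse": t1 - t0, "manipulate": t2 - t1, "serialize": t3 - t2}
    return timings, out


timings, output = time_phases(build_document(3000))
for phase, seconds in timings.items():
    print(f"{phase:>10}: {seconds * 1000:.3f} ms")
```

Reading from a byte string, as this does, is exactly the benchmark shortcut mentioned above; swapping in a file or socket would be closer to a real program.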
>Additionally, the whole parsing etc. stream for XML must be
>completely performed, in DOM cases and many SAX cases, for every
>element of a document/object. With esXML, if a 3000 element
>document/object were read in and 5 elements manipulated, you only
>spend 5*element-manipulation-overhead.

I flat out don't believe this. I think there's an underlying
assumption here (and in some of the other binary formats) which once
again demonstrates that they are not as much like XML as they claim.
The only way you can limit the cost to the elements you touch is by
assuming the data in your stream is well-formed. In XML, we don't
assume that. One of the 3000 nodes you don't process may be
malformed. You're assuming that's not the case, and therefore
avoiding a lot of overhead in checking for it. A large chunk of any
speed gain such a format achieves over real XML comes from cutting
corners on well-formedness checking.

If this is not the case for esXML and it does indeed make all
mandated well-formedness checks, then please correct my error.
However, I'd be very surprised if in that case one could
limit parsing overhead to the raw I/O.
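
The well-formedness point can be illustrated with a minimal sketch, again assuming Python's standard xml.etree.ElementTree (the helpers build_document and first_five_ids are hypothetical names made up for this example, and nothing here represents esXML). The program only wants the first five elements, but a conforming parser still has to read all 3000, because a mismatched end tag on the last one makes the whole document malformed.

```python
import xml.etree.ElementTree as ET


def build_document(n, bad_index=None):
    # Hypothetical input: n <item> elements; optionally break one of them.
    parts = []
    for i in range(n):
        if i == bad_index:
            # Mismatched end tag: a well-formedness error.
            parts.append(f"<item id='{i}'>value {i}</itm>")
        else:
            parts.append(f"<item id='{i}'>value {i}</item>")
    return "<root>" + "".join(parts) + "</root>"


def first_five_ids(text):
    # The parser checks the entire document before returning,
    # so this raises ET.ParseError if any element is malformed.
    root = ET.fromstring(text)
    return [child.get("id") for child in list(root)[:5]]


# Well-formed document: the five ids come back.
print(first_five_ids(build_document(3000)))

# Same document with only the last element broken: rejected outright,
# even though the five elements of interest are untouched.
try:
    first_five_ids(build_document(3000, bad_index=2999))
except ET.ParseError as err:
    print("rejected:", err)
```

A reader that returned the first five ids from the second document without complaint would be faster, but it would be skipping a check that XML requires.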
--
Elliotte Rusty Harold
elharo@metalab.unc.edu
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml
http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA