Sam Ruby wrote:
> Aleksander Slominski wrote:
> >
> > i have re-run your tests and added new tests for Xml
> > Pull Parser (modified test sources are available at
> > http://www.extreme.indiana.edu/~aslom/echosoap/)
>
> I must say that I am impressed by how fast you were able to do that!
thanks :-)
> Is that with xpp or xpps?
Xml Pull Parser, or just XPP (hopefully there are not too many XPPs: only
'XPP Parses Perl', 'XPML Page Parser', X Printing Panel, X-Windows Phase Plan
plus Auto, OASIS XML for Publishers and Printers (XPP), Xyvision's Production
Publisher and probably some others that i missed...)
> I'm going to take a look into merging the concept of your
> FixedLengthInputStream with my NonBlockingBufferedInputStream - the result
> should be a savings of a copy of the message in all tests except char and
> byte.
i initially tried to do this but it needs much more work (see
*Streaming.java). i think that it should be improved with keep-alive
connections, as in our experience opening and closing sockets is quite an
overhead. it may also be worthwhile to try DataInputStream for reading header
lines (it may be better optimized than reading byte by byte).
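
as an illustration only (this is not the code from *Streaming.java): a minimal sketch of reading header lines through a buffer and then exposing exactly Content-Length bytes of the body to the parser without copying it. the names HeaderThenBody, BoundedInputStream and bodyOf are hypothetical, and BufferedInputStream is used here as a stand-in for the DataInputStream idea.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HeaderThenBody {
    /** Reads one header line (CRLF or LF terminated); "" marks the blank line, null means EOF. */
    public static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {          // cheap: served from the buffer, not the socket
            if (b == '\n') return sb.toString();
            if (b != '\r') sb.append((char) b);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    /** Hypothetical helper: exposes at most `length` bytes of the underlying stream, no copy. */
    public static class BoundedInputStream extends InputStream {
        private final InputStream in;
        private long remaining;

        public BoundedInputStream(InputStream in, long length) {
            this.in = in;
            this.remaining = length;
        }

        @Override public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = in.read();
            if (b != -1) remaining--;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }
    }

    /** Skips the request line and headers, then returns the body limited to Content-Length bytes. */
    public static InputStream bodyOf(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        long length = 0;
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                length = Long.parseLong(line.substring(colon + 1).trim());
            }
        }
        return new BoundedInputStream(in, length);
    }

    public static void main(String[] args) throws IOException {
        byte[] msg = "POST /echo HTTP/1.0\r\nContent-Length: 5\r\n\r\nhello"
                .getBytes(StandardCharsets.US_ASCII);
        InputStream body = bodyOf(new ByteArrayInputStream(msg));
        int b;
        while ((b = body.read()) != -1) System.out.print((char) b);
        System.out.println();
    }
}
```

one caveat with this sketch: the buffer may read ahead past the end of the body, so with keep-alive the same buffered stream would have to be reused for the next request instead of wrapping the socket again.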
> The name "char" is a misnomer. If you look closely at that test, you will
> see that it contains a number of dubious practices that seem commonplace
> throughout much of the current xml-soap code base, such as concatenating
> strings into intermediates, and then concatenating the results into larger
> strings. Also the XML "parsing" in that test is hardly robust. ;-)
then you should probably give it a different name, such as 'typical', and still
have a char test to check the speed of converting a byte stream into a reader
(character stream). i noticed that in the char example you read the length from
the header and use it to read that number of _characters_, but the header length
counts bytes - it won't work correctly with anything but 8-bit encodings, and it
can fail for UTF-8 (any non-ASCII character) or UTF-16...
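
to make the mismatch concrete, here is a small illustrative sketch (the class name ByteVsCharLength and the sample strings are made up for this example) that compares the character count of a message with the number of bytes it occupies on the wire:

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharLength {
    /** Bytes the text occupies in UTF-8, which is what a byte-based length header counts. */
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes the text occupies in UTF-16 (big-endian, no byte order mark). */
    public static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "<input>abc</input>";
        String accented = "<input>\u00e9\u00e9\u00e9</input>"; // three e-acute characters

        // 18 chars, 18 UTF-8 bytes, 36 UTF-16 bytes
        System.out.println(ascii.length() + " chars, " + utf8Bytes(ascii)
                + " UTF-8 bytes, " + utf16Bytes(ascii) + " UTF-16 bytes");
        // 18 chars, 21 UTF-8 bytes, 36 UTF-16 bytes
        System.out.println(accented.length() + " chars, " + utf8Bytes(accented)
                + " UTF-8 bytes, " + utf16Bytes(accented) + " UTF-16 bytes");
    }
}
```

a reader asked for 21 _characters_ of the second message would find only 18 there, so it would either block waiting for more input or consume the start of whatever follows on the connection.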
> My only criticism of your tests is that it hardly seems fair to grab the
> sixth result from pp.next() - in order to compare apples to apples, I'd
> prefer that start tags were read, and it be the text matching the "input"
> tag that was extracted, and that all the tags are read. This would make
> the xpp results parallel the xni, sax, and dom results. I realize that a
> case could be made that with a pull parser there is no requirement to read
> to the end, but I suspect that in real soap stacks the entire message will
> be relevant.
hey, you are also grabbing just the value from the SAX event stream!
however, XPP is _reading_ all tags and maintaining namespaces! it is just that in
this test the information is silently discarded (as it is in the DOM test). i
think that adding readStartTag() or running XPP/SAX 1.0 (xpp2sax) will not have
much influence on the test results. BTW xpp2sax allows both pull and push parsing
of the same input - you can start SAX parse() on any start tag (and you could
even build a real DOM from it, or some more specialized tree representation like
electric xml - i also plan to add a SAX 2.0 driver to XPP...).
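
for readers who want to see the "read every tag, keep only the input text" style of test spelled out, here is a minimal sketch. note the assumptions: it uses the standard javax.xml.stream (StAX) pull API rather than the XPP API, and the class name PullInputText and the sample envelope are invented for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullInputText {
    /**
     * Pulls every event up to the end of the document and returns the text of the
     * first element whose local name is "input" (null if there is none).
     */
    public static String extractInput(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        String result = null;
        boolean inInput = false;
        StringBuilder text = new StringBuilder();
        while (r.hasNext()) {                       // keep pulling: all tags are read
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if (result == null && "input".equals(r.getLocalName())) inInput = true;
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (inInput) text.append(r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (inInput && "input".equals(r.getLocalName())) {
                        inInput = false;
                        result = text.toString();
                    }
                    break;
                default:
                    break;                          // other events are read and discarded
            }
        }
        r.close();
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String envelope =
            "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">"
          + "<SOAP-ENV:Body><m:echo xmlns:m=\"urn:echo\"><input>hello</input></m:echo>"
          + "</SOAP-ENV:Body></SOAP-ENV:Envelope>";
        System.out.println(extractInput(envelope));
    }
}
```

a pull loop could return as soon as the end tag of input is seen; the sketch deliberately keeps reading to the end so that it does the same amount of tag reading as a SAX or DOM run over the whole message.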
> > i am suspecting that tests are spending too much time
> > doing buffering and memory IO including socket connection
> > and disconnection that heavily affects performance - it
> > would be interesting to see tests that uses HTTP keep-alive
> > and allows for streaming of input into parsers (and not
> > buffering it). however it requires very careful coding that
> > will not introduce unnecessary buffering and delays...( it
> > would be also interesting to do some testing with chunked
> > encoding but this is even more difficult...).
>
> If the goal were to simply compare parsers, then I would actually eliminate
> all HTTP and socket overhead. I guess my question is: is a steady stream
> of tiny messages from a single client actually what we want to optimize
> for? My reason for closing the socket and getting a new one each time was
> to mirror what I presume would be closer to real world usage - namely a
> server that gets messages from a large number of clients.
if messages are small, I/O efficiency becomes paramount and keep-alive makes a
lot of sense; otherwise any application code (such as XML parsing) will be only
a fraction of the measured time ... when you have many clients it is a fairer
test, as IO becomes less of a bottleneck (with multiple persistent/keep-alive
input streams open, the server can multiplex IO waits).
> My hope was that given the large amount of overhead, the time to parse this
> tiny message should approach the noise range. From the looks of things, I
> would say that xpp achieves this (modulo the concerns above), and hopefully
> by eliminating the need for the ByteArrayInputStream by incorporation of
> the FixedInputStream concept might make things look even better.
please make sure that you have very efficient HTTP header reading, or it will
consume more time than reading the XML envelope...
> And then we could move on to more substantial messages...
that would be really interesting (especially checking with streaming on top of
HTTP/1.1 chunking).
alek
ps. if you want to time just small parts of the test you may want to use a
high-resolution timer to determine the exact time for each phase. i have one
that i adapted from a JavaWorld article that works on both Windows and Solaris,
so if you are interested i can pack it and put it on the web (it uses JNI to
access small C routines that tap into the system high-resolution clock).
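
as a sketch of per-phase timing without JNI: on JVMs that provide System.nanoTime() (Java 5 and later) something like the following can be used. this is not the JNI timer described above; the class name PhaseTimer and the two stand-in workloads are made up for the example.

```java
public class PhaseTimer {
    private long last = System.nanoTime();

    /** Returns the nanoseconds elapsed since construction or since the previous lap() call. */
    public long lap() {
        long now = System.nanoTime();
        long elapsed = now - last;
        last = now;
        return elapsed;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();

        // stand-in for the first phase (e.g. reading headers)
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) sb.append(i);
        long phase1 = timer.lap();

        // stand-in for the second phase (e.g. parsing the envelope)
        int checksum = 0;
        for (int i = 0; i < sb.length(); i++) checksum += sb.charAt(i);
        long phase2 = timer.lap();

        System.out.println("phase 1: " + phase1 + " ns, phase 2: " + phase2
                + " ns (checksum " + checksum + ")");
    }
}
```

System.nanoTime() is only meaningful for measuring elapsed time between two calls, and its actual resolution depends on the platform, so very short phases should still be repeated many times and averaged.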
--
Aleksander Slominski, LH 316, IU, http://www.extreme.indiana.edu/~aslom
As I look afar I see neither cherry Nor tinted leaves Just a modest hut
on the coast In the dusk of Autumn nightfall - Fujiwara no Teika(1162-1241)