Need to prevent parsing server-side responses and client-side requests into axiom trees if using axiom and payload caching is off

Details

Description

If using the axiom message factory with payload caching turned off, spring-ws currently supports streaming a request on the server-side and a response on the client-side without parsing the stream into an axiom tree. But responses on the server-side and requests on the client-side do get parsed into axiom trees. So turning off payload caching is non-symmetrical.

If this can be prevented, then spring-ws should be able to support large client requests and large server responses with much better performance. This also would even help with medium-sized messages in high volume scenarios.

The AxiomSoapBody's getPayloadResult() method currently creates a custom SAX Content Handler that effectively builds an axiom tree for the payload during transforms involving the payload result. The proposed enhancement is to basically have getPayloadResult() return a subclass of StreamResult feeding a ByteArrayOutputStream, and then in the AxiomSoapMessage's writeTo() method insert an Axiom OMSourcedEle as the payload, containing a custom OMDataSource built around the above ByteArrayOutputStream. This allows a 'placeholder' for the payload in the axiom tree containing the raw payload XML, and when writeTo() serializes the axiom message to the transport's output stream, the raw payload XML is written to the output stream without ever getting parsed. It is a little more involved than the above, but that describes the highlights.

If something like an interceptor, client-side callback, or logging code needs to access the contents of the payload, axiom will automatically parse the contents of the OMDataSource into a fully expanded Axiom tree. So this gives the flexibility of accessing the payload data if need be, but does not parse it into axiom nodes unless absolutely needed.

I am preparing a patch of contribution files that does the above. I mainly just need to wrap up the unit tests and do some performance testing, and then I will attach the patch.

Arjen - Right, it would still store it in memory, unparsed as a byte array, but allows you to use the normal message factory, endpoints, and interceptors (as long as they don't need access to the payload contents - if they do then it would get parsed). It seems this would be appropriate behavior (ie non-parsed, in memory) for most client -side requests and server-side responses, similar to how client-side responses and server-side requests work with axiom for the general efficiency improvements.

But yeah, I think for really massive payloads (say hundreds of MBs or more), a pure streaming solution would be useful for those specialized scenarios. The above might be able to be extended at some point to also support pure streaming - it would probably involve having the getPayloadResult() return a custom StreamResult that allowed a 'streaming-focused' end point adapter or end point to put an InputStream (or something similar) into the result to allow the payload to actually be somewhere out-of-memory (say maybe on disk). And then when the AxiomSoapMessage's writeTo() method serializes the axiom message, the custom DataSource would just read from that streaming inputStream and write that to the transport output stream. Or alternatively, your thoughts of using a specialized message factory that precluded use of interceptors would probably be fine as well.

I figured the above proposal would be a first step that would improve perf for the general non-massive scenarios (which is what I ran into), and that a specialized solution for pure streaming could be introduced at some point as well.

Jim Cummings
added a comment - 06/Mar/08 5:21 AM Arjen - Right, it would still store it in memory, unparsed as a byte array, but allows you to use the normal message factory, endpoints, and interceptors (as long as they don't need access to the payload contents - if they do then it would get parsed). It seems this would be appropriate behavior (ie non-parsed, in memory) for most client -side requests and server-side responses, similar to how client-side responses and server-side requests work with axiom for the general efficiency improvements.
But yeah, I think for really massive payloads (say hundreds of MBs or more), a pure streaming solution would be useful for those specialized scenarios. The above might be able to be extended at some point to also support pure streaming - it would probably involve having the getPayloadResult() return a custom StreamResult that allowed a 'streaming-focused' end point adapter or end point to put an InputStream (or something similar) into the result to allow the payload to actually be somewhere out-of-memory (say maybe on disk). And then when the AxiomSoapMessage's writeTo() method serializes the axiom message, the custom DataSource would just read from that streaming inputStream and write that to the transport output stream. Or alternatively, your thoughts of using a specialized message factory that precluded use of interceptors would probably be fine as well.
I figured the above proposal would be a first step that would improve perf for the general non-massive scenarios (which is what I ran into), and that a specialized solution for pure streaming could be introduced at some point as well.
Does that seem like it might be a reasonable approach?

I should probably mention I wasn't proposing trying to address the pure streaming issues for really large payloads with this issue/patch. I was hoping to instead reduce the processing (and memory churn to some degree) generally required for medium and largish payloads by avoiding the full-blown axiom trees.

Jim Cummings
added a comment - 06/Mar/08 5:38 AM I should probably mention I wasn't proposing trying to address the pure streaming issues for really large payloads with this issue/patch. I was hoping to instead reduce the processing (and memory churn to some degree) generally required for medium and largish payloads by avoiding the full-blown axiom trees.

IMO the custom OMDataSource could be provided by the MessageEndPoint itself. With a JiBX marshaller, a custom OMDataSource can be used to marshall Objects only when the writeTo() method is called. This trick can avoid an in memory ByteArrayOutputStream.

This process is used by Axis2 for Jibx. An OMElement containing a JibxDataSource is added to the OMContainer.

Could we have something like "OMDataSourceResult" containing a custom OMDataSource ? I'm not expert with Axiom and Spring-WS

Sebastien Chatel
added a comment - 12/Mar/08 4:41 AM Hello jim,
I've found this issue too.
http://forum.springframework.org/showthread.php?t=50967
IMO the custom OMDataSource could be provided by the MessageEndPoint itself. With a JiBX marshaller, a custom OMDataSource can be used to marshall Objects only when the writeTo() method is called. This trick can avoid an in memory ByteArrayOutputStream.
This process is used by Axis2 for Jibx. An OMElement containing a JibxDataSource is added to the OMContainer.
Could we have something like "OMDataSourceResult" containing a custom OMDataSource ? I'm not expert with Axiom and Spring-WS
Here is an example :
IMarshallable mrshable = (IMarshallable) responseObject;
OMDataSource dataSource = new JiBXDataSource(mrshable, bindingFactory);
int index = mrshable.JiBX_getIndex();
OMNamespace appns = oMFactory.createOMNamespace(bindingFactory.getElementNamespaces() [index] , "");
OMElement el = oMFactory.createOMElement(dataSource, bindingFactory.getElementNames() [index] , appns);
oMContainer.addChild(el);

Delaying the marshalling until the message's writeTo() method (e.g. like in a manner you suggested) could be made to avoid an intermediate byte array stream. I think the trick would be how to get this to work for the other binding technologies' endpoints (assuming it was ok to have Axiom specific code in each), and finding something similar that would work on the client-side too when sending large requests (which also ends up using the body.getPayloadResult()).

Although I have not had the bandwidth to do performance/throughput tests on this yet, I'll go ahead and attach my patch to this Jira issue based on my earlier ByteArrayOutputStream-based suggestion with some modified and new unit tests. This appears to provide a baseline general enhancement that should work with all the existing endpoints and on the client-side as well, but as discussed does marshall the response temporarily to a ByteArrayOutputStream, but does avoid building a full Axiom tree.

It probably wouldn't be too hard to further extend the AxiomSoapBody at some point to allow alternate custom payload results to be returned that could hold other things besides a ByteArrayOutputStream, so that some (e.g. possibly JiBX and pure streaming) endpoints (not sure about the client-side though) could allow even more optimized writing of the payloads to the transport. I'm not clear off-hand what the least invasive way to do this would be though.

Jim Cummings
added a comment - 12/Mar/08 4:10 PM Hi Sebastien,
Delaying the marshalling until the message's writeTo() method (e.g. like in a manner you suggested) could be made to avoid an intermediate byte array stream. I think the trick would be how to get this to work for the other binding technologies' endpoints (assuming it was ok to have Axiom specific code in each), and finding something similar that would work on the client-side too when sending large requests (which also ends up using the body.getPayloadResult()).
Although I have not had the bandwidth to do performance/throughput tests on this yet, I'll go ahead and attach my patch to this Jira issue based on my earlier ByteArrayOutputStream-based suggestion with some modified and new unit tests. This appears to provide a baseline general enhancement that should work with all the existing endpoints and on the client-side as well, but as discussed does marshall the response temporarily to a ByteArrayOutputStream, but does avoid building a full Axiom tree.
It probably wouldn't be too hard to further extend the AxiomSoapBody at some point to allow alternate custom payload results to be returned that could hold other things besides a ByteArrayOutputStream, so that some (e.g. possibly JiBX and pure streaming) endpoints (not sure about the client-side though) could allow even more optimized writing of the payloads to the transport. I'm not clear off-hand what the least invasive way to do this would be though.
Jim.

Arjen - Here is a patch that seems to prevent the payload from getting built into a full axiom tree before getting written to the transport output stream. It temporarily puts the payload into a ByteArrayOutputStream instead inside the OMDataSource as described. I've modified/added some unit tests, but have not had a chance to properly performance test this.

This should work for all endpoints' server-side responses and on the client-side for client-side requests as well.

The only side-effect I can think this might have is that the payload Axiom element is not put into the body element until the message's writeTo() is called. So if an interceptor needed to access the payload it wouldn't be there - but the javadocs already warn this won't work if payload caching is turned off anyways. If that was concerning, this could easily be keyed off a new flag in the axiom message factory besides payloadCaching.

Jim Cummings
added a comment - 12/Mar/08 4:23 PM Arjen - Here is a patch that seems to prevent the payload from getting built into a full axiom tree before getting written to the transport output stream. It temporarily puts the payload into a ByteArrayOutputStream instead inside the OMDataSource as described. I've modified/added some unit tests, but have not had a chance to properly performance test this.
This should work for all endpoints' server-side responses and on the client-side for client-side requests as well.
The only side-effect I can think this might have is that the payload Axiom element is not put into the body element until the message's writeTo() is called. So if an interceptor needed to access the payload it wouldn't be there - but the javadocs already warn this won't work if payload caching is turned off anyways. If that was concerning, this could easily be keyed off a new flag in the axiom message factory besides payloadCaching.

Note that I have created SWS-352 for the full-streaming approach. I'd be interested in your comments on that. Implementing SWS-352 is a lot more work than this issue, so I've set that for the next major version (currently 1.6).

Arjen Poutsma
added a comment - 03/May/08 1:03 AM Great stuff, Jim. I will add this to 1.5.2. Thanks!
Note that I have created SWS-352 for the full-streaming approach. I'd be interested in your comments on that. Implementing SWS-352 is a lot more work than this issue, so I've set that for the next major version (currently 1.6).

I've resolved this issue, with a few changes/improvements to the original patch:

I've replaced the commons io ByteArrayOutputStream with the standard java.io.ByteArrayOutputStream. I don't want to require the extra dependency to commons io. What we could do instead is detect it: if available, use commons io, but default to java io. This is not done yet, let me know if you want it.

The XML declarations are now filtered when written, not when read. This gets rid of the error-prone inputstream filtering in AxiomXMLDeclarationFilter, which only works with ASCII-based encodings (UTF-8), but not with UTF-16, for instance.

I use the Axiom ByteArrayDataSource, rather than the AxiomByteArrayDataSource.

The byte array element is now attached to the SOAP body when writing is completed. This gets rid of the addNonCachedPayloadElement() in AxiomSoapMessage.

I've also separated the non-caching/caching behavior in AxiomSoapBody into two separate classes. There is an abstract base class Payload in org.springframework.ws.soap.axiom, with two subclasses CachingPayload and NonCachingPayload. AxiomSoapBody creates an instance of these in the constructor, based on the payloadCaching boolean parameter. This gets rid of the if (payloadCaching) {} else {} logic scattered throughout the body.

Jim, could you update SVN or download a snapshot and tell me what you think? I've tried to stay close to the original idea of your patch, but I might have missed something. I'd like to make this part of the release this Friday, and I want to make sure I didn't screw up.

Arjen Poutsma
added a comment - 14/May/08 1:55 AM I've resolved this issue, with a few changes/improvements to the original patch:
I've replaced the commons io ByteArrayOutputStream with the standard java.io.ByteArrayOutputStream. I don't want to require the extra dependency to commons io. What we could do instead is detect it: if available, use commons io, but default to java io. This is not done yet, let me know if you want it.
The XML declarations are now filtered when written, not when read. This gets rid of the error-prone inputstream filtering in AxiomXMLDeclarationFilter, which only works with ASCII-based encodings (UTF-8), but not with UTF-16, for instance.
I use the Axiom ByteArrayDataSource, rather than the AxiomByteArrayDataSource.
The byte array element is now attached to the SOAP body when writing is completed. This gets rid of the addNonCachedPayloadElement() in AxiomSoapMessage.
I've also separated the non-caching/caching behavior in AxiomSoapBody into two separate classes. There is an abstract base class Payload in org.springframework.ws.soap.axiom, with two subclasses CachingPayload and NonCachingPayload. AxiomSoapBody creates an instance of these in the constructor, based on the payloadCaching boolean parameter. This gets rid of the if (payloadCaching) {} else {} logic scattered throughout the body.
Jim, could you update SVN or download a snapshot and tell me what you think? I've tried to stay close to the original idea of your patch, but I might have missed something. I'd like to make this part of the release this Friday, and I want to make sure I didn't screw up.
Thanks!

Arjen - Your changes/improvements look nice and elegant. I see no problem with them at all.

I think for now using the ByteArrayOutputStream from the JDK is ok, and if anyone has perf issues, it could always be a future improvement to detect the commons.io version of ByteArrayOutputStream.

Two comments on the axiom version. I think there is a mistake in the maven repository for axis, as it shows entries for both axiom 1.2.6 and 1.2.7, and the spring-ws pom references 1.2.7. But I'm pretty sure axiom doesn't have a release 1.2.7 yet, because they just came out with 1.2.6.

Also, I wanted to make sure you knew that ByteArrayDataSource didn't come out until 1.2.6, so spring-ws will no longer work with axiom 1.2.5. I don't have any issues with that myself. I really wanted to use 1.2.6 as I had to do a few kind of ugly things due to some 1.2.5 limitations, so I'm happier seeing you use 1.2.6.

BTW - I've been meaning to comment on SWS-352 for the full streaming approach but haven't had the change to, but I'll try to soon.

Jim Cummings
added a comment - 14/May/08 2:28 PM Arjen - Your changes/improvements look nice and elegant. I see no problem with them at all.
I think for now using the ByteArrayOutputStream from the JDK is ok, and if anyone has perf issues, it could always be a future improvement to detect the commons.io version of ByteArrayOutputStream.
Two comments on the axiom version. I think there is a mistake in the maven repository for axis, as it shows entries for both axiom 1.2.6 and 1.2.7, and the spring-ws pom references 1.2.7. But I'm pretty sure axiom doesn't have a release 1.2.7 yet, because they just came out with 1.2.6.
Also, I wanted to make sure you knew that ByteArrayDataSource didn't come out until 1.2.6, so spring-ws will no longer work with axiom 1.2.5. I don't have any issues with that myself. I really wanted to use 1.2.6 as I had to do a few kind of ugly things due to some 1.2.5 limitations, so I'm happier seeing you use 1.2.6.
BTW - I've been meaning to comment on SWS-352 for the full streaming approach but haven't had the change to, but I'll try to soon.
Thanks for getting this fix in!!

Arjen Poutsma
added a comment - 14/May/08 8:01 PM I downgraded to 1.2.6, because I couldn't find an announcement for 1.2.7 either.
I am not so worried about requiring 1.2.6, because it is only applicable if you disable payload caching. For the default case, you can still use 1.2.5.
Thanks!

Erik-Berndt Scheper
added a comment - 15/May/08 6:46 PM Axiom 1.2.7 contains a fix for Axis2 1.4 and is included in the Axis2 1.4 distribution. See http://markmail.org/message/brgnp5tb4vmq6ysu?q=axiom+dev+mail for more details.
Therefore it should be fine to use Axiom 1.2.7. (though it's a bit sloppy not to officially announce it on the ws-commons site).