Description

----------
A configured XmlRpcClient object is thread safe: In other words, the suggested use is, that you configure the client using setTransportFactory(XmlRpcTransportFactory) and similar methods, store it in a field and never modify it again. Without modifications, the client may be used for an arbitrary number of concurrent requests.
----------

I have a simple test case that creates two threads that share a single XmlRpcClient instance, and if I run the test I see the following errors:

----------
Fatal Error] :1:1: Content is not allowed in prolog.
operation failed, Failed to parse server's response: Content is not allowed in prolog.[Fatal Error] :1:10: Element type "xlvrsion" must be followed by either attribute specifications, ">" or "/>".
operation failed, Failed to parse server's response: Element type "xlvrsion" must be followed by either attribute specifications, ">" or "/>".
----------

----------[Fatal Error] :1:1: Content is not allowed in prolog.[Fatal Error] :1:5: Element type "xlv" must be followed by either attribute specifications, ">" or "/>".
operation failed, Failed to parse server's response: Element type "xlv" must be followed by either attribute specifications, ">" or "/>".
operation failed, Failed to parse server's response: Content is not allowed in prolog.
----------

If I wrap the call to XmlRpcClient.execute with a synchronized block that locks on the XmlRpcClient object, the code works fine.

It looks like the execute code is not locking around the read/write to/from the server, so the threads are competing to read the response from the server, and are tripping over each other.

Jochen Wiedmann
added a comment - 10/May/09 15:44 Alan, give yourself the answer:
Suggest for a moment that we had a transport shared between threads and sendRequest were thread safe. Furthermore, suggest that we had a request running for 10 seconds.
As a consequence, no other request could run within these 10 seconds, at least not with this very transport. Do you think that's desirable?

Jochen. that contradicts your statement that "the transport was never meant to be thread safe and will never be thread safe". There is no way a transport factory can return an unsynchronized singleton and still be thread-safe.

I can either add the synchronization or fix the comment, but one or the other needs to be done. Which would you prefer?

Alan Burlison
added a comment - 10/May/09 14:56 - edited I knew I'd written my transport factory to return a singleton transport for a reason, this is from XmlRpcTransportFactory:
----------
/** Returns an instance of
{@link XmlRpcTransport}
. This may
be a singleton, but the caller should not depend on that:
A new instance may as well be created for any request.
@return The configured transport.
*/
public XmlRpcTransport getTransport();
----------
Jochen. that contradicts your statement that "the transport was never meant to be thread safe and will never be thread safe". There is no way a transport factory can return an unsynchronized singleton and still be thread-safe.
I can either add the synchronization or fix the comment, but one or the other needs to be done. Which would you prefer?

That's fine, that's exactly what I am suggesting. Only the synchronisation of sendRequest is directly related to the race problem, the rest of the patch is just general cleanup and typo fixes. I can move that stuff into another bug if you wish.

And I still think it is worth adding a note somewhere to the effect that the transport factories need to create a new transport for each request - the comment in XmlRpcClient misled me into believing that they didn't have to. I can submit a patch for that too if you like.

Alan Burlison
added a comment - 08/May/09 15:06 That's fine, that's exactly what I am suggesting. Only the synchronisation of sendRequest is directly related to the race problem, the rest of the patch is just general cleanup and typo fixes. I can move that stuff into another bug if you wish.
And I still think it is worth adding a note somewhere to the effect that the transport factories need to create a new transport for each request - the comment in XmlRpcClient misled me into believing that they didn't have to. I can submit a patch for that too if you like.

By design, the documented anchor point for thread safety is the XmlRpcClient. Creating an XmlRpcTransport is a rather cheap operation.

Looking through your patch, I have no problem with explicitly disabling caches, or explicitly stating to use in- and output. If that makes the class currently thread safe, fine with me. But I'd never promise to keep it that way.

Jochen Wiedmann
added a comment - 08/May/09 14:53 Sure, it could, but what for?
By design, the documented anchor point for thread safety is the XmlRpcClient. Creating an XmlRpcTransport is a rather cheap operation.
Looking through your patch, I have no problem with explicitly disabling caches, or explicitly stating to use in- and output. If that makes the class currently thread safe, fine with me. But I'd never promise to keep it that way.

Well it could be made thread safe very easily, at least in the case of the Sun transport. And if not, at very least the documentation needs changing, as saying that XmlRpcClient is thread-safe rather misleading, as it depends on the transport factory and transport it is configured to use.

In my case I assume the correct thing to do is to modify my transport factory so that it returns a new instance of the transport on each call to getTransport, rather than using a singleton as it does at present, is that correct?

With the exception of the addition of 'synchronized' to XmlRpcSunHttpTransport.sendRequest(), I think the rest of the patch can stand as it cleans up a few other issues, what do you think?

Alan Burlison
added a comment - 08/May/09 12:47 Well it could be made thread safe very easily, at least in the case of the Sun transport. And if not, at very least the documentation needs changing, as saying that XmlRpcClient is thread-safe rather misleading, as it depends on the transport factory and transport it is configured to use.
In my case I assume the correct thing to do is to modify my transport factory so that it returns a new instance of the transport on each call to getTransport, rather than using a singleton as it does at present, is that correct?
With the exception of the addition of 'synchronized' to XmlRpcSunHttpTransport.sendRequest(), I think the rest of the patch can stand as it cleans up a few other issues, what do you think?

I'll check out the 1.4/1.5 issue, I thought I was seeing my code go through the 1.5 variant when I was stepping through it in the debugger, but I'll double check.

The issue is that the object returned by getTransport() isn't thread-safe, so if you share that returned object between two threads they can both end up using the same underlying URLconnection at the same time, and that's what causes the problem. You can get into that situation in turn if you share the XmlRpcClient object that doles out the transports between two threads.

Alan Burlison
added a comment - 07/May/09 23:13 I'll check out the 1.4/1.5 issue, I thought I was seeing my code go through the 1.5 variant when I was stepping through it in the debugger, but I'll double check.
The issue is that the object returned by getTransport() isn't thread-safe, so if you share that returned object between two threads they can both end up using the same underlying URLconnection at the same time, and that's what causes the problem. You can get into that situation in turn if you share the XmlRpcClient object that doles out the transports between two threads.
It may be the case that I'm setting up the XML-RPC 'stack' incorrectly in the first place, but the docs do say that XmlRpcClient is thread-safe. The code that implements my client can be found at http://src.opensolaris.org/source/xref/website/auth/AuthClient/src/org/opensolaris/auth/client/AuthClient.java , it may be worth you taking a look at that to make sure I'm not violating some assumption that the XML-RPC library makes about how things are set up - it's entirely possible I am
As for my 'real' app, it is doing more realistic processing in both the client & the server (crypto stuff), so the window for the race condition to occur is wider and therefore easier to hit.

Alan, your changes would make perfect sense to me, if TransportFactory.getTransport() would return a singleton object, or in similar circumstances. (I notice, you are using the XmlRpcSunHttpTransportFactory, as opposed to XmlRpcSun14 or XmlRpcSun15. Is that intentional, btw?)

However, that's not the case: Any call to getTransport() returns a new object, with a new URLconnection. So, what's the point with synchronizing sendRequest(), as the synchronized object is per-thread? And, in particular, the URLConnection is not "shared", as you state above?

I'd also like to understand what's the difference between my scenario from XMLRPC-168 and what you are doing to reproduce the problem and your "real app"?

Jochen Wiedmann
added a comment - 07/May/09 20:42 Alan, your changes would make perfect sense to me, if TransportFactory.getTransport() would return a singleton object, or in similar circumstances. (I notice, you are using the XmlRpcSunHttpTransportFactory, as opposed to XmlRpcSun14 or XmlRpcSun15. Is that intentional, btw?)
However, that's not the case: Any call to getTransport() returns a new object, with a new URLconnection. So, what's the point with synchronizing sendRequest(), as the synchronized object is per-thread? And, in particular, the URLConnection is not "shared", as you state above?
I'd also like to understand what's the difference between my scenario from XMLRPC-168 and what you are doing to reproduce the problem and your "real app"?

The attached patch (XMLRPC-167.patch) fixes the bug and also cleans up a few other minor issues I found along the way. Making XmlRpcSunHttpTransport.sendRequest synchronized is the lowest point in the call stack that can be used to make the creation of the HttpURLConnection and its subsequent write & read operations atomic.

Alan Burlison
added a comment - 06/May/09 18:28 - edited The attached patch ( XMLRPC-167 .patch) fixes the bug and also cleans up a few other minor issues I found along the way. Making XmlRpcSunHttpTransport.sendRequest synchronized is the lowest point in the call stack that can be used to make the creation of the HttpURLConnection and its subsequent write & read operations atomic.

I've made XmlRpcStreamTransport.sendRequest synchronized as it is responsible for sending the request to the server and receiving the response. That fixes the concurrent read issue, but reveals another problem:

Note that unlike the previous case, the reads from the socket are not intermixed, and Thread-0 sees the correct response. Note however the following line and the subsequent stack trace:

Thread-1: java.lang.IllegalStateException: Already connected

Here's the issue: The underlying URLConnection has two states - opened and connected - see http://java.sun.com/javase/6/docs/api/java/net/URLConnection.html. You can only set headers when the UrlConnection is in the opened state, once it becomes connected you can't modify headers. When Thread-0 exits the synchronized sendRequest method, Thread-1 acquires the lock, but the shared UrlConnection is already in the connected state, so the attempt by Thread-1 to set headers fails.

Which takes us back to my original comment in the bug - XmlRpcClient just isn't thread-safe, despite what the docs say. You may get away with using it in multi-threaded apps using a transport other than the SunHttpTransport , you may get away with running it on a uniprocessor machine, you may get away with only running it under light load, but using the Sun transport on a MP machine under heavy load it is just plain broken.

I'll carry on investigating and see if I can figure out a fix, but I suspect it is going to require either a rewrite or some aggressively coarse-grained locking.

Alan Burlison
added a comment - 06/May/09 09:25 I've made XmlRpcStreamTransport.sendRequest synchronized as it is responsible for sending the request to the server and receiving the response. That fixes the concurrent read issue, but reveals another problem:
----------
Thread-0: InputStreamTracer(sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@eac5a)
Thread-0: read() = "<"
Thread-0: read() = "?"
Thread-0: read() = "x"
Thread-0: read() = "m"
Thread-0: read(buff, 0, 28) = 28 "l version="1.0" encoding="UT"
Thread-0: read() = "F"
Thread-0: read() = "-"
Thread-0: read() = "8"
Thread-0: read() = """
Thread-0: read() = "?"
Thread-0: read() = ">"
Thread-0: read(buff, 0, 8192) = 754 "<methodResponse><params><param><value><struct><member><name>name</name><value>Thread-0</value></member> ..."
Thread-0: read(buff, 0, 8192) = -1
Thread-0: close()
Thread-1: java.lang.IllegalStateException: Already connected
at sun.net.www.protocol.http.HttpURLConnection.setRequestProperty(HttpURLConnection.java:2235)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.setRequestProperty(HttpsURLConnectionImpl.java:296)
at org.apache.xmlrpc.client.XmlRpcSunHttpTransport.setRequestHeader(XmlRpcSunHttpTransport.java:73)
at org.apache.xmlrpc.client.XmlRpcHttpTransport.setContentLength(XmlRpcHttpTransport.java:118)
at org.apache.xmlrpc.client.XmlRpcHttpTransport.newReqWriter(XmlRpcHttpTransport.java:156)
at org.apache.xmlrpc.client.XmlRpcStreamTransport.sendRequest(XmlRpcStreamTransport.java:150)
at org.apache.xmlrpc.client.XmlRpcHttpTransport.sendRequest(XmlRpcHttpTransport.java:143)
at org.apache.xmlrpc.client.XmlRpcSunHttpTransport.sendRequest(XmlRpcSunHttpTransport.java:69)
at org.apache.xmlrpc.client.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:56)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:167)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:137)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:126)
at org.opensolaris.auth.client.AuthClient$XmlRpcInvocationHandler.invoke(AuthClient.java:258)
at $Proxy7.encryptCookie(Unknown Source)
at authclient.DL.run(Deadlock.java:84)
at java.lang.Thread.run(Thread.java:619)
Thread-0: InputStreamTracer(sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@1e13e07)
Thread-0: read() = "<"
Thread-0: read() = "?"
Thread-0: read() = "x"
Thread-0: read() = "m"
Thread-0: read(buff, 0, 28) = 28 "l version="1.0" encoding="UT"
Thread-0: read() = "F"
Thread-0: read() = "-"
Thread-0: read() = "8"
Thread-0: read() = """
Thread-0: read() = "?"
Thread-0: read() = ">"
Thread-0: read(buff, 0, 8192) = 949 "<methodResponse><params><param><value><struct><member><name>userId</name><value><i4>1</i4></value></member> ..."
Thread-0: read(buff, 0, 8192) = -1
Thread-0: close()
----------
Note that unlike the previous case, the reads from the socket are not intermixed, and Thread-0 sees the correct response. Note however the following line and the subsequent stack trace:
Thread-1: java.lang.IllegalStateException: Already connected
Here's the issue: The underlying URLConnection has two states - opened and connected - see http://java.sun.com/javase/6/docs/api/java/net/URLConnection.html . You can only set headers when the UrlConnection is in the opened state, once it becomes connected you can't modify headers. When Thread-0 exits the synchronized sendRequest method, Thread-1 acquires the lock, but the shared UrlConnection is already in the connected state, so the attempt by Thread-1 to set headers fails.
Which takes us back to my original comment in the bug - XmlRpcClient just isn't thread-safe, despite what the docs say. You may get away with using it in multi-threaded apps using a transport other than the SunHttpTransport , you may get away with running it on a uniprocessor machine, you may get away with only running it under light load, but using the Sun transport on a MP machine under heavy load it is just plain broken.
I'll carry on investigating and see if I can figure out a fix, but I suspect it is going to require either a rewrite or some aggressively coarse-grained locking.

I've wrapped the InputStream read by XmpRpcStreamTransport with a simple class that for each method call prints out the thread ID, the method name, arguments and return value to a specified output stream:

I couldn't replicate the problem using those test classes even before the fix for XMLRPC-168, but my real app showed it within a few seconds. I'll upgrade my XML-RPC libs to include the fix for XMLRPC-168 and start poking around tomorrow - I started looking at it today while you were very kindly looking at XMLRPC-168, but other things got in the way. I'll let you know what I find. Thanks!

Alan Burlison
added a comment - 29/Apr/09 22:16 I couldn't replicate the problem using those test classes even before the fix for XMLRPC-168 , but my real app showed it within a few seconds. I'll upgrade my XML-RPC libs to include the fix for XMLRPC-168 and start poking around tomorrow - I started looking at it today while you were very kindly looking at XMLRPC-168 , but other things got in the way. I'll let you know what I find. Thanks!

Alan, using the Client and Server from XMLRPC-168 (note: the updated version of the Client with 10 concurrent threads in parallel) I wasn't able to reproduce your problem. Consequently I doubt that the problem is in the XmlRpcClient or any related class.

Jochen Wiedmann
added a comment - 29/Apr/09 20:09 Alan, using the Client and Server from XMLRPC-168 (note: the updated version of the Client with 10 concurrent threads in parallel) I wasn't able to reproduce your problem. Consequently I doubt that the problem is in the XmlRpcClient or any related class.

I've looked at the server and it is seeing valid requests, and sending out responses. I've modified the client so that it just creates a single thread and then run two instances against the server, and that works fine - well, up until the server deadlocks inside org.apache.xmlrpc.util.ThreadPool, but that's the subject of a different bug...

The transport factor is a subclass of XmlRpcSunHttpTransport that overrides newURLConnection so that it uses a custom socket factory that creates SSL sockets, other than that it is completely standard.

Alan Burlison
added a comment - 28/Apr/09 11:26 I've looked at the server and it is seeing valid requests, and sending out responses. I've modified the client so that it just creates a single thread and then run two instances against the server, and that works fine - well, up until the server deadlocks inside org.apache.xmlrpc.util.ThreadPool, but that's the subject of a different bug...
The transport factor is a subclass of XmlRpcSunHttpTransport that overrides newURLConnection so that it uses a custom socket factory that creates SSL sockets, other than that it is completely standard.

Jochen Wiedmann
added a comment - 28/Apr/09 09:58 Alan, this isn't proof that the client has a problem: For exactly the same reason, the server might break.
In other words, you should repeat your example, tracing the servers output.
Apart from that, the important information is missing, what transport factory you are using: Obviously thread safety is actually a problem of the transport factory.