Replaced timeouts with pings to check that client connection is alive. Removed the property ipc.client.timeout from the default Hadoop configuration. Removed the metric RpcOpsDiscardedOPsNum.

Description

The current RPC (really IPC) relies on client-side timeouts to find "dead" sockets. I propose that we have a thread that, once a minute (if the connection has been idle), writes a "ping" message to the socket. The client can detect a dead socket by the resulting error on the write, so no client-side timeout is required. Also note that the ipc server does not need to respond to the ping, just discard it.

Activity

Devaraj Das
added a comment - 12/Nov/07 11:45 Owen, could you please clarify this. The way things work now is that if a client connection has been left idle too long (controlled via ipc.client.connection.maxidletime), the client closes the connection. This handles the connection-caching scalability issues discussed in HADOOP-312. What you are proposing will effectively enable connection caching. Did I get you right? (BTW, for the task->tasktracker communication, the idle timeout is set to a very high number since the tasktracker server has to deal with only a very small number of clients.)

Owen O'Malley
added a comment - 12/Nov/07 15:30 No, the ping will only be sent if there are outstanding rpc calls, not when the connection is idle. It is still right to close the connection if it has been idle too long.

Owen O'Malley
added a comment - 19/Nov/07 07:47 This patch hasn't been tested at scale, but it passes the unit tests.
It changes:
1. The call-dropping code in the server is removed.
2. The client timeout code is removed.
3. The client writes an ipc frame with length = -1 once a minute if there are rpcs using the connection.
4. The server ignores messages with length = -1.
5. A few Iterators have been typed, and InterruptedExceptions are no longer ignored.
6. Added a new closing state for ipc connections.
7. Active rpcs are explicitly killed when a connection is broken.
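The framing in items 3 and 4 can be sketched as follows. This is a minimal illustration under assumptions, not the actual Hadoop code: the class and method names are hypothetical, and only the length = -1 sentinel convention is taken from the list above.

```java
import java.io.*;

public class PingFraming {
    public static final int PING_SENTINEL = -1;  // assumed sentinel name

    // Client side: a real call is [int length][payload].
    public static void writeCall(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Client side: a ping is just the sentinel length with no body.
    public static void writePing(DataOutputStream out) throws IOException {
        out.writeInt(PING_SENTINEL);  // server discards it
        out.flush();                  // a broken socket fails here
    }

    // Server side: returns the payload of the next real call,
    // silently skipping any number of pings.
    public static byte[] readCall(DataInputStream in) throws IOException {
        int len = in.readInt();
        while (len == PING_SENTINEL) {  // ignore pings
            len = in.readInt();
        }
        byte[] payload = new byte[len];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writePing(out);
        writeCall(out, "hello".getBytes());
        writePing(out);
        writeCall(out, "world".getBytes());
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        // prints "hello world"
        System.out.println(new String(readCall(in)) + " " + new String(readCall(in)));
    }
}
```

The key property is that the client detects a dead peer by the error on the ping write, while the server never has to answer.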

Raghu Angadi
added a comment - 19/Nov/07 19:06 Owen, could you describe the policy on the Server with the patch? (I mean what it fixes and how...)
For example:
MAX_QUEUE_LENGTH is removed, but not what stops the server from reading too many requests (there is no need to read from clients if there are enough RPCs already pending).

Owen O'Malley
added a comment - 19/Nov/07 19:38 It would be better if the server stopped reading the incoming queue, but I believe it is not critical for a first pass. The reason is that without the client side timeouts, the size of the queue is bounded by the number of client threads calling the server. So even if you have 2k servers running 4 maps each with 5 threads each, that is still only 40k pending calls, which won't be enough to blow out memory.
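The bound above is simple multiplication; a trivial sketch (with invented names) of the estimate:

```java
public class QueueBound {
    // With no client-side timeout, every pending call corresponds to
    // exactly one blocked client thread, so the queue is bounded by the
    // total thread count across the cluster.
    public static int maxPendingCalls(int nodes, int mapsPerNode, int threadsPerMap) {
        return nodes * mapsPerNode * threadsPerMap;
    }

    public static void main(String[] args) {
        // 2k servers x 4 maps x 5 threads = 40k pending calls, as in the comment.
        System.out.println(maxPendingCalls(2000, 4, 5));
    }
}
```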

Doug Cutting
added a comment - 19/Nov/07 21:56 The approach looks good. The iterator typing adds noise to the patch.
I'm confused by the 'out.close()' call in sendPing(). Is that intended?
Perhaps the ping interval should be configurable (undocumented), and a test case could be added that sets a short ping time, shuts down a server, and checks that the client fails promptly?

Raghu Angadi
added a comment - 26/Nov/07 22:34 > 7. Explicitly kill the active rpcs when a connection is broken.
Owen, I could not see where this is done in the patch. Could you point to the file that has this change?
Also, the client used to wait inside waitForWork() if there were no pending RPCs, but now it waits on read() from the socket. Is that intended?

Owen O'Malley
added a comment - 26/Nov/07 22:58 Thanks for the catch, Doug. The close should have been a flush. And you are right that I probably should make a hidden switch to control the ping rate.
Raghu, the rpcs are killed at the top of Client.Connection.close.
I'll think about the difference between waiting in waitForWork versus the readInt. :)

Sameer Paranjpye
added a comment - 26/Nov/07 23:35 Does the server side continue to have a timeout?
Seems like it should; badly behaved clients could cause a server brownout by sending RPCs, never reading the responses, and so occupying all the handlers.

Raghu Angadi
added a comment - 27/Nov/07 00:51 > the rpcs are killed at the top of Client.Connection.close.
hmm.. the client side does not help the server side. One big issue at the server when it is loaded is that we will have a lot of RPCs waiting, and the server patiently executes all of them only to realize that the client side has closed the connection for many of them; this in turn slows down the other clients, and they will close their connections too. I think we need the Server to avoid processing RPCs from connections that are already closed... each handler could check if the connection has an error.

Raghu Angadi
added a comment - 27/Nov/07 01:39 > One big issue at the server when it is loaded is that we will have a lot of RPCs waiting and server...
with this patch the client is not going to close the connection, so the server problem may not be as bad. That depends on how we handle the individual RPC timeouts.
What about the timeout for RPCs (set by users) as opposed to the timeout for the connection? There is no use in the Server executing RPCs that have already timed out on the client side.

Hairong Kuang
added a comment - 08/Feb/08 19:03 Owen's proposal looks good. Here are a few things I plan to do at the client side:
1. Remove client.call.timeout from Configuration.
2. The Connection thread waits for a response with a timeout of 1 minute.
3. If it does not receive a response, it pings the server to detect server-side failure. If a failure is detected, mark all pending calls as done.
4. Remove the ConnectionCuller thread. Instead, the Connection thread detects idle connections itself in waitForWork.
5. The application thread does not time out while checking for results. Instead it waits until a call is done.
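Steps 2 and 3 above can be sketched roughly as below. This is an assumed shape, not the patch itself: the class and method names are invented, and the one-minute wait is implemented as a socket read timeout that triggers a ping instead of a failure.

```java
import java.io.*;
import java.net.*;

public class PingingReader {
    static final int PING_INTERVAL_MS = 60_000;  // assumed 1-minute default

    // Connection thread: block reading the next response id, but wake up
    // every PING_INTERVAL_MS. A timeout means "quiet, not necessarily
    // dead": send a ping and keep reading. Only a genuine I/O error
    // (e.g. the ping write failing) propagates and kills the connection.
    public static int readWithPings(Socket socket, DataInputStream in,
                                    DataOutputStream out) throws IOException {
        socket.setSoTimeout(PING_INTERVAL_MS);
        while (true) {
            try {
                return in.readInt();       // id of the next response
            } catch (SocketTimeoutException e) {
                out.writeInt(-1);          // quiet for a minute: ping
                out.flush();               // a dead server fails here
            }
        }
    }
}
```

On failure the caller would then mark every pending call as done with an error, as step 3 describes.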

Hairong Kuang
added a comment - 15/Feb/08 19:35 In addition to the changes I proposed above, it has the following changes:
1. Remove the field inUse in Client.Connection. Instead use calls.size().
2. Start the connection thread after the I/O streams are set up.
3. Remove call.setResult; instead use setValue and setException.
This patch is posted for review. It passed the junit tests. I will perform more extensive large-scale tests.

Doug Cutting
added a comment - 12/Mar/08 19:24 minor nits:
ipc.ping.interval should be set with a static accessor method that uses a constant to name the config property, not accessed directly with a literal string. Eventually we should replace all literal string configuration accesses with accessor methods, but in the meantime we should use accessor methods for new parameters and when changing existing parameters.
go ahead and remove maxQueueSize if it is unused. if its needed later it can be re-added.
should we log pings at debug level?
we should use a constant instead of a literal -1 to identify a ping
out.write(-1) – shouldn't that be writeInt()?
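Doug's last nit is about Java's stream API rather than the patch itself: OutputStream.write(int) emits only the low-order byte of its argument, while DataOutputStream.writeInt emits all four bytes, which is what the server's readInt() expects. A ping sent with write(-1) would be a single 0xFF byte that the reader could misparse. A small self-contained check:

```java
import java.io.*;

public class WriteVsWriteInt {
    // write(int) takes an int but writes only its low byte: one byte, 0xFF.
    public static int bytesFromWrite() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).write(-1);
        return buf.size();
    }

    // writeInt(int) writes the full big-endian int: four bytes, 0xFF FF FF FF.
    public static int bytesFromWriteInt() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeInt(-1);
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesFromWrite() + " vs " + bytesFromWriteInt());
    }
}
```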

Raghu Angadi
added a comment - 17/Mar/08 22:44 I am looking at the patch too, though I'm still not done. A couple of queries:
Why does the patch remove purging the connections on the server if the server is not able to write to the socket?
minor: in one place there is a 'do { wait(); } while (!cond);'; it's unconventional in the sense that normally the right thing to do is to check for the condition before waiting.
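The convention Raghu is pointing at is the standard guarded wait: test the condition before the first wait() and re-test after every wakeup. A do/while form waits first, so a thread that arrives after the condition has already been set (and the notify already delivered) blocks forever. A minimal sketch:

```java
public class GuardedWait {
    private boolean cond;

    // Conventional form: check first, wait only while the condition is
    // false, and re-check on every wakeup (guards against spurious wakeups).
    public synchronized void await() throws InterruptedException {
        while (!cond) {
            wait();
        }
    }

    public synchronized void signal() {
        cond = true;
        notifyAll();
    }
}
```

With `do { wait(); } while (!cond);` the call to await() after signal() would block on a notify that will never come.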

Raghu Angadi
added a comment - 19/Mar/08 00:59
receiveResponse() modifies calls without synchronizing on 'this', but other accesses to it do.
As you mentioned, there is more synchronization involved. It is harder to check the correctness with 'isClosed' etc. For example, it took some time to see what happens when sendParam() returns silently while isClosed is true.

Hairong Kuang
added a comment - 19/Mar/08 17:20 Raghu, thanks for the review comments.
> Why does the patch remove purging the connections on server?
The purpose of this jira is not to throw away any calls; otherwise the applications will wait for the result forever. I do not think we should be so concerned about the number of pending responses, because HADOOP-2910 throttles the client write rate, and in general a client's read rate should not be slower than its write rate. If this is really a concern, what we can do is limit the total number of calls in the server, including unserved ones, ones being served, and ones pending response. HADOOP-2910 only limits the unserved calls.

Hairong Kuang
added a comment - 19/Mar/08 17:31 > receiveResponse() modifies calls without synchronizing on 'this', but other accesses to it do.
calls is of type Hashtable, which is a synchronized data structure. There is no need to hold the connection lock.
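For illustration (with invented names, not the actual Client code): each java.util.Hashtable method is internally synchronized, so a lone remove() in receiveResponse() is atomic without the connection lock; only compound check-then-act sequences would still need external locking.

```java
import java.util.Hashtable;

public class CallsTable {
    // Models the Client.Connection 'calls' map: call id -> pending call.
    private final Hashtable<Integer, String> calls = new Hashtable<>();

    public void addCall(int id, String call) {
        calls.put(id, call);  // put() is synchronized on the Hashtable
    }

    // Models receiveResponse(): a single remove() is atomic by itself,
    // so no synchronized(this) is needed around it.
    public String receiveResponse(int id) {
        return calls.remove(id);
    }

    public int pending() {
        return calls.size();
    }
}
```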

Raghu Angadi
added a comment - 19/Mar/08 17:43 > The purpose of this jira is not to throw away any calls.
True. But what should the server do if some clients just don't read from their sockets? I think purging exists only to handle exceptional cases like (unintentionally) rogue clients. One actual case that happened is that one user accidentally started thousands of clients from one machine, and these clients could not read.
> Otherwise the applications will wait for the result for ever.
In that case we could close the connection while purging; we already do that in some cases.
In any case, purging is usually not required and almost no calls are purged while writing in practice... we won't notice its absence until something bad happens.
+0 for removing it, from me. I don't think that should hold up the patch if you are ok with it.

Hairong Kuang
added a comment - 19/Mar/08 18:30 > As you mentioned there is more synchronization involved. It is harder to check the correctness with 'isClosed' etc. For example, it took some time to see what happens when sendParam() returns silently while isClosed is true.
Since this patch removes the SocketTimeoutException, it exposes quite a lot of incorrect synchronization in the code. Previously applications received a SocketTimeoutException when a call was lost, but now applications would get stuck forever. It took me quite a lot of energy to debug and sort out the synchronization part. Thank you for taking the time to check its correctness.
> what should the server do if some clients just don't read from the sockets? I think purging exists only to handle exceptional cases like (unintentionally) rogue clients. One actual case that happened is that one user accidentally started thousands of clients from one machine and these clients could not read.
I think we should assume that clients use the IPC Client to talk to the IPC server, so there is no worry about their not reading from the sockets. In the case of 1000 clients per machine, if they all can send requests, why couldn't they read?

Hairong Kuang
added a comment - 23/Apr/08 00:55 This patch applies to the trunk. One difference from the previous patch is that it keeps the purge code, which closes a connection if the connection has had a pending response in the queue for more than 15 minutes.
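A rough sketch of what such a purge policy can look like; the names and structure here are assumptions for illustration, not the code from the patch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ResponsePurger {
    static final long PURGE_LIMIT_MS = 15 * 60 * 1000;  // 15 minutes

    // A response queued to a connection, stamped when it was enqueued.
    static class PendingResponse {
        final long enqueuedAt;
        PendingResponse(long enqueuedAt) { this.enqueuedAt = enqueuedAt; }
    }

    // The responder periodically scans each connection's queue; a
    // connection is closed when its oldest pending response has been
    // waiting longer than the limit (the client is presumed not reading).
    public static boolean shouldPurge(Deque<PendingResponse> queue, long now) {
        PendingResponse oldest = queue.peekFirst();
        return oldest != null && now - oldest.enqueuedAt > PURGE_LIMIT_MS;
    }
}
```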

Raghu Angadi
added a comment - 23/Apr/08 23:05 - edited +1. I looked only at the recent change.
We could set call.timestamp only inside doRespond() when it is queued to the responder. This would avoid two calls to getTime() for each RPC in the normal case... not that it will really be noticeable.

Hairong Kuang
added a comment - 25/Apr/08 21:40 This patch additionally has the following changes:
1. In the Server, a call's timestamp is updated when it is received and when its response is sent to the Responder.
2. Client.Connection.in is accessed only by the Connection thread, so its lock no longer needs to be acquired. This eliminates a possible deadlock in the code.

Hairong Kuang
added a comment - 05/May/08 22:27 I just committed this.
The failure of TestTrash in the second-to-last hudson build does not seem related to this issue. Multiple reruns of the unit test could not reproduce the failure.