A question on TCP SO_KEEPALIVE / WSAECONNRESET

Hi,
I need to detect whether a TCP connection is broken. At first I tried setting SO_KEEPALIVE. However, there is no parameter to set the timeout. I know I can set the timeout in the registry (the Windows default is 2 hours), but that affects every TCP/IP connection on the machine.

I would like to try a second alternative: now and then call send() to transmit a dummy packet to the other end. I expected send() to return WSAECONNRESET, but it does not; after a few tens of seconds, select() returns an error instead. I ran this test on an intranet, so I am not sure whether the failure is detected by Microsoft Network or by TCP.

When a client breaks a connection, select() returns with that socket descriptor set as "ready to read". When you actually do recv() from it, recv() will return 0. That's how you know the client has closed the connection.

That is right, it cannot detect a broken network link. For that purpose, the usual approach is to set a timeout on the socket (for example, no data received within 5 minutes) and close the socket when it times out. Alternatively, send a packet once the timeout expires and close the socket if you get an error.
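One way to picture the idle-timeout approach is a receive call that gives up after a quiet period. The sketch below assumes POSIX sockets; `recv_with_idle_timeout` and `RECV_IDLE_TIMEOUT` are names made up for this illustration, and the timeout is in milliseconds only to keep the example easy to try.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

#define RECV_IDLE_TIMEOUT (-2)   /* hypothetical return code for "nothing arrived" */

/* Waits up to idle_ms for data, then reads it.
 * Returns the number of bytes read (> 0), 0 if the peer closed,
 * RECV_IDLE_TIMEOUT if nothing arrived in time, -1 on error. */
ssize_t recv_with_idle_timeout(int fd, char *buf, size_t len, int idle_ms)
{
    fd_set rfds;
    struct timeval tv;
    int ready;

    assert(fd >= 0 && fd < FD_SETSIZE && idle_ms >= 0);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = idle_ms / 1000;
    tv.tv_usec = (idle_ms % 1000) * 1000;

    ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return RECV_IDLE_TIMEOUT;   /* caller decides: close, or send a probe */

    return recv(fd, buf, len, 0);
}
```

On RECV_IDLE_TIMEOUT the caller can either close the socket or send a probe packet first, as suggested above.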

I cannot use a timeout, because data arrives at random intervals (possibly only after a few minutes).

About send(): this is the question I have. I can still call send() without any error being returned (I think the data is buffered), and only after a few minutes does my select() thread report an error. I would like to know whether this is detected by Microsoft Network or by TCP. I worry about this because the application has to be able to run on the Internet, while it is currently developed and tested on an intranet.

The only way I see to tackle this problem is to debug with prototype code. Write a small program that simulates the situation in your actual code, and check whether you get the same result.
And what is this Microsoft Network? I do not think the error is detected by the MS Network. And what timeout value have you set in the select() call? Set it to NULL.

leCommented: 2001-08-20

True. SO_KEEPALIVE does not expose a timeout parameter to the caller. The timeout and the number of retries can be configured in the Windows registry; look in MSDN. The real problem is that TCP/IP does not guarantee delivery.
First of all, recv() checks whether data is available in the driver buffer and returns to the caller. So the network may be down (cable disconnected) and recv() still reports no error, just no data available. send() returns OK as soon as the data has been placed in the local buffer. In short, if the number of bytes written differs from the number of bytes you asked to send in synchronous mode, you know that the connection is lost. If everything was written OK, you know ONLY that the data is in the buffer.
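To make the point about send() concrete, here is a rough sketch of a blocking "send everything" loop, assuming POSIX sockets. `send_all` is a hypothetical helper name, and MSG_NOSIGNAL is a Linux/POSIX flag with no direct Winsock equivalent. A return value of 0 means only that the bytes were accepted into the local send buffer, not that the peer received them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: pushes len bytes into the socket.
 * Returns 0 once everything has been handed to the local TCP buffer,
 * -1 if send() reported an error (for example, a connection already
 * known to be broken). */
int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(buf != NULL || len == 0);

    while (sent < len) {
        /* MSG_NOSIGNAL: get an EPIPE error instead of a SIGPIPE signal. */
        ssize_t n = send(fd, buf + sent, len - sent, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;                     /* buffered locally; delivery is NOT confirmed */
}
```

This is why the first send() after a cable pull usually succeeds: the failure only surfaces later, once TCP retransmissions have run out.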

The default keep-alive timeouts are set extremely high (in essence disabling the keep-alive functionality). Again, you can change them by editing the registry.
The situation is even worse: keep-alive behavior is VERY inconsistent across platforms. The best implementation is in Windows NT. Windows 98 fails. Linux also fails unpredictably.
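Some stacks do expose keep-alive timing per connection, which avoids the machine-wide registry change. The following is a minimal sketch assuming Linux, where the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options exist; `enable_keepalive` is a hypothetical helper name. On Winsock the comparable per-socket mechanism is the SIO_KEEPALIVE_VALS control code used with WSAIoctl, which is not shown here.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux-specific options):
 * idle_s     seconds of silence before the first probe,
 * interval_s seconds between probes,
 * probes     unanswered probes before the connection is reported dead.
 * Returns 0 on success, -1 if any option could not be set. */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    assert(idle_s > 0 && interval_s > 0 && probes > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

With, say, `enable_keepalive(fd, 30, 5, 3)`, a dead link would be reported after roughly 45 seconds of silence, and only for this one socket.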

The right solution is to add keep-alive signals to your application-level protocol.
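As an illustration only, the bookkeeping for such an application-level keep-alive could look like the sketch below. Everything here is an assumption for the example: the `hb_state` structure, the `hb_poll` function and the idea of a ping message that the peer answers are not part of any protocol mentioned in this thread. The caller updates `last_rx` whenever bytes arrive and calls `hb_poll` periodically.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

enum hb_action { HB_NONE, HB_SEND_PING, HB_DEAD };

struct hb_state {
    time_t last_rx;    /* when any bytes were last received from the peer */
    time_t ping_sent;  /* when the outstanding ping was sent, 0 if none   */
};

/* Decides what to do at time `now`:
 * HB_SEND_PING after idle_s seconds of silence,
 * HB_DEAD if a ping has gone unanswered for reply_s seconds,
 * HB_NONE otherwise. */
enum hb_action hb_poll(struct hb_state *s, time_t now, int idle_s, int reply_s)
{
    assert(s != NULL && idle_s > 0 && reply_s > 0);

    if (s->ping_sent != 0) {
        if (s->last_rx >= s->ping_sent)
            s->ping_sent = 0;               /* peer sent something since the ping */
        else if (now - s->ping_sent >= reply_s)
            return HB_DEAD;                 /* ping unanswered for too long */
        else
            return HB_NONE;                 /* still waiting for the reply */
    }
    if (now - s->last_rx >= idle_s) {
        s->ping_sent = now;                 /* caller must now send the ping */
        return HB_SEND_PING;
    }
    return HB_NONE;
}
```

Because any received data counts as proof of life, a busy connection never sends pings at all.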

There is one more way to guarantee message delivery (all documentation strongly discourages this usage),
namely disabling buffering at the end of every send operation:
int BuffSize = 0;
setsockopt(m_socket, SOL_SOCKET, SO_SNDBUF, (char*)(&BuffSize), sizeof(BuffSize));
You start writing with the normal buffer size and write all bytes but one in a single call to send(). Then you change the option and send the last byte with no buffering. With this option, send() returns after the round trip is done and the data is in the receiving computer's buffer (NOTE: not read by the receiving application, just in the buffer).
The connection speed in this case will be 1/10 of your bandwidth.

I agree with what you said, but I would like to "temporarily" unlock this question first to see whether I can receive more comments.

I know that an application-level protocol can solve this. Unfortunately, I am using a standard protocol (ITU-H225), and not all vendors respond to the network status request.

What I am currently doing (and am not yet satisfied with) is:
create a separate thread that now and then tries to open a TCP connection to the destination. If the network is broken, I will receive an error from connect(). I don't like it because it consumes another connection.
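For reference, the probe described above might look something like this. It is a sketch under assumptions, not the poster's actual code: `probe_connect` is a hypothetical name, the address is IPv4 only, and it uses POSIX calls (a Winsock version would use ioctlsocket and closesocket instead of fcntl and close). A non-blocking connect() with a select() timeout keeps the probe thread from hanging for minutes when the link is down.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if a TCP connection to ip:port was opened within timeout_ms,
 * 0 if it failed or timed out, -1 on a local error. */
int probe_connect(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in addr;
    int fd, flags, err = 0, result = 0;
    socklen_t len = sizeof err;

    assert(ip != NULL && timeout_ms >= 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (fd >= FD_SETSIZE || flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        result = 1;                          /* connected immediately */
    } else if (errno == EINPROGRESS) {
        fd_set wfds;
        struct timeval tv;

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        /* Writable means the attempt finished; SO_ERROR says how it ended. */
        if (select(fd + 1, NULL, &wfds, NULL, &tv) > 0 &&
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
            result = 1;
    }
    close(fd);                               /* the probe connection is dropped at once */
    return result;
}
```

The probe still costs the remote end one short-lived connection per check, which is the drawback already noted above.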

The only thing I am confused about is that when I call send(), after a while it may cause select() to return with an error. I would like to know whether this is caused by the Ethernet network or by TCP/IP.

I will not try the buffer approach, for performance reasons.