Comments

If a socket is closed it remains in TIME_WAIT state for some time. On operating
systems using BSD sockets the endpoint of the socket may not be reused while in
this state unless SO_REUSEADDR was set on the socket. On Windows, on the other
hand, the default behaviour is to allow reuse (i.e. identical to SO_REUSEADDR on
other operating systems), and setting SO_REUSEADDR on a socket allows it to be
bound to an endpoint even if the endpoint is already used by another socket,
independently of the other socket's state. This can even result in undefined
behaviour.

Many sockets used by QEMU should not block the use of their endpoint after being
closed while they are still in TIME_WAIT state. Currently QEMU sets SO_REUSEADDR
for such sockets, which can lead to problems on Windows. This patch introduces
the function socket_set_fast_reuse, which should be used instead of setting
SO_REUSEADDR and does the right thing on all operating systems.

Signed-off-by: Sebastian Ottlik <ottlik@fzi.de>
---
include/qemu/sockets.h | 1 +
util/oslib-posix.c | 14 ++++++++++++++
util/oslib-win32.c | 10 ++++++++++
3 files changed, 25 insertions(+)
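
For illustration, a minimal sketch of how the helper described above might look
on each platform. The POSIX body follows the hunk quoted later in this thread;
the final "return ret" and the whole Windows branch (a no-op, on the assumption
that fast reuse is already the default there) are guesses based on the commit
message, not the actual patch contents.

```c
#include <stdio.h>

#ifdef _WIN32
/* Assumption: fast reuse is the default on Windows, so nothing is set. */
int socket_set_fast_reuse(int fd)
{
    (void)fd;
    return 0;
}
#else
#include <sys/types.h>
#include <sys/socket.h>

/* Allow the endpoint to be rebound while the old socket is in TIME_WAIT. */
int socket_set_fast_reuse(int fd)
{
    int val = 1, ret;

    ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                     (const char *)&val, sizeof(val));

    if (ret < 0) {
        perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
    }

    /* Assumption: the setsockopt() result is passed back to the caller. */
    return ret;
}
#endif
```

A caller would invoke this on a freshly created socket before bind(), in place
of an open-coded setsockopt(SO_REUSEADDR) call.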

On 09/10/2013 07:26 AM, Sebastian Ottlik wrote:
> If a socket is closed it remains in TIME_WAIT state for some time. On operating
> systems using BSD sockets the endpoint of the socket may not be reused while in
> this state unless SO_REUSEADDR was set on the socket. On windows on the other
> hand the default behaviour is to allow reuse (i.e. identical to SO_REUSEADDR on
> other operating systems) and setting SO_REUSEADDR on a socket allows it to be
> bound to a endpoint even if the endpoint is already used by another socket
> independently of the other sockets state. This can even result in undefined
> behaviour.
>
> Many sockets used by QEMU should not block the use of their endpoint after being
> closed while they are still in TIME_WAIT state. Currently QEMU sets SO_REUSEADDR
> for such sockets, which can lead to problems on Windows. This patch introduces
> the function socket_set_fast_reuse that should be used instead of setting
> SO_REUSEADDR and does the right thing on all operating systems.
>
> Signed-off-by: Sebastian Ottlik <ottlik@fzi.de>
> ---
> +int socket_set_fast_reuse(int fd)
> +{
> +    int val = 1, ret;
> +
> +    ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
> +                     (const char *)&val, sizeof(val));
> +
> +    if (ret < 0) {
> +        perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
> +    }

This would be the first use of perror in this file; I'm not sure if that
is the right function, or if there is a better thing to be using (in
fact, returning -1 and letting the client decide whether to issue a
warning may even be better).

On 10.09.2013 17:56, Eric Blake wrote:
> On 09/10/2013 07:26 AM, Sebastian Ottlik wrote:
>> If a socket is closed it remains in TIME_WAIT state for some time. On operating
>> systems using BSD sockets the endpoint of the socket may not be reused while in
>> this state unless SO_REUSEADDR was set on the socket. On windows on the other
>> hand the default behaviour is to allow reuse (i.e. identical to SO_REUSEADDR on
>> other operating systems) and setting SO_REUSEADDR on a socket allows it to be
>> bound to a endpoint even if the endpoint is already used by another socket
>> independently of the other sockets state. This can even result in undefined
>> behaviour.
>>
>> Many sockets used by QEMU should not block the use of their endpoint after being
>> closed while they are still in TIME_WAIT state. Currently QEMU sets SO_REUSEADDR
>> for such sockets, which can lead to problems on Windows. This patch introduces
>> the function socket_set_fast_reuse that should be used instead of setting
>> SO_REUSEADDR and does the right thing on all operating systems.
>>
>> Signed-off-by: Sebastian Ottlik <ottlik@fzi.de>
>> ---
>> +int socket_set_fast_reuse(int fd)
>> +{
>> +    int val = 1, ret;
>> +
>> +    ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
>> +                     (const char *)&val, sizeof(val));
>> +
>> +    if (ret < 0) {
>> +        perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
>> +    }
> This would be the first use of perror in this file; I'm not sure if that
> is the right function, or if there is a better thing to be using (in
> fact, returning -1 and letting the client decide whether to issue a
> warning may even be better).
>

When I started writing the patch I was going to return the error and let
the client handle the issue. But the code in net/socket.c then becomes:

    ret = socket_set_fast_reuse(fd);
    if (ret < 0) {
        perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
        closesocket(fd);
        return -1;
    }

This looked unclean to me, as the code implies assumptions about the
implementation of socket_set_fast_reuse. One could also call
perror("socket_set_fast_reuse()"), but this would break the convention in
the surrounding code of passing the name of the function that failed to
perror.

As neither approach was great, I moved the error message into
socket_set_fast_reuse and accepted the side effect that the other call
sites now output an error message if something goes wrong. I agree I should
have mentioned this change in the commit message. Also, it is
unlikely the function will fail during normal use of QEMU.

Another approach would be to indeed let the client decide what to do
with the error and use other error reporting facilities. But I am not
sure what would be appropriate and how to handle errno in this case,
which could provide some useful insights.
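
One way the "let the client decide" variant could preserve errno is to return
it as a negative value, so the caller can print its own message. This is only a
sketch: socket_set_fast_reuse_errno and report_fast_reuse are hypothetical
names, the code is POSIX-only, and on Windows the error would have to come from
WSAGetLastError() rather than errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical variant: returns 0 on success or a negative errno value. */
static int socket_set_fast_reuse_errno(int fd)
{
    int val = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                   (const char *)&val, sizeof(val)) < 0) {
        return -errno;
    }
    return 0;
}

/* Hypothetical caller: reports the failure without naming setsockopt(). */
static int report_fast_reuse(int fd)
{
    int ret = socket_set_fast_reuse_errno(fd);

    if (ret < 0) {
        fprintf(stderr, "socket_set_fast_reuse: %s\n", strerror(-ret));
        return -1;
    }
    return 0;
}
```

With this shape the caller no longer has to know which system call failed
inside the helper, at the cost of departing from the perror() convention used
in the surrounding code.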

On 09/10/2013 10:23 AM, Sebastian Ottlik wrote:
>>> +    if (ret < 0) {
>>> +        perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
>>> +    }
>> This would be the first use of perror in this file; I'm not sure if that
>> is the right function, or if there is a better thing to be using (in
>> fact, returning -1 and letting the client decide whether to issue a
>> warning may even be better).
>>
> When I started writing the patch I was going to return the error and lat
> the client handle the issue. But the code in net/socket.c then becomes:
>
>     ret = socket_set_fast_reuse(fd);
>     if (ret < 0) {
>         perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
>         closesocket(fd);
>         return -1;
>     }
>
> Which looked unclean to me, as the code implies assumptions about the
> implementation of socket_set_fast_reuse. One could also call
> perror("socket_set_fast_reuse()") but this would break the convention in
> the surrounding code of passing for the function that failed to perror.

Maybe a compromise? Add a 'bool silent' flag to socket_set_fast_reuse,
and only issue perror() if the flag is false. Existing callers that
don't care about failure (if we get fast reuse, great; if not, no huge
loss) pass true; existing callers that did their own error reporting
pass false to take advantage of the perror() on failure. That way you
aren't changing semantics at call sites.

But I'm just making this observation from the side; you might want to
get an opinion from an actual maintainer of this area of code on which
approach is best.
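
A minimal sketch of the flag-based compromise, assuming a POSIX system. The
name socket_set_fast_reuse_flag is hypothetical and only used here to keep the
example distinct from the function in the patch; perror() is issued only when
the flag is false, as proposed above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical signature: 'silent' suppresses the perror() on failure. */
static int socket_set_fast_reuse_flag(int fd, bool silent)
{
    int val = 1, ret;

    ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                     (const char *)&val, sizeof(val));

    if (ret < 0 && !silent) {
        perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
    }
    return ret;
}
```

A best-effort caller would use socket_set_fast_reuse_flag(fd, true) and ignore
the result, while a caller that wants the message printed would pass false and
then close the socket and bail out on a negative return.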

On 10.09.2013 18:34, Eric Blake wrote:
> On 09/10/2013 10:23 AM, Sebastian Ottlik wrote:
>>>> +    if (ret < 0) {
>>>> +        perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
>>>> +    }
>>> This would be the first use of perror in this file; I'm not sure if that
>>> is the right function, or if there is a better thing to be using (in
>>> fact, returning -1 and letting the client decide whether to issue a
>>> warning may even be better).
>>>
>> When I started writing the patch I was going to return the error and lat
>> the client handle the issue. But the code in net/socket.c then becomes:
>>
>>     ret = socket_set_fast_reuse(fd);
>>     if (ret < 0) {
>>         perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
>>         closesocket(fd);
>>         return -1;
>>     }
>>
>> Which looked unclean to me, as the code implies assumptions about the
>> implementation of socket_set_fast_reuse. One could also call
>> perror("socket_set_fast_reuse()") but this would break the convention in
>> the surrounding code of passing for the function that failed to perror.
> Maybe a compromise? Add a 'bool silent' flag to socket_set_fast_reuse,
> and only issue perror() if the flag is false. Existing callers that
> don't care about failure (if we get fast reuse, great; if not, no huge
> loss) pass false, existing callers that did their own error reporting
> pass true to take advantage of the perror() on failure, and then you
> aren't changing semantics at call sites.
>
> But I'm just making this observation from the side; you might want to
> get an opinion from an actual maintainer of this area of code on which
> approach is best.
>

This is probably the least intrusive approach, and likely the best option
without further maintainer input. I will wait and see if someone responds.