I've posted this problem in the CentOS forum at www.centos.org, but I thought I would also ask the wider Linux community, since people who have seen it may not visit the CentOS forum.

This problem has been noted with UDP sockets. We're not sure if it also happens with TCP sockets.

Occasionally, when a non-blocking UDP socket is polled using the select() function with a zeroed timeval structure, select() stalls for just over 70 minutes. We need to respond quickly when a packet arrives spontaneously on this socket, but the peer very rarely sends one unprompted. It is common for no spontaneous packet to arrive on this socket for many hours.
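For readers less familiar with the pattern, a zero-timeout poll like the one described might look like the minimal C sketch below. The names fd_readable_now() and poll_and_read() are hypothetical and not taken from the poster's code. One detail worth knowing: on Linux, select() may overwrite the timeval with the time not slept, so the sketch re-zeroes it before every call instead of reusing one structure.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Zero-timeout readability check on one descriptor (e.g. a non-blocking
 * UDP socket). Returns 1 if data is waiting, 0 if not, -1 on error. */
int fd_readable_now(int fd)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Re-zero on every call: Linux select() may overwrite the timeval. */
    tv.tv_sec = 0;
    tv.tv_usec = 0;

    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}

/* Hypothetical polling step, e.g. called from a 50 Hz loop: read one
 * datagram if one is waiting. Returns the number of bytes read, 0 if
 * nothing was waiting (or an empty datagram was read), or -1 on error. */
ssize_t poll_and_read(int fd, void *buf, size_t len)
{
    int ready = fd_readable_now(fd);
    if (ready <= 0)
        return ready;
    return read(fd, buf, len);
}
```

Because select() works on any descriptor, the same helper can be exercised with a pipe when no UDP peer is available.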

We find it a suspicious coincidence that 0xFFFFFFFF usec (4,294,967,295 usec, about 4,295 seconds) equals 71 minutes, 35 seconds. We hypothesize that the usec component of the zeroed timeval structure passed to select() is occasionally decremented to 0xFFFFFFFF (or the equivalent in "jiffies") before the kernel tests whether it is zero, so we incur a 71-minute, 35-second timeout. We poll this socket at quite a high rate (e.g. 50 Hz), and the problem occurs perhaps once or twice in 12 hours, i.e. once or twice in roughly 2.2 million calls. It appears to be quite sensitive to exactly when select() is called relative to whatever clock drives the kernel's decrementing of socket timeouts.
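The arithmetic behind the hypothesis can be checked with a short C sketch. decrement_u32() and usec_to_min_sec() are hypothetical helpers; the first only illustrates how an unsigned 32-bit countdown wraps to 0xFFFFFFFF when decremented at zero, and is not taken from the kernel's select() implementation. Passing decrement_u32(0) to usec_to_min_sec() gives 71 minutes and about 34.97 seconds, which rounds to the 71 minutes, 35 seconds quoted in the post.

```c
#include <stdint.h>

/* Illustration of the hypothesis only: an unsigned 32-bit countdown that
 * is decremented while already at zero wraps to 0xFFFFFFFF; it cannot
 * go negative. This is NOT the kernel's actual select() code. */
uint32_t decrement_u32(uint32_t remaining)
{
    return remaining - 1u;
}

/* Convert a microsecond count to whole minutes plus leftover seconds
 * (including the fractional part). */
void usec_to_min_sec(uint64_t usec, uint64_t *minutes, double *seconds)
{
    uint64_t whole_sec = usec / 1000000u;

    *minutes = whole_sec / 60u;
    *seconds = (double)(whole_sec % 60u) + (double)(usec % 1000000u) / 1e6;
}
```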

We have searched the Red Hat bug list, the CentOS forum, and this site, and have not found any similar reports involving select() with a zeroed timeout. Has anyone else observed this behavior? Is there a remedy other than avoiding zero timeouts or putting a watchdog on threads that might make zero-timeout select() calls? Our product also uses a library that may make zero-timeout select() calls, so we'd prefer an OS-level solution. We didn't see anything in the CentOS 5.3 release notes indicating that such a problem has been recognized and fixed.
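One way to confirm the stall signature in code you control, without a separate watchdog thread, is to time each zero-timeout call with CLOCK_MONOTONIC, as in the sketch below. This is a diagnostic, not the OS-level fix the post asks for, and it cannot see zero-timeout calls made inside the third-party library. timed_zero_select(), elapsed_ns() and the stall_ns threshold are hypothetical names. On older glibc releases such as the one in CentOS 5, clock_gettime() requires linking with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Elapsed nanoseconds between two CLOCK_MONOTONIC readings. */
int64_t elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Zero-timeout select() on one fd that also reports how long the call
 * actually took. A zero-timeout call should return almost at once, so
 * anything above stall_ns (a caller-chosen, hypothetical threshold) is
 * logged to stderr. Returns select()'s own result. */
int timed_zero_select(int fd, int64_t stall_ns, int64_t *took_ns)
{
    fd_set readfds;
    struct timeval tv = { 0, 0 };   /* re-zeroed on every call */
    struct timespec t0, t1;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *took_ns = elapsed_ns(&t0, &t1);
    if (*took_ns > stall_ns)
        fprintf(stderr, "zero-timeout select() took %lld ns (fd %d, result %d)\n",
                (long long)*took_ns, fd, n);
    return n;
}
```

A stall of the kind described would show up as a logged duration of roughly 4.29e12 ns, which would support the wraparound hypothesis in a bug report.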

I am not an OS-level programmer, so I don't have a good feel for whether this problem comes from an unusual interaction between CentOS 5.2 and our somewhat peculiar Aberdeen server hardware. If it isn't specific to our hardware, I'd have expected plenty of posts about it online already.

Despite the vast number of Linux installations, I suppose a problem like this could go unnoticed for a long time. It manifests very rarely relative to the number of opportunities, and you would only notice it if the socket you poll with a zero-timeout select() very rarely receives traffic. Otherwise, select() would return as soon as that traffic arrived.