Kernel Korner - Sleeping in the Kernel

The old sleep_on() function won't work reliably in an age of SMP systems and hyperthreaded processors. Here's how to make a process sleep in a safe, cross-platform way.

Thundering Herd Problem

Another classical operating system problem arises due
to the use of the wake_up_all function. Let us consider
a scenario in which a set of processes are sleeping on
a wait queue, wanting to acquire a lock.

Once the process that has acquired the lock
is done with it, it releases the lock and wakes
up all the processes sleeping on the wait queue. All
the processes try to grab the lock. Eventually,
only one of these acquires the lock and the rest
go back to sleep.

This behavior is not good for performance. If we
already know that only one process is
going to resume while the rest of the processes
go back to sleep again, why wake them up
in the first place? It consumes valuable CPU cycles
and incurs context-switching overheads. This
problem is called the thundering herd problem.
For this reason, use the wake_up_all function
carefully, only when you know that it is required.
Otherwise, go ahead and use the wake_up function, which
wakes up only one exclusive waiter at a time (a process
queued with the exclusive flag, for example through
prepare_to_wait_exclusive()).
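
To put a rough number on the cost, the scenario above can be modeled in
a few lines of userspace C. This is only a counting model, not kernel
code: the function names are hypothetical, and it assumes every waiter
takes the lock exactly once and that each release wakes either all
remaining sleepers or exactly one.

```c
/*
 * Userspace counting model (not kernel code): how many task wakeups
 * happen while `waiters` sleeping tasks each take an exclusive lock once.
 */

/* Each release wakes every remaining sleeper; one wins, the rest sleep again. */
static unsigned long wakeups_with_wake_all(unsigned int waiters)
{
    unsigned long total = 0;
    unsigned int sleeping;

    for (sleeping = waiters; sleeping > 0; sleeping--)
        total += sleeping;      /* all current sleepers are woken */

    return total;               /* equals waiters * (waiters + 1) / 2 */
}

/* Each release wakes exactly one sleeper, which then takes the lock. */
static unsigned long wakeups_with_wake_one(unsigned int waiters)
{
    return waiters;
}
```

Under these assumptions, 100 waiters cost 5050 wakeups with the wake-all
policy against 100 with the wake-one policy. Every wakeup beyond the
first in each round is a context switch that accomplishes nothing.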

So, when would the wake_up_all function be used? It
is used in scenarios when processes want to take a
shared lock on something. For example, processes
waiting to read data on a page could all be woken
up at the same moment.

Time-Bound Sleep

You frequently may want to delay the execution of
your process for a given amount of time. It may be
required to allow the hardware to catch up or to
carry out an activity after specified time intervals, such
as polling a device, flushing data to disk or retransmitting
a network request. This can be achieved by the function
schedule_timeout(timeout), a variant of schedule(). This
function puts the process to sleep until timeout jiffies
have elapsed. jiffies is a kernel variable that is
incremented for every timer interrupt.

As with schedule(), the state of the process has to
be changed to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE
before calling this function. If the process is woken
up before timeout jiffies have elapsed, the number of
jiffies left is returned; otherwise, zero is returned.

Let us take a look at a real-life example
(linux-2.6.11/arch/i386/kernel/apm.c: 1415):

This code belongs to the APM thread. The thread
polls the APM BIOS for events at intervals of
APM_CHECK_TIMEOUT jiffies. As can be seen from the
code, the thread calls schedule_timeout() to sleep
for the given duration of time, after which it calls
apm_event_handler() to process any events.
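
The polling pattern described here can be sketched as follows. This is
a userspace model of the loop shape, not the apm.c source:
set_current_state() and schedule_timeout() are stand-in stubs that only
borrow the kernel names, and POLL_INTERVAL, handle_events() and
stop_requested are hypothetical names invented for the illustration.

```c
/* Userspace model of a kernel polling thread; NOT the apm.c source. */

/* Stand-ins for the kernel task states and the current task's state. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };
static enum task_state current_state = TASK_RUNNING;

static int events_handled;   /* hypothetical: counts handler invocations */
static int stop_requested;   /* hypothetical: set to end the loop */

#define POLL_INTERVAL 100L   /* hypothetical polling interval, in jiffies */

/* Stub: record the requested state, as set_current_state() would. */
static void set_current_state(enum task_state state)
{
    current_state = state;
}

/*
 * Stub: pretend the task slept for the whole timeout. The real
 * schedule_timeout() returns the jiffies remaining (zero when the
 * timeout fully elapsed), and the task is TASK_RUNNING on return.
 */
static long schedule_timeout(long timeout)
{
    (void)timeout;
    current_state = TASK_RUNNING;
    return 0;
}

/* Hypothetical event handler; asks the loop to stop after three polls. */
static void handle_events(void)
{
    if (++events_handled == 3)
        stop_requested = 1;
}

/* The pattern itself: mark the task as sleeping, sleep, then do the work. */
static void polling_thread(void)
{
    for (;;) {
        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(POLL_INTERVAL);
        if (stop_requested)
            break;
        handle_events();
    }
}
```

The point to note is that the state is set again on every pass through
the loop, because the task is back in TASK_RUNNING each time
schedule_timeout() returns.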

You also may use a more convenient API, with which
you can specify time in milliseconds and seconds:

msleep(time_in_msec);

msleep_interruptible(time_in_msec);

ssleep(time_in_sec);

msleep() and msleep_interruptible() accept the time to
sleep in milliseconds, while ssleep() accepts the time to
sleep in seconds. These higher-level routines
internally convert the time into jiffies,
appropriately change the state of the process and
call schedule_timeout(), thus making the process
sleep.
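
The conversion step can be illustrated with a small userspace model.
It is a sketch under stated assumptions: MODEL_HZ is an assumed tick
rate of 250 per second and both function names are hypothetical. The
kernel's own msecs_to_jiffies() differs in detail, for example in how
it guards against overflow.

```c
/* Userspace model; MODEL_HZ is an assumed tick rate, not a kernel symbol. */
#define MODEL_HZ 250UL

/* Convert milliseconds to ticks, rounding up so a sleep is never cut short. */
static unsigned long model_msecs_to_jiffies(unsigned long msecs)
{
    return (msecs * MODEL_HZ + 999UL) / 1000UL;
}

/* A seconds-based helper can simply reuse the millisecond conversion. */
static unsigned long model_secs_to_jiffies(unsigned long secs)
{
    return model_msecs_to_jiffies(secs * 1000UL);
}
```

With 250 ticks per second, one tick lasts 4ms, so a request for 10ms
becomes 3 ticks (12ms) rather than 2. Rounding up errs on the side of
sleeping slightly longer than requested, never shorter.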

I hope that you now have a basic understanding of
how processes safely can sleep and wake up in the
kernel. To understand the internal working of wait
queues and advanced uses, look at the implementations
of init_waitqueue_head, as well as variants of wait_event
and wake_up.

Acknowledgement

Greg Kroah-Hartman reviewed a draft
of this article and contributed valuable suggestions.

Kedar Sovani (www.geocities.com/kedarsovani) works for Kernel
Corporation as a kernel developer. His areas of interest include
security, filesystems and distributed systems.

I see a 'lost wakeup' even in the latest wait_event_interruptible. I know the race condition was solved after sleep_on, so I just want to know how and why this works without any problem. I saw a similar question but no answers (code snippet from 2.6.27 below).

thanks,
Shankar.

----

a) A wakeup occurs immediately before the call to prepare_to_wait().

b) The call to prepare_to_wait() sets the process state to TASK_INTERRUPTIBLE.

c) Just after prepare_to_wait() returns, but before the condition is evaluated again, a hardware interrupt comes in.

d) The process is still marked as TASK_INTERRUPTIBLE and will therefore not be rescheduled and will never execute the call to finish_wait() - so it will sleep

I am using wait_event_interruptible in my
read function, and the ISR wakes it up using wake_up_interruptible.
Sometimes wait_event_interruptible returns the error ERESTARTSYS.

The application usage scenario is as follows: I have a process that has lots of threads, and one thread calls read(). Sometimes wait_event_interruptible() returns an error and the whole user application is killed, but the system continues to run without any issues.

There are two ways this problem could occur:
1) There are many threads in my process, and some thread has caused a problem that results in the process being killed. So the process sends a signal to the thread that is doing the read, and the read thread, which is waiting in wait_event_interruptible(), comes out with an error. One more observation is that the "release" function of my driver is called when this happens.
2) Something wrong is happening in the driver, which makes wait_event_interruptible() return the error, and this in turn kills the user-space process. I am not very sure about this, but can this kind of thing happen?

Can we use a wait queue in bottom-half context?
For example, consider the following line of code:
Kernel Version: Linux-2.6.18.5
Path: \net\xfrm\xfrm_policy.c
Ln: 919
It calls the schedule() function.
This function is called when a packet is received and looks up the
policy in softirq context.
Please correct me if I have misunderstood.
Thanks in advance

The function xfrm_lookup() is called from both user context
and kernel (softirq) context.
The process is put to sleep when the flag argument of the above
function is -EAGAIN.
User Context:
If this function is called from user context, the flag may be -EAGAIN.
Kernel Context:
If this function is called from kernel context (softirq), it will be NULL.
So the softirq context process will not be put on the wait queue.

How will wait_event_interruptible behave if SIGKILL is sent to the user-space process that is sleeping in this function? My system gives a kernel panic when I do 'kill -SIGKILL pid'. Is it because I am sleeping in the kernel when I deliver SIGKILL?

There is an error in the code samples. wait_event() and wait_event_interruptible() should not be passed the address of my_event, but my_event itself. That is because they are macros, and their implementations will wind up using the address-of operator (&) to take the address of the parameter they are passed.
