Due: October 18, 2000

In this lab, you will come to better understand Linux's scheduling
algorithm through "de-tuning" experiments. That is, you will modify
the scheduler to be worse than it normally is and measure the
undesirable effects of your changes. This should help you appreciate
the benefits conferred by the normal (unmodified) design. This lab
doesn't involve much programming, but requires non-trivial
understanding; don't be embarrassed to ask questions.

For this entire lab, we will concentrate exclusively on Linux's normal
scheduling algorithm, used for processes with the scheduling policy
SCHED_OTHER. (All normal processes use this policy.) We
will ignore the real-time scheduling policies SCHED_FIFO
and SCHED_RR.

The scheduler has two primary objectives:

When there are multiple CPU-bound processes, they should share the
processor in accordance with their priorities. For example, if there
are two normal-priority processes and one low-priority process, the
two normal-priority processes might each get 2/5 of the CPU's time and
the low-priority process 1/5. (Other ratios than 2:1 are also
possible.)

In this lab, we will focus on understanding the mechanisms that
achieve objective 2, making changes that impact the "responsiveness"
of the system, i.e., the extra latency (delay) from when a process
becomes runnable to when it runs. With respect to objective 1 (CPU
sharing), you will only be asked to show that the scheduler changes
have no significant impact on it.

To observe the scheduler's behavior, you will need two test programs
to run. One is just a do-nothing infinite loop, to provide a
CPU-bound process. The other is a loop that uses two operating system
calls. One is nanosleep, which asks the operating system
to put the current process to sleep for a specified number of seconds
and nanoseconds. You will use values large enough to sleep for
several seconds. This is done each time around the loop, with
essentially nothing else. Thus the process is mostly asleep (not
runnable), waking up (running) only very briefly every few seconds.
The other system call the program uses, gettimeofday,
does what it says. (You should be familiar with it from the prior
lab.) The program uses gettimeofday twice each time
around the loop - once before and once after the
nanosleep. By subtraction it computes the extra delay
(latency) beyond the requested sleep time. The program then prints
out this latency for you to see. The programs from the prior lab
should help you write the test programs you need for this lab, but if
you have trouble, please ask: this is not intended to be the focus of
the lab, so you shouldn't get stuck here. To read the on-line
documentation of the system calls, you can use a command like

man nanosleep

To run one of the programs at a specified "niceness" (priority) level,
you can use the system program called nice. (See the
on-line documentation.) For example, to run a program called
./runner at approximately half the normal priority, you
would do

nice -10 ./runner

Do some experiments as in the previous lab, using top, to
see how multiple CPU-bound processes share the CPU when they have
different priority levels (as well as when they have the same
priority). When you change the scheduler, verify that this CPU
sharing behavior remains essentially unchanged.

To experiment with responsiveness, you should first provide the
sleeping process with some competition for the CPU by running one or
more CPU-bound processes. You can choose how many and of what
priority - this is part of your experimental design, and should be
chosen to provoke the results you want. Now run one copy of the
repeatedly sleeping program (again, at a priority level of your
choice). Observe the latencies it encounters in waking up from its
sleeps. You should repeat this several times under "identical"
circumstances (same competing processes, same version of the
scheduler), since it may matter when in the scheduling cycle the
process starts.

Now modify the scheduler and, using the modified kernel, repeat the
experiments. See below for more on how to make a modified kernel. To
get full credit for this lab assignment, you need to do the following
- in either order:

Make a kernel in which the only change is that processes no longer
keep half of their unused counter when
counter values are recalculated. Demonstrate a
circumstance under which this makes a significant difference in
latency, and explain why.

Make a kernel that has the change described above and also some
other change, such that the sleeper's latencies are significantly
longer than under the normal kernel when contending with a number of
CPU-bound processes, even when all the CPU-bound processes have the
same niceness (priority) as the sleeper. Explain why the latency was
impacted.

You should also show that neither kernel change has much impact on how
CPU-bound processes share the CPU.

Lab facilities

Each team will have an older PC in the OHS 329 lab which will be for
the exclusive use of that team, and for which they will have the root
password. This will allow you to experiment with kernel modifications.
We've got a special CD-ROM and boot disks
which we can use to bring the machine back to a clean state; I'll show
you this. So long as you don't make too much trouble
for the rest of us, you can feel free to do whatever other experiments
you want on these machines.

I will issue each team four diskettes. One of these diskettes is
write protected, has "2.4.0-test7 orig. boot" written on it, and
contains the Linux 2.4.0-test7 kernel, suitable for booting any of our
course's PCs. (You put the floppy in the drive and turn the machine
on.) You should leave this diskette write-protected. A second
diskette for each team has written on it (and is) "MS-DOS formatted," and may
contain some uninteresting Windows setup files (from OmniTech), which
you can delete. You can put one of these disks in an already booted
Linux machine, or equally well in one of the normal lab machines, and
mount it using the command

mount /mnt/floppy

At this point, the diskette is available to you as the /mnt/floppy
directory. When you are done reading or writing it, before you
physically eject the diskette you should give the command

umount /mnt/floppy

You will need to mount this MS-DOS diskette in order to transport
files back and forth between our experimental machines and the normal
lab machines. You will need to do this in order to print files,
maintain safe backup copies, etc. (Don't treat the hard drive of your
experimental machine as a safe place to leave your work.) The
experimental machines are not networked. The final two diskettes for
each team are for you to make experimental, modified, kernel boot
disks as described below.

The scheduler - and modifying it

I've provided you with an annotated copy of the file
/usr/src/linux/kernel/sched.c, which is the scheduler.
(That annotated copy is linked here as PostScript and as text.)
Read it with an eye to the specific de-tuning objectives listed above.
Your changes should be very small: just a handful of characters. This
is mostly a read-and-understand lab (with experimental testing of your
understanding, possibly resulting in revised understanding).

Remember when reading the code that we are only concerned with
the scheduling policy SCHED_OTHER. Also, you can ignore
all code that is specifically for SMP (symmetric multi-processor)
systems. In the annotated version of the code, I've indicated how to
tell which code is for SMP systems.

Note that the normal networked lab machines are running an older
version of the Linux kernel than we are using on our experimental
machines. (The 2.4.0-test7 version is "bleeding edge.") Thus you
will need to do your editing starting from the sched.c that is on the
experimental machine, not the version on the normal networked
lab machines.

To rebuild the kernel after editing the sched.c file,
put a blank
disk in the floppy drive and, with your current directory being
/usr/src/linux, give the command

make bzdisk

To boot your new kernel, leave that disk in the drive and give the
command

reboot

Remember that kernel hacking is difficult, and debugging especially
so. So don't feel bad if your modifications don't work at first. Be
sure to consult with me.

Thoughts for your lab report

Some things you may want to do in your lab report are:

Explain the significance of relevant portions of the normal Linux
scheduler: how do they contribute to good responsiveness?

Explain the changes you made to the Linux scheduler. What changes did
you make? Why would these changes hurt responsiveness? Under what
circumstances?

Explain your experiments to measure responsiveness. What did you
measure? Under what circumstances?

Present the results of your experiment, for example in clear tables or
graphs.

Interpret your results. How do your measured data fit with your
understanding of how the variant schedulers would behave?