CS 372

2011

Project 2

A User-Level Thread Package

Due: Friday, Feb 11, 2011, 5:59:59 PM

1. Threads

1.1 Quick Overview

Threads and processes are key operating system abstractions for concurrency.
To gain a deeper understanding of how these abstractions are constructed,
this project asks you to build the core of a user-level threads package.
Building kernel-level processes and threads would not be much different,
but we'll do this project at user level since installing new kernels on
the instructional machines would be problematic.

Threads provide the illusion that different parts of your program are
executing concurrently. Through the years, a model of executing multithreaded
programs has emerged as a de facto standard. In this model, threads share
the code, heap, and the runtime system. Each thread, however, has a separate
stack and, naturally, a separate set of CPU registers. This programming
model also provides synchronization primitives so that different threads
can coordinate access to shared resources.

1.2 Project Scope

You will construct a library of functions that define a user-level threads
package. Using your library, a program can create threads, destroy them,
and allow the threads to control the scheduling underneath. A typical
program using your library creates several threads and gives the ith
thread a "root function", root_i, that the thread calls to start executing.
When a thread is created, it is "assigned" a "root function" from which it
starts executing. The thread within that function can perform useful work,
then yield (voluntarily or involuntarily) the CPU to another thread, and
repeat this sequence as necessary. The thread of course can call other
functions from within its "root" function. A thread can destroy itself
explicitly or implicitly. Explicit destruction is done by calling the thread
library. The thread is implicitly destroyed when its root function returns.
Additionally, to add more control to the program, a thread may destroy
other threads as well.

1.3 User-level v. kernel threads

For practical reasons, this project is done at user level: you will construct
a set of functions that your program will call to provide the illusion
of concurrency. Modern operating systems provide kernel threads where similar
functions are provided as system calls to the kernel rather than function
calls to a library. Both types of threads use the same core techniques
for providing the concurrency abstraction; you would build kernel threads
in essentially the same way you build user threads in this project. Also,
kernel processes are built using these techniques.

There are a few differences between kernel and user threads. Mainly:

Multiprocessing. Whereas user level threads provide the illusion
of concurrency, on machines with multiple processors, kernel level threads
can provide actual concurrency. This is because for user level threads,
the kernel schedules the user process on one CPU and the user-level threads
package multiplexes threads of control within the process. For kernel threads,
the kernel is aware of different threads of control, and it can simultaneously
schedule different threads from the same process on different processors.

Note: A key simplifying assumption for this project is that you will
allow programs to multiplex some number (e.g., m) of user level
threads on one kernel thread. This means that at most one user
level thread is running at a time and that your runtime system has
complete control over the interleaving of user level threads with each
other. More sophisticated systems implement m on n threads
packages where m user level threads are multiplexed across
n kernel threads.

Asynchronous I/O. When a user-level thread makes a system call that
blocks (e.g., reading a file from disk), the kernel scheduler moves the
process to the Waiting state and will not schedule it until the
I/O has completed. Thus, even if there are other user-level threads within
that process, they have to wait, too. Conversely, when a kernel thread
blocks for a system call, the kernel scheduler is aware that other threads
in the same process may be ready to run. Thus, some kernel threads may
be running while others are waiting for I/O.

Timer interrupts. In this project, to simulate the timer interrupts
that cause the scheduler to switch from one thread or process to
another, we will use POSIX signals. In your implementation, the threads
library will "turn off interrupts" by blocking delivery of these signals
using system calls. However, there is nothing to prevent the threads, themselves,
from "turning off interrupts" the same way. Thus, even though we will implement
"preemptive" threads, a "malicious" thread could turn off interrupts and
not be preempted until it calls Yield(), thus hogging the CPU. Note that
kernel schedulers don't have this problem. Only the privileged code in
the kernel can turn off the real timer interrupts.

1.4 Provided code

2. Thread context

Each thread has per-thread state that represents the
working state of the thread---the thread's program counter, local
variables, stack, etc. A thread context is a subset of this
state that must be saved/restored from the processor when switching
threads. (To avoid copying the entire stack, the thread context
includes a pointer to the stack, not the entire stack.) Your library
will store the thread context in a thread control block, a structure that
holds the state your library keeps track of for each thread.

2.1 Saving/Restoring Thread Context

When a thread yields the CPU, the threads library must save
the thread's current context. The current context contains all the
registers that belong to the thread. The library restores the saved
context later when the thread gets its turn to run on the
processor. Additionally, the library creates a fresh context and a
new stack when it creates a thread.

Fortunately, the C runtime
system allows an application to retrieve its current context and
store it at a memory location and to set its current context to a
predetermined value from a memory location.
Your library will make use of two existing library
calls: getcontext and setcontext.

Study the manual pages
for these two calls. Notice that getcontext() saves the current
context into a structure of type ucontext_t (a struct ucontext). So, if you
allocate a ucontext_t in memory and pass a pointer to that
memory to a call to getcontext(), the current registers and other
context will be stored to that memory. Later, you can call setcontext()
to copy that state from that memory to the processor, restoring the saved
state.
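
To make the save/restore round trip concrete, here is a minimal sketch. It is not part of the provided code, and the function name count_passes is made up for illustration. It saves the current context with getcontext(), then jumps back to that point once with setcontext(), so the code following getcontext() runs twice. The counter is declared volatile so that it lives in memory and is not rolled back along with the restored registers.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical demo: returns how many times execution passed the
 * point just after the getcontext() call. */
int count_passes(void)
{
    ucontext_t ctx;
    volatile int passes = 0;    /* kept in memory, so it survives the jump */

    int err = getcontext(&ctx); /* save registers, including PC and SP */
    assert(err == 0);
    (void)err;

    passes++;
    if (passes == 1) {
        /* Restore the saved context: execution resumes right after the
         * getcontext() call above, as if it had just returned again. */
        setcontext(&ctx);
        assert(0 && "setcontext() does not return on success");
    }
    return passes;
}
```

Calling count_passes() should yield 2: one pass for the original return from getcontext() and one for the return caused by setcontext(). Note that the jump is only safe here because the function that called getcontext() is still active when setcontext() runs.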

Look in sys/ucontext.h to find the details
for the fields of a struct ucontext. On the cs department's
public linux machines, this file is located in
/usr/include/sys/ucontext.h.

Finish implementing parseUcontext.c. Make sure you understand how a
thread's context is saved.

You will use getcontext and setcontext in two
ways. First, to suspend a currently running thread (to run another
one), you will use getcontext to save its state and later use
setcontext to restore its state. Second, to create a new
thread, you will use getcontext to create a valid context but
you will leave the current thread running; you (the current thread, actually)
will then change a few registers in this valid context to initialize
it as a new thread, and put this new thread onto the ready list;
finally, at some point the new thread will be run by calling
setcontext on this new thread's context.

2.2 Changing Thread Context

As noted above, just creating identical thread contexts isn't quite
enough. To create a new thread, you can't just make a copy of the
current thread's context. You need to make a copy and then change 3 things:

You need to change the program counter to point to the function
that the thread should run.

You need to allocate and initialize a new stack.

You need to change the stack pointer to point to the top of the new
stack.

In the real world, you would take advantage
of an existing library function, makecontext(), to make the first and third
changes. In the real world, the advantage of using the function is that it abstracts
away the details of how a context is saved in memory, which simplifies
things and helps portability. In this class, the disadvantage
is that it abstracts away the details of how a context is saved in
memory, which might leave you vague on exactly what's going on.

In the spirit of "there is no magic", for
this project you may not use makecontext(). Instead, you must
manipulate the fields in the saved ucontext_t directly.

You will change the program counter to point to a stub function
that the thread should run.

You will use malloc() to allocate a new stack, and you will
initialize the new stack to include arguments to the stub function.

You will change the stack pointer to point to the top of the new
stack. (Warning: in x86, stacks grow down!)
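
Because the stack grows down, the initial stack pointer belongs near the high end of the malloc()'d region, not at the address malloc() returns. The sketch below shows only that address arithmetic. The helper name initial_stack_top and the 16-byte alignment are assumptions made for illustration, not requirements stated in this handout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: given the lowest address of a malloc()'d stack
 * and its size in bytes, return an initial "top of stack" address. */
void *initial_stack_top(void *base, size_t size)
{
    uintptr_t top = (uintptr_t)base + size; /* one past the highest byte */
    top &= ~(uintptr_t)0xF;                 /* round down to 16 bytes (assumed) */
    return (void *)top;
}
```

Keep the original pointer returned by malloc() somewhere (for example in the thread control block), since that pointer, not the computed top, is what must eventually be passed to free().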

What is the stub function? How does the stack work? Read on.

2.3 Stub function:

When you create a new thread, you want it to run some
"root" function that defines the work you want it to do.
A thread is destroyed
implicitly when it returns from its "root" function, much like the
main program thread is destroyed when it returns from the "main"
program. So, rather than have your thread begin by running the root
function directly, a simple implementation of this feature is to start
the thread initially in a "stub" function that calls the actual root
function of the thread (much like main is actually being called from
the crt0 stub in UNIX). Then, your root function has somewhere to return
to, should it return. This arrangement would look like:

void stub(void (*root)(void *), void *arg){
    // thread starts here
    Tid ret;
    root(arg); // call root function
    ret = ULT_DestroyThread(ULT_SELF);
    assert(ret == ULT_NONE); // we should only get here if we are the last thread.
    exit(0); // all threads are done, so process should exit
}

In the above code, the argument root
is a pointer to the root
function that describes the real work the thread should do; notice
that in C, a function's name refers to the address of its code in memory. arg
is the argument to pass to that function; we'll have the root function
take a pointer to an arbitrary type as an argument so that you can
pass the root function a pointer to whatever you want. ULT_DestroyThread,
ULT_SELF, and ULT_NONE are defined below.
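
If function pointers and void * arguments are unfamiliar, the following small sketch may help. The names increment_root and call_root are hypothetical. It shows a root function that interprets its void * argument as a pointer to an int, and a caller that invokes it through a function pointer, which is the same call the stub makes with root(arg).

```c
#include <assert.h>

/* Hypothetical root function: treats arg as a pointer to an int counter. */
void increment_root(void *arg)
{
    int *counter = arg;  /* void * converts to int * without a cast in C */
    (*counter)++;
}

/* Invokes root(arg) through the function pointer, as the stub does. */
void call_root(void (*root)(void *), void *arg)
{
    root(arg);
}
```

A caller would write call_root(increment_root, &n) for some int n; the bare name increment_root evaluates to the address of the function's code.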

2.4 On contexts and calling conventions

The context variable
contains many data fields, but you need only deal with two of them:
the stack pointer and the program counter. Other than that, you
don't need to worry about the fields within the context variable, as
long as you do not tamper with them. Also, it is a good idea to use
variables that have been initialized through a getcontext
call in order to not have bizarre behavior.

Under the C calling conventions in x86, here's what
things look like while any given function is executing:

Notice that as a procedure executes, it can allocate
local space by moving the stack pointer, and it can
find local variables, parameters, return addresses, and
the return %ebp by indexing relative to the (fixed) %ebp
register.

To make a procedure call, the compiler pushes parameters onto the
stack from right to left, saves the current instruction pointer
(eip) onto the stack, and changes the instruction pointer to the
called procedure. The called procedure then saves the old
frame pointer (ebp) onto the stack and sets the new frame
pointer to point to the old frame pointer at the top of the stack.
Now the called function is free to shove other stuff
onto the stack (e.g., local variables, register spills, saved
registers, the frames of called procedures) because
it can still locate stuff (especially return information) relative
to the fixed frame pointer (ebp).

To return to the caller,
a procedure simply copies the frame pointer (ebp) to the stack pointer
(esp),
pops the top stack item into ebp to restore the old ebp,
and uses the ret instruction to pop the old instruction
pointer off the stack into the processor's instruction pointer register
(eip), returning control to the calling function.
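
The calling convention above determines what a function expects to find on the stack when it starts running: the return address at the stack pointer, with the parameters just above it. As a sketch under that assumption (32-bit x86, all parameters passed on the stack), the hypothetical helper below lays out such a frame for a call stub(root, arg) on a fresh stack. It only writes words into memory and does not touch a ucontext_t. Stack alignment is ignored here, and on other architectures (x86-64, for example) parameters travel in registers instead, so this layout would not apply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example root function, used only to have an address to store. */
void example_root(void *arg)
{
    (void)arg;
}

/* Hypothetical helper: starting from the top of a fresh stack, push the
 * words that a function called as f(root, arg) expects on entry.
 * Returns the resulting stack pointer value. */
uintptr_t *push_initial_frame(uintptr_t *top, void (*root)(void *), void *arg)
{
    *--top = (uintptr_t)arg;   /* rightmost parameter is pushed first */
    *--top = (uintptr_t)root;  /* then the first parameter */
    *--top = 0;                /* placeholder return address: stub never returns */
    return top;
}
```

After the call, the returned pointer addresses the placeholder return address, the next word up holds root, and the word above that holds arg.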

3. Cooperative Thread Package Application Program Interface (API)

In this project you will build a user level
threads package. A key simplifying assumption (relaxed in the extra
credit portion below) is that threads are cooperative---each
thread runs until it explicitly releases the CPU to another thread by
yielding or exiting. In contrast, preemptive threading
systems allow a scheduler to interrupt a running thread at any time
and switch the CPU to running a different thread.

The thread package provides several function calls
to allow application programs a degree of control over thread management.
In addition, there are a few conventions that application programs must
adhere to in order to ensure proper and safe operation. A list of the functions
that constitute the User-level threads (ULT) API can be found in ULT.h
and are summarized here:

Tid ULT_Yield(Tid tid): This
function suspends the caller and activates the thread given by the identifier
tid. The
caller is put on the ready queue and can be invoked later in a similar
fashion. The value of tid may
take the identifier of any available thread. It also can take any of the
following constants:

The constant ULT_ANY
tells the thread system to invoke any
thread on the ready queue. A sane policy is to run the thread at the head
of the ready queue.

The constant ULT_SELF
tells the thread system to continue the
execution of the caller. This turns the function call into a no-op, but
it may be useful for debugging purposes.

The function returns the identifier of the thread
that took control as a result of the function call. Note that the caller
does not get to see the result until it gets its turn to run (later). The
function also may fail and the caller resumes immediately. To indicate
the reason for failure, the call returns one of these constants:

The constant ULT_INVALID
alerts the caller that the identifier tid does not correspond to a valid
thread.

The constant ULT_NONE
alerts the caller that there are no more
threads (other than the caller) available to run (in response to a
call with tid set to ULT_ANY).

Tid ULT_CreateThread(void (*fn)(void *),
void *arg): This function
creates a thread whose starting point is the function fn.
The argument arg
is a pointer that will be passed to the function when the thread starts
executing; arg thus allows arguments to be passed to the function. Upon
success, the function returns a thread identifier of type Tid.
If the function fails, it returns
a value that indicates the reason for failure as follows:

The constant ULT_NOMORE
alerts the caller that the thread package
cannot create more threads.

The constant ULT_NOMEMORY
alerts the caller that the thread package
could not allocate memory to create a stack of the desired size.

The created thread is put on a ready queue but does
not start execution yet. The caller of the function continues to execute
after the function returns.

Tid ULT_DestroyThread(Tid tid):
This function destroys the thread whose identifier is tid.
The caller continues to execute and
receives the result of the call. The value of tid
may take the identifier of any available
thread. It also can take any of the following constants:

The constant ULT_ANY
tells the thread system to destroy any
thread except the caller. While this sounds too draconian, this function
can help in dealing with drastic situations where a thread detects a fatal
error that cannot be handled and the only recourse is to stop the program.

The constant ULT_SELF
tells the thread system to destroy the
caller. In this case, the system destroys the caller and reclaims its resources.
The function obviously does not return in this case. Instead, some
other ready thread gets scheduled.

Be careful. It is dangerous to use memory once it has been freed. In particular,
you should not free the stack of the currently running thread while it is still
running. (Yet you still need to make sure that that stack eventually
gets deallocated. Think about this a bit!) You should convince yourself that
your program would work even if you used a debugging malloc library
that overwrites a block with dummy data when that block is free()'d.

Be careful. If you destroy a thread that is holding a
lock, deadlock may occur. For this reason, it is usually best to only use ULT_DestroyThread
to have each thread destroy itself (not other threads.) In fact, the Java library deprecates
the equivalent function in its thread library.
Note that Java's alternative makes use of the richer exception
model that Java has compared to C.
We include the more general form here
partly as a design exercise and because you may need it for some of your tests.

Upon success, the function returns the identifier
of the destroyed thread. The function also may fail and returns one of
these constants:

The constant ULT_INVALID
alerts the caller that the identifier tid does not correspond to a valid
thread.

The constant ULT_NONE
alerts the caller that there are no other
threads available to destroy (in response to a call with tid
set to ULT_ANY).

3.1 Structure of and requirements on your solution:

Your library must maintain a "thread control block"
for each thread that is running in the system. This is similar to the process
control block that an operating system implements to support process management.
In addition, your library must maintain a queue of the threads ready to
run, so that it can process the application program function calls. Note
that the application programs do not explicitly call a function to initialize
the library data structures. Therefore, your library must always ensure
that any necessary initialization is done during the first time a function
is called within the library.
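
As one possible shape for these data structures, the sketch below defines a thread control block and a FIFO ready queue built as a singly linked list. Every name in it (Tcb, ReadyQueue, rq_enqueue, rq_dequeue) and the use of int for the thread identifier are assumptions made for illustration; ULT.h defines the real Tid type, and your design may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

/* Hypothetical thread control block. */
typedef struct Tcb {
    int tid;             /* stand-in for the Tid type from ULT.h */
    ucontext_t context;  /* saved by getcontext(), restored by setcontext() */
    void *stack_base;    /* pointer returned by malloc(); NULL for the initial thread */
    struct Tcb *next;    /* link for the ready queue */
} Tcb;

/* Hypothetical FIFO ready queue. */
typedef struct ReadyQueue {
    Tcb *head;
    Tcb *tail;
} ReadyQueue;

/* Append a thread at the tail of the queue. */
void rq_enqueue(ReadyQueue *q, Tcb *t)
{
    t->next = NULL;
    if (q->tail != NULL)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}

/* Remove and return the thread at the head, or NULL if the queue is empty. */
Tcb *rq_dequeue(ReadyQueue *q)
{
    Tcb *t = q->head;
    if (t != NULL) {
        q->head = t->next;
        if (q->head == NULL)
            q->tail = NULL;
        t->next = NULL;
    }
    return t;
}
```

Dequeuing from the head and enqueuing at the tail gives round-robin order, which matches the "run the thread at the head of the ready queue" policy suggested for ULT_ANY.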

As noted above, your library must use
getcontext() and setcontext() to save and restore thread
context state, but it may not use makecontext() or any
other existing code to manipulate a thread's context; you need to
write the code to do that yourself.

Each thread should have a stack of at least
ULT_MIN_STACK and your implementation should support the
creation of exactly ULT_MAX_THREADS threads by a
program (including the initial main thread). Your implementation must not statically allocate all
stacks at initialization time. Instead, you must dynamically allocate a stack
whenever a new thread is created (and free it when that thread is destroyed.)

3.2 Hints, leading questions, and advice

This project does not require you to
write a large number of lines of code. It does require you to think
carefully about the code you write. Before you dive into writing
code, it will pay to spend time planning and understanding the code
you are going to write. If you think the problem through from
beginning to end, this project will not be too hard. If you try to
hack your way out of trouble, you will spend many frustrating nights
in the lab.

As a start, here are some questions you should
answer before you write code.

getcontext()
"returns" twice: once when you create a context and again when you
switch to that context. What action will you take in each case? How
will you tell which case you are in?

Most threads are created with ULT_CreateThread(), but
the initial thread is there before your library is
invoked. Nonetheless,
the original thread must be able to ULT_Yield() to let other
threads run, and other threads must be able to ULT_Yield()
and let the original thread run.

In fact, a strongly recommended first
milestone might be for ULT_Yield(ULT_SELF) to work for
the initial thread (where your implementation stores
the caller's state to a thread control block in the ready
queue and then restores that state from the thread control block).
Get this working before you try to implement ULT_CreateThread()
or ULT_DestroyThread().

Also note that when
the initial thread in a C process returns, it calls the exit()
system call, which causes the OS to destroy the process (even
if you have other user level threads in the process that want to run.)

A hard bug to find would
be an overflow or underflow of the stack you allocate. How might
such a bug manifest itself? What defensive programming strategies
can you use to detect stack overflow in a more controlled manner as
the system runs?
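
One common defensive technique, offered here only as a possibility, is a "canary": a known value written at the lowest address of each stack and checked from time to time (for example, at every context switch). Since the stack grows down, an overflow tends to overwrite that word. The names and the magic value below are made up for illustration, and a canary only catches overflows that actually write over it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_CANARY 0xDEADBEEFu  /* arbitrary, hypothetical magic value */

/* Write the canary at the lowest address of the stack region. */
void canary_place(void *stack_base)
{
    uint32_t value = STACK_CANARY;
    memcpy(stack_base, &value, sizeof value);
}

/* Return 1 if the canary is still in place, 0 if it was overwritten. */
int canary_intact(const void *stack_base)
{
    uint32_t value;
    memcpy(&value, stack_base, sizeof value);
    return value == STACK_CANARY;
}
```

A library using this idea would call canary_place() right after allocating a stack and assert(canary_intact(...)) before switching to or away from the thread that owns it.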

Use a debugger. As an exercise, put a breakpoint
at the instruction after you copy the current thread's state
using getcontext(). You can print the current
values of the registers (in gdb, type "info registers").

You can print the values stored in your structs
(e.g., in gdb I use "print/x myTcb->context" to print
the context stored by a call to getcontext()).

You may find this particularly useful in making
sure that the state you "restore" when you first
run a newly-created thread makes sense.

3.3 Tests

Once your library is done, the tests in doTest.c should run.
Grade.sh will compare your results to the expected
results in doTest.expected.

> make doTest
...
> ./doTest
...
Done.
>

These tests are provided as a guide to help you with
your debugging.
Note that successfully running these tests is not a guarantee
that your solution is correct or that it will get a good grade.
We will also look at source code, run other tests, etc.

You should write additional tests of your own. The worst
consequence of a subtle bug is not a bad grade, it is the many hours
you may end up suffering in the lab. A mark of a good programmer is a
good testing strategy -- spend some time adding to the tests we
provide.

4. Extra credit (20%) Preemptive multi-threading

In this optional phase of the project, you will extend
the cooperative multi-threaded system (where ULT_Yield() causes
control to be passed from one thread to the next) to make it a
pre-emptive system where simulated "timer interrupts" cause the
system to switch from one thread to another.

In the files interrupt.h and interrupt.c we provide code to install a
signal handler routine that will be called whenever the process
receives a SIGALRM signal. This code also uses the alarm() system call
to cause the system to periodically send SIGALRM signals to the
process. Build the program showHandler and run it
to see how this works.
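
For orientation, the sketch below shows the general POSIX mechanism for installing a SIGALRM handler with sigaction(). It is not the code in interrupt.c, which is not reproduced in this handout, and the names on_alarm, install_alarm_handler, and alarms_seen are hypothetical. The handler only counts deliveries; a preemptive scheduler would perform a context switch there instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request the POSIX signal interfaces */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Number of SIGALRM deliveries seen so far. */
static volatile sig_atomic_t alarm_count = 0;

/* Hypothetical handler: just counts deliveries. */
static void on_alarm(int signo)
{
    (void)signo;
    alarm_count++;
}

/* Install on_alarm as the SIGALRM handler; returns 0 on success. */
int install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);  /* block no additional signals in the handler */
    return sigaction(SIGALRM, &sa, NULL);
}

/* Report how many alarms have been handled. */
int alarms_seen(void)
{
    return (int)alarm_count;
}
```

To try it without waiting for a timer, call install_alarm_handler() and then raise(SIGALRM); in a single-threaded process the handler runs before raise() returns.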

Your task is to modify the interruptHandler()
routine (and your threads system) to move the currently executing
thread to the ready queue and to move some thread off the ready
queue to become the new currently executing thread.

Shared data structures. Note that interrupt signals can be sent to your
process at any time even when a thread is in the middle of a
ULT_Yield(), ULT_DestroyThread(), or ULT_CreateThread() call. It
is a very bad idea to allow multiple threads to access shared
variables (such as your ready queue) at the same time. You should
therefore ensure that only one thread can be in a critical
section (accessing global variables) of the thread library at a
time. A simple way to do this is to disable interrupts when you
enter procedures of the thread library and enable interrupts when
you leave. In the interrupt.c/.h files, we provide the routines
interruptsOn() and interruptsOff() to accomplish this.
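
The provided interruptsOn() and interruptsOff() are not reproduced here. As a rough sketch of the kind of mechanism the earlier discussion describes (blocking delivery of the timer signal with a system call), the hypothetical helpers below use sigprocmask() to block, unblock, and query SIGALRM. The helper names are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request the POSIX signal interfaces */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Block delivery of SIGALRM ("interrupts off"); returns 0 on success. */
int alarm_block(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Allow delivery of SIGALRM again ("interrupts on"); returns 0 on success. */
int alarm_unblock(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Return 1 if SIGALRM is currently blocked, 0 if not, -1 on error. */
int alarm_is_blocked(void)
{
    sigset_t current;
    /* A NULL set leaves the mask unchanged and only reports it. */
    if (sigprocmask(SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, SIGALRM);
}
```

A query function like alarm_is_blocked() can be handy in assertions that check your invariants, for example asserting that signals are blocked whenever the ready queue is being modified.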

Hint: think carefully about the invariants you want to maintain about when
interrupts are on and when they are off. Note that because of thread
context switches, the thread that turns off interrupts may not be the
one that turns them on. Maintain the right invariants, and you'll
have no trouble dealing with this.

4.1 Tests

Once phase 2 is done, the tests in doTest2.c should run.
Grade.sh will compare your results to the expected results.
Again, you should not count on these tests to be comprehensive,
and you should add your own tests to protect your grade and your sanity.

4.2 Additional notes and disclaimers

If you attempt the extra credit, be sure to include a text file README-XC that
documents your approach to help the TA understand and evaluate it.

If you attempt the extra credit, be sure to save a working version of
the lab first. You may want to make use of a version control system like SVN
or CVS to track multiple versions of your code.

The number of points available here is small relative to the work
you will have to do. Don't do this extra credit portion of the
assignment for the points -- do it for your own interest and
benefit. Note that the instructional staff will not be able to provide
much assistance on the extra credit (we'll be focusing our attention
on people still working on the main part of the assignment.) Also note
that we won't spend much time grading the extra credit work you turn
in. There are two implications: (1) It either is right or it isn't. If
it isn't, we're not going to spend time trying to identify places
where we can give you partial credit. (2) You must provide concise,
clear documentation of your design in README-XC and you must provide
and document a systematic, comprehensive, convincing test plan for
your code. It is not the TA's job to test your code and find problems; it is
your job to test your code and convincingly demonstrate that it works.
If the TA cannot easily and quickly understand what you did, you will
not receive extra credit points.

5. On Programming and Logistics

The following guidelines should help smooth the process of delivering
your project. You can help us a great deal by observing the following:

You will work on this project in groups of two. If you cannot find
a partner, contact Mike immediately!

We will grade the project on the department's public linux machines
(run "cshosts publinux" for a list). Ensure that your project works there! If you
do development on another platform, you do so at your own risk. The statement
"it worked on my home machine" will not be considered in the grading process.
(Note: it should be possible to do most development on other flavors of
Unix, OSX, and perhaps even Windows/Cygwin, and you are welcome to do so. But, managing any
difficulties porting to Linux is your job, and you should allow time
for doing so.)

After you finish your work, please use the turnin
utility to submit your work.

Usage:

% make clean

% turnin -submit muqeet handin-372-labULT your_files

Do not include object files in your submission!!

Select reasonable names for your files and variables. This way, you make
grading easier.

Your files should never refer to absolute names. For example, to include
foo.h, do not write:

#include "/u/dahlin/cs372/labs/labULT/foo.h" /* poor style */

You must provide a documentation file README.txt that briefly explains your solution
and any assumptions that you made.
The code, itself, should contain the bulk of your
documentation in the form of well-considered comments.

Thorough testing is the mark of a good programmer. In addition to turning
in the code for the system specified above, you should also turn in a set
of test programs that exercise the functionality of your system. Your documentation
should list these programs, explain how to run them, explain the importance
of each test, and summarize the expected results of each.
We will deduct fewer points for errors that your test system flags
than for errors that it does not catch but that we do.

You are encouraged to reuse your own
code that you might have developed in previous courses to handle things
such as queues, sorting, etc. Alternatively, you may use other
publicly available code for basic data structures, etc.

You may not use code that
subsumes the heart of this project (e.g., you should not base your
solution on wrappers of or code taken from the posix thread
library). If in doubt, ask.

You are encouraged to discuss the problem with your colleagues. However,
you must follow the collaboration restrictions described in the syllabus.
For example,
you are not allowed to look at anyone else's code, and no one else may
look at yours.
When you are talking to them, you should not be looking
at your code, and vice versa.

If you find that the problem is underspecified, please make reasonable
assumptions and document them in the documentation file.

This project's main difficulty is in conceptualizing the solution. Once
you overcome that hurdle, you will be surprised at how relatively simple
the implementation is!