Lab 1 Cooperative Green Threads

Due: Wednesday April 19, 2017 11:59PM

Overview

In this lab, you will write a cooperative green threads library, chloros (a
Greek word meaning “green”) . You will be programming in C and x86-64 assembly.

Greens threads are similar to the type of threads you might be familiar with,
OS threads, but are implemented entirely in user-level code. They are also known
as user-level threads. They are typically speedier than OS threads since there
is no context switching into/out of the kernel. Cooperative green threads are
green threads that release control to the scheduler deliberately, usually
through a yield() call. That is, they execute for as long as they want.

A user of your chloros library first calls grn_init() to initialize the first
green thread, one representing the originally executing program. Then, the user
calls grn_spawn(user_fn) to create a new thread that executes the user_fn.
(Note that in a production thread library, grn_spawn would take an extra void
* argument to pass to user_fn so as to spawn different threads running the
same function on different arguments.) The user calls grn_yield() inside a
thread to yield execution to another thread, including the original. Finally,
the user might call grn_wait() to wait for all of the spawned threads to
finish executing before ending the process.

This lab is divided into 5 phases and a 6th (optional) extra credit phase. The
first 4 phases will guide you towards implementing cooperative green threads.
Once you have completed phase 4, the example program above will run as you
expect. In phase 5, you will implement simple thread garbage collection so that
resources aren’t leaked after a thread terminates. In the optional extra credit
phase, you will add preemption support to your library.

There are unit tests corresponding to each phase, allowing you to mark your
progress as you work through the lab. You may find the list of references to the
right (or below if on mobile) useful as your proceed.

Phase 0: Getting Started

First, ensure that you are working on the lab using a compatible machine. For
this lab, compatible means that the machine meets the following requirements:

The include/ directory contains the header files for these src/ files where
applicable and are named like their src/ counterpart. There are two
exceptions: 1) include/chloros.h, which exports the high-level library
routines for use by user binaries that link to your library, and
2) include/utils.h, which defines the following helpful macros (among others):

debug(...) like printf, but only prints if there is a #define DEBUG at
the top of the file

assert(cond) checks that cond evaluates to true. If it does not, prints
an error message and terminates the program

err_exit(...) like printf but terminates the program after printing the
error message

Finally, the test/ directory contains the unit testing code. The tests
themselves are in test/src/phase{n}_tests.c. If a test you expect to pass
fails, you can look at its test file to see why.

Make targets

You will use make to build, test, and create a submission file for this lab.
The following targets are defined in the Makefile:

all: compiles and packages your library

test: runs the unit tests against your library

submission: creates the lab1.tar.gz submission file

clean: deletes files that can be remade - useful when changing headers

Calling make in your shell will implicitly run the all target. The other
targets can be invoked using make {target}. Ensure that you are in the root of
your lab directory before invoking make. Try calling make test now. You
should see five failing test suites. If this doesn’t work, your machine may not
meet the requirements.

Getting familiar

Before you continue, read through the src/ files and some of the unit test
files (test/src/phase{n}_tests.c) to see how the functions you will write are
intended to be used and what we will test for. Also pay attention to the code
style: how variables are named, how code is aligned, and how spacing is used.
You should write your code with an equivalent style. Once you have a good grasp
of the chloros thread interface and functions, you’re ready to start writing
code.

Phase 1: Allocating and Deallocating Threads

In phase 1, you will implement the functions marked FIXME in src/thread.c.

The grn_thread structure, defined in include/chloros.h, contains all of the
information needed to context switch into a thread. Specifically, it contains a
thread’s state (context) and metadata the scheduler uses to make decisions. We
keep track of the threads in the system via a linked list: the next and prev
pointers in the grn_thread structure point to the next and previous elements
of the linked list, respectively, and the threads field of the global STATE
(declared in include/main.h and defined in src/main.c) holds a pointer to
the head of the list. Examine and ensure that you understand the thread linked
list manipulation functions we have implemented for your use in src/thread.c:
add_thread, remove_thread, and next_thread. We’ve also provided the
atomic_next_id function which simply returns a strictly increasing counter.

Your task is to implement the functions that allocate and destroy (deallocate)
grn_thread structures for use by the rest of the system: grn_new_thread and
grn_destroy_thread. You can find the functions’ specifications directly above
their definition. Once you have done this, you should pass the phase 1 unit
tests.

Phase 2: Context Switching

In phase 2, you will write the code to context switch from one thread to another
in the assembly function marked FIXME in src/context_switch.S.

The context_switch function will be called on behalf of a green thread during
its execution to context switch from one thread to another. For a context switch
to be successful, all state of the currently executing thread must be saved in
some structure and the state of the thread being switched into must be restored.
A thread’s state is also referred to as its context.

A thread’s context is already defined in the grn_context structure in
include/chloros.h as seven 64-bit registers: rsp, the stack pointer, rbp,
the base pointer, and rbx, r12, r13, r14, and r15, general purpose
registers. All of these registers are known as callee saved, which means that
the callee of a function guarantees that those registers will have the same
value when the function returns as they did when the function was called. In
other words, a function A calling another function B can expect those
registers to contain the same values after function B returns as they had
when function B was called because the callee (B) saves and restore
those values. These registers are defined this way by the System V X64
ABI
calling convention, which is followed on most Unix systems running on 64-bit
machines. You can read more at the ABI reference link.

Where are the rest of the registers?

A keen reader might wonder why a thread's context only contains the
callee-saved registers when there are more general purpose 64-bit registers.
The answer relies on the notion of `caller-saved` registers, which are
registers that a function caller saves before calling a function. The
registers that don't appear in `grn_context` are caller-saved registers.
Since the `context_switch` function will be _called_, the compiler will emit
instructions to save and restore those registers, so no extra work is needed
to maintain that state, and the `callee-saved` registers suffice as our
register state.

Apart from these registers, a thread’s full context also includes the values in
its stack. Saving and restoring a thread’s stack on each context switch would be
a very expensive operation, so instead of doing this, we simply give each thread
its own unique stack. Then, as long as each thread’s stack pointer points to its
own unique stack, saving and restoring the stack pointer suffices to save and
restore the thread’s stack. Later on, you’ll write the code that ensures that
each thread’s stack pointer points to its own unique stack (which you’ve already
allocated in grn_new_thread). The context_switch function should only be
called on already running threads, so the validity and uniqueness of the stack
pointer can be assumed.

You’re ready to implement the context switching assembly function in
src/context_switch.S. You can find the function’s specification above its
definition. Keep in mind that according to the calling convention, the first two
parameters to a function are passed in the rdi and rsi registers. Also note
that GCC calls the GNU Assembler implicitly, which uses the GAS syntax for
assembly. You may wish to consult the calling convention in the ABI reference or
the X86 instruction reference, both of which are linked in the column to the
right. After you have implemented this routine, you should pass the phase 2 unit
tests.

Hint:
You can implement `grn_context_switch` using only two different assembly
instructions: `mov` and `ret`.

Phase 3: Yield

In this phase, you will implement grn_yield in src/main.c, the main thread
scheduling function.

When a thread wants to release control to the scheduler, it yields its
execution time by calling grn_yield(). The function’s task is to find and
execute an available thread. It is a scheduler: it makes decisions about which
thread to schedule when and does the necessary bookkeeping to continue
scheduling threads reliably in the future.

Each thread in chloros is marked with a status of READY, RUNNING, WAITING,
or ZOMBIE. A thread is READY when it can be executed, RUNNING when it is
executing, WAITING when some processing needs to happen before it is READY,
and ZOMBIE when a thread has finished executing but hasn’t has its resources
freed and destroyed.

Your grn_yield() function should use and modify these statuses for scheduling.
It should implement a round-robin scheduler, performing a linear search for the
next READY thread beginning at the currently executing thread,
STATE.current. Once it finds a thread, the status of the currently executing
thread is set to READY if it was previous RUNNING, and the newly found
thread’s status is set to RUNNING. The pointer to the currently executing
thread, STATE.current, is updated. Finally, the function context_switches
into the newly found thread.

Your task is to implement the grn_yield function. The full specification is
above its definition. After you have implemented this routine, you should pass
the phase 3 unit tests.

Hint:
Use the `next_thread` function for a looping linear search.

Warning:
Yield's unit tests are not exhaustive. A later phase's unit tests might reveal
a bug in this phase.

Phase 4: Spawn

In this phase, you will implement grn_spawn in src/main.c, the new thread
initializer. After implementing this function, you will have implemented
cooperative green threads.

A thread created by a user executes a function of the user’s choice. The
grn_spawn function accomplishes exactly this: it allocates a new thread,
initializes its context so that the user’s function is executed after a context
switch, and finally calls grn_yield to yield the caller’s execution to
(potentially) the newly initialized thread.

When a thread is allocated by your grn_new_thread function, it has an empty
context, so context switching into it would likely crash the process. Your task
is to write grn_spawn to set up values in the stack so that after the first
context_switch into the thread, the thread begins executing the start_thread
routine, a thread start function written in assembly we have provided to you.
start_thread expects the user’s fn to be at the top of the stack when it is
called. Thus, you must design the initial stack so that during a given thread’s
first context_switch, context_switch returns to start_thread, and
start_thread finds fn at the top of the stack.

The implementation of this function requires you to write values into the
thread’s initial stack and then set the stack pointer appropriately. Note that
the x64 ABI declares mandates the stock pointer to be 16-byte aligned. The
full function specification is above its definition. After you have implemented
this routine, you should pass the phase 4 unit tests.

Hint:
The only register needing initialization is %rsp.

Phase 5: Garbage Collection

In this phase, you will implement grn_gc in src/main.c, a simple thread
garbage collection routine.

Upon thread termination, the thread’s initial function returns to
start_thread, and start_thread calls grn_exit to terminate the thread.
grn_exit is a simple function: it simply sets the thread’s status to ZOMBIE
so that grn_yield doesn’t schedule it again.

Your task is to ensure that a thread’s resources are freed after it has exited.
You must implement grn_gc, which finds all ZOMBIEd threads and frees their
resources. Then, insert a call to grn_gc in an appropriate location in your
library. Note that you have already written grn_destroy_thread: grn_gc
should call this function appropriately. The full function specification is
above its definition. After you have implemented this routine, you should pass
the phase 5 unit tests.

Hint:
When is it and when is it not safe to call `grn_gc`?

Submission

Once you’ve completed the tasks above, you’re done and ready to submit! Ensure
that your lab and tests run as you expect them to on corn.stanford.edu. We
will grade your lab on the FarmShare machines. Any changes made to files outside
of the src/ or include/ directories will not be submitted with your lab.
Ensure that your submission is not dependent on those changes.

Finally, run make submission and then proceed to the submission
page to upload your
submission.

Phase 6: Preemption (Extra Credit)

This phase is entirely optional and lack of completion will have no adverse
effects on your score. Successful completion of this phase will result in a
25% boost to your score. You must successfully complete all phases to be
awarded the extra credit.

In this extra credit phase, you will add preemption to your green threads
library. A preemptive thread is a thread that is interrupted by the scheduler
when the scheduler sees fit; the thread has no control over when it yields
execution to other threads. You will likely need to modify grn_init to handle
the true case for the preempt parameter as well as add new functions and
modify previous functions. To test this phase, uncomment the line underneath the
TODO in test/src/test.c.

A preemptive scheduler works by setting a timer for some short duration and
receiving an interrupt when that timer expires, transferring control to the
scheduler without the thread’s intervention. Once the scheduler has control, it
can decide to switch execution to another thread.

In an OS, the kernel sets up a timer interrupt using either an external
Programmable Interrupt Timer (such as the intel
8254) or a modern
CPU’s internal APIC. In user-level code, we
don’t have direct access to hardware, so we must use the mechanisms exposed to
us by the underlying OS. Thankfully, POSIX has provided us with timers which
deliver a signal to the timer-setting process when the timer has expired. You
can think of signals as user-level interrupts. You should consult the
signal(7) man page reference for further important information on signals.

The setitimer function, also referenced, allows you to set up a timer to
signal the calling process after it expires. To catch the signal, you must
register a signal handler using the sigaction function. The following is
pseudocode for setting up a timer and signal handler:

Your task is to setup a timer immediately after a thread has been scheduled,
catch the resulting signal, and yield the interrupted thread. If a new thread
is scheduled as a result of the yield, a new timer should be setup (which may
be done automatically depending on how you initially called setitimer).

Be aware that context switching out of a signal handler means that technically
the signal handler has not returned. Hence, you have to be wary of some of the
particularities of Unix signals. Specifically:

By default, a signal is blocked or masked while it is being handled and only
unmasked when the signal handler returns. If you yield while the timer signal
is masked, the next thread will not be preempted by a signal. You may wish to
address this through use of the SA_NODEFER flag to
sigaction, or
through use of
sigprocmask(2).

Many functions in the C library are not reentrant, a good example being
malloc. You cannot call these functions from within a signal handler. The
signal-safety(7)
man page enumerates the set of all async-signal-safe functions that are safe
to call from within a signal handler. Note that since threads are preempted
within signal handlers, you may need to mask interrupts or hold a lock for the
duration of these functions. If needed, you can wrap library functions using C
preprocessor macros (e.g., #define malloc chloros_malloc), but are only
required to do so for functions actually used in the lab.

Across a call to grn_yield(), the compiler naturally saves any live
caller-saved floating point registers to the stack, just like caller-saved
integer registers. With signals, however, the C language spec does not
require implementations
to save and restore floating point registers across handlers. As it happens,
Linux saves and restores floating point registers anyway on
x86. As a rule, this
will not be the case for all OSes and architectures, however. For the
purposes of this lab, you can assume threads do not use floating point as
long as you don’t use it yourself.

After successful completion of this phase, you should pass the phase 6 unit
tests. To receive the extra credit, please add a file named extra_credit.txt
inside the src/ directory explaining your implementation. Then, resubmit.