Object is prefixed with a size_t header regardless it's free or not. The first word of free object is also used to store the "next object" pointer. Allocated objects do not have a next pointer. The free objects form a singly linked list.

Does not sort objects to address order.

Splits large objects into smaller ones.

No coalescing.

LIFO (as opposed to first fit).

Demonstrated that the allocator works with a simple program "tac" which reads each line from stdin into memory and prints them out last line first.

Compile tac.c against standard C malloc: gcc -o tac tac.c

Compile tac.c against mm.c implementation: gcc -DUSE_MM -o tac tac.c

Possible improvements:

Calling mm_malloc() with request n >= ZONE_SIZE causes infinite loop. However, using a ZONE_SIZE that is too big will penalize your utilization score. This should be an easy fix.

Currently all objects are sizeof(size_t) aligned. Will need to modify pointer arithmetic to make it align to 8 bytes on any architecture. The malloclab requires all objects to be 8-byte aligned.

Deferred coalescing: wait until the free list runs out; before asking for more memory from sbrk(), perform a merge sort, then coalesce all coalesceable free blocks in one pass.

The malloc() routine spends most of its time searching through the linear free list, which is O(n). Use a log(n) time lookup data structure such as binary search tree.

Approximate best fit strategy using segregated size class bins. See the design of Doug Lea malloc for some ideas. The "hash table" like data structure could provide close to O(1) malloc(), and best fit is known to minimize fragmentation.

To improve small object allocation, consider a special "headerless" class that is simply an array of same-sized small objects.

Note: if you want to use this mm.c with malloclab, remember to reinitialize mm_first_free to NULL in mm_init(). Otherwise mdriver will clear the heap and cause objects already in the free list to be freed again, creating a cyclic list that causes infinite loop in mm_malloc().

Stack-allocated objects become invalid as soon as the function returns, so callee cannot return objects on stack back to the caller.

Caller can create an object on stack and pass its pointer to the callee to store results there, but the caller must be able to predict how much space will be used by the callee (e.g. gettimeofday(), strftime(), fgets(), snprintf(), ...) Typically, the callee return value has a special way to indicate whether the space provided by the caller was sufficient.

Not a good paradigm for data structure, even those as simple as a list. The functions to enumerate directory entries for example---opendir(), readdir(), closedir()---are awkward to use and error prone if used incorrectly.

What is the heap?

Traditional Unix process views the heap as an extension to the data section, in order to store additional "global" objects created in run time. The data section typically stores objects created as a global variable.

Heap is grown in opposite direction to the stack (see figure on the right).

The brk() system call sets the end of the data section; sbrk() is a wrapper that remembers and returns the previous end of the data section, and adjusts the new end by the given number of bytes. In other words, use of sbrk() feels like malloc().

There is no corresponding free() for the data section. The end of the data section can be shrunk, but one must take the trouble to ensure that the shrinkage does not cause objects that are still in use to be invalidated.

brk() typically operates on one page at a time: if we want 8 bytes, it will still give us a page (4096 bytes). We typically use malloc(), which obtains memory from sbrk(), to manage the heap in a fine grained way.

Modern operating systems use paged virtual memory management to allocate pages anywhere in the address space.

Simple strategies to manage the heap:

Use it like a stack. The decoupling of call stack and object stack means objects can survive the scope of a function call. Objects must be freed in the exact opposite order they're allocated. See K&R section 5.4 pp. 101.

Use a singly linked list to chain free memory blocks. Objects may be freed in any order. May cause fragmentation problem. See K&R section 8.7 pp. 185.

The use of union in this case ensures that header is at least as big as the Align type, which is typedef as a long int. The malloc() code on pp. 187 uses sizeof(Header) to round up the number of bytes to allocate.

On a 16-bit machine, long int is 4 bytes, pointer 2 bytes, and unsigned int 2 bytes. The union does nothing in this case because sizeof(x) == sizeof(s) (x and s are the members in the union).

On a 32-bit machine, long int is 4 bytes, pointer 4 bytes, and unsigned int 4 bytes. All of them are aligned to their size. The union does nothing in this case because sizeof(x) < sizeof(s).

On a 64-bit machine, long int is 8 bytes, pointer 8 bytes, and unsigned int 4 bytes. The struct is 12 bytes, and the union still does nothing because sizeof(x) < sizeof(s).

In all cases, sizeof(Header) == sizeof(s), so the union does nothing. Furthermore, on 64-bit machine, the compiler may leave sizeof(Header) as 12 bytes, which is not a very good number to round up to.

Continue from last week the implementation of binary search tree traversal, this time using pointer to pointer (see bst-new.c attached). This approach cannot be efficiently emulated by other languages like Java.

Review: writing free_bst() recursively. The malloc()/free() pair of functions is how you obtain/relinquish memory. Remember not to free(t) first, otherwise t->left and t->right would become garbage.

find_pos() takes pointer to pointer to a binary search tree node for argument, and returns pointer to pointer to node where the node with given key would be found. If there is no node with the given key, returns pointer to the NULL pointer. It could either be the original bst_t ** to the root node, or either the memory location of the left or right member of some node that contains the NULL pointer.

In other words, find_pos() returns the position that can be used to modify the tree, either for insert or remove.

find() can be written in terms of find_pos().

insert() can be written in terms of find_pos().

As an exercise, see if you can write remove() in terms of find_pos() as well. It may involve writing a find_min_pos() function.

Discussion

The indirect pointer can be seen reflected in the assembly code of bst-new.c. It doesn't increase number of instructions significantly (compared to bst.s, attached, generated from bst.c from last week).

The insert() function is now tail recursive, and it generates much better code compared to before.

With -O3, the compiler inlines the definition of find_pos() into find() and insert(), essentially synthesizing new functions based on the one we wrote.

Binary search tree in C. See bst.c attached below. Several points to note:

We're writing recursive functions. Base case (terminal case) is when the tree reaches a NULL pointer. The recursion makes progress towards the base case because every time we make a recursive call, we're one node closer to a NULL pointer. We can assume this is the case because tree pointers are acyclic. If we have cyclic pointers (it wouldn't be a tree anymore), our recursive function may not terminate.

find() takes a node and recursively calls find on either the left or right child depending on the key. If key is smaller than what's stored at node, we look into the left child. If key is greater, we look into the right child. Otherwise the key is equal to the one stored at the node, and we've found the node we're looking for. Initially, find() is called on the root of the tree. Notice that find() is tail recursive.

insert() is a bit different. Ideally it should just return void and modify the tree transparently, but the pointer we want to modify resides in the parent of the current node. For now, we make insert() return the new updated subtree to the caller, which knows the parent and will perform the update. Notice that insert() is not tail recursive. We'll see a different way to write insert() next time.

Assembly bomb lab hints

Rather than working online with gdb, I suggest you use offline tools like objdump to study the bomb offline, unless you "like to live dangerously" (there are way too many Austin Powers jokes in this lab).

To produce disassembly of the bomb, including hidden functions phase_1, phase_2, ..., that are not included in bomb.c,

objdump -d ./bomb

You could use shell redirection to save the output to a file:

objdump -d ./bomb > bomb-asm.txt

Note the output is not quite a valid .s file that can be assembled again.

You can dump the section content like this,

objdump -s ./bomb

and you will see output in six columns. The first column is the address, next four the content of this strip of 16-bytes of memory, and the last column the ascii equivalent, which allows you to see the strings visually. Note: each group of 4 bytes from columns 2 through 5 should be read as an array of bytes as they appear in memory, not as an integer with little or big endian byte order.

is to be loaded at memory address 0x08049790, and is the string "Good work! On to the next..." And this 16-byte strip:

80498e0 598e0408 628e0408 698e0408 728e0408 Y...b...i...r...

is at address 0x080498e0, with four little endian integers 0x08048e59, 0x08048e62, 0x08048e69, and 0x08048e72. This appears to be a jump table.

If you're an enterprising hacker, you might have thought about patching the explode_bomb() function in gdb to just return without doing anything. The "ret" instruction has the opcode 0xc3, so you can do something like this:

The line "set {char}explode_bomb = 0xc3" patches the function to just return without exploding. Using this technique, you can fool the program into thinking that the bomb has been defused! But the catch is, the bomb sends us e-mail about the input string you entered, and the solution is verified independently by the auto-grader. Patching the code, while clever, will unfortunately earn you an "invalid" status.

Function call convention

C pushes arguments on stack in reverse order, so 8(%ebp) is the first argument, 12(%ebp) is the second, etc. This allows the callee to potentially have an indefinite number of arguments, like printf(). See figure on the right.

The stdarg.h header provides a way to iterate over an indefinite number of arguments. See print_ints.c attached. The function va_start() initializes the argument pointer ap variable (its type is va_list) to the ... part, right after the given argument; va_arg() is like *ap++, and va_end() to do clean-up. See man stdarg for more information.

On Linux 32-bit x86, when compiled with -O3, the compiler uses %esi for ap, and "leal 12(%ebp), %esi" to initialize ap. The increment by va_arg() is done by addl $4, %esi. See attached print_ints.s.

Notice how in print_ints(), the compiler passes argument to printf() by reusing existing reserved stack space.

Ackermann's function and recursion

Recursive functions we've seen so far are primitive recursion, and can be written using while loops, e.g. factorial, fibonacci. Recursion, however, is much more powerful than that.

Introducing Ackermann's function, ack(m, n), which can grow faster than exponential rate in terms of n, for m >= 4. See ack.c attached below. For this function, we cannot cheat by implementing a while-loop version!

I'm using gcc 4.4.1 (csa2 has gcc 4.1.2 by default) which demonstrate tail call optimization. Comparing the assembly generated by gcc 4.4.1 -O0 (see ack-O0.s below) and -O3 (see ack-O3.s below), notice that for -O3, there is only one call ack in the body of ack(). That's due to tail call optimization. Basically, the statement "return ack(x, y);" is transformed to setting the arguments to x and y, followed by a goto. This saves the effort to create and destroy a stack frame that is not needed.

Overview of program's perception of memory

Each program is loaded into an isolated, private address space. It cannot see the memory of another program. The hardware and the OS works together to enforce the address space protection.

The address space on a 32-bit machine ranges from 0 through 0xffffffff (232 - 1); on a 64-bit machine, it ranges from 0 through 0xffffffffffffffff (264 - 1). A number that represents memory location in the address is called a pointer.

At address 0, there is a reserved NULL page intentionally made inaccessible in order to catch NULL pointer reference. This is because functions can only return one value, and NULL pointer (which is numeric 0) conventionally indicates an error condition. If a programmer forgets to check for error, the program can dereference NULL pointer and crash. This is the best case scenario so you can debug. It is better than if the program keeps running, reading random garbage from memory. NULL pointer dereference causes segmentation fault (see null.c below).

The asterisk serves two purposes. In null.c, char *p declares variable p to be a character pointer. In an expression, *p causes the character value at pointer p to be read out, or dereferenced.

Heap grows upward (not going to be observed for now), but stack grows downward (see stack.c). Local variables are placed on stack, and function call forces the stack to grow, so the char c in main() occupies a higher address than the char c in foo(). &c (unary & operator) gives you the address of the variable c.

Array, pointer arithmetic, and string

Summation of an integer array (see sum.c below).

Use size_t instead of int for the size of the array. size_t is basically an unsigned integer that has the same width as a pointer. On a 64-bit machine, using int to represent size of a memory object could overflow; size_t makes your program portable. size_t can be 0 or positive, but there cannot be negative sizes.

In C, pointer and array are synonymous. int *p is the same as int p[]. We can dereference pointer p either using *p or p[0]. Furthermore, p[i] is the same as *(p + i), and &p[i] is the same is (p + i).

*p++ is parsed as *(p++), to increment the pointer. On the other hand, (*p)++ means increment the memory location at p.

Prefix ++x and --x operators causes the variable x to be incremented or decremented before the statement is executed. Postfix x++ and x-- causes the same effect after the statement.

NUL '\0' terminator of a string (see nulterm.c below).

NUL character is a byte of value 0, denoted as '\0', not the character for number zero, denoted as '0'.

String is a character array that is NUL terminated. This is so we can pass a string without also needing to pass the size of the string.

If we pass a char buffer without NUL terminator, puts() will print garbage. The int x increases the likelihood that buf[] will not be immediately followed by a zero byte.

Computing the length of a string (see string.c below).

const char *s means we do not intend to modify the char pointed to by s, so the content of s is supposed to be constant.

dlc in the datalab-handout.tar is distributed as a Linux x86 binary executable. There is no source. If you work on Mac OS X or on Windows (requires Cygwin), download the dlc.macosx or dlc.zip (contains dlc.exe) from the attachments below.

If you do your datalab on a 64-bit machine, you might have problem with tmax() test that says the result should be -1 (0xffffffff) which is incorrect. You have two options.

Use this make command: make CFLAGS=-m32
This tells gcc to compile in 32-bit mode instead of the default 64-bit mode. If you get compilation error involving gnu/stub-32.h, you will have to install additional development packages for the 32-bit environment, e.g. on Ubuntu it's called libc6-dev-i386.

Change LONG_MAX to INT_MAX in tests.c. The grader will use his own copy of tests.c, but he will compile the code on csa2, which is 32-bit Linux.

The reason is that LONG_MAX on 64-bit machine is 0x7fffffffffffffff and not 0x7fffffff, and LONG_MAX gets truncated when returned as int.

If you try to run dlc on bits.c, it might choke on some header files included by limits.h on your system. There should be no need to use limits.h, so you may remove the line that says #include <limits.h>.