While modern CPUs have far more registers, most of which are 32-bit or more, the concept of memory segmentation has not really changed all that much. Compiled programs still separate code from data from the stack, though the programmer may never see this. Nevertheless, understanding this separation and exactly how it works can be critical in debugging code, as well as writing portable applications.

Consider for a moment a program to be written on a dedicated system; for example, one to control all the automatic doors in an airport. Such programs generally read inputs from a number of sensors--in this case, they activate when someone approaches a door. The program must then respond to the input by activating some external device, a motor perhaps, to open the door. The program will normally run from ROM or an EPROM chip--these are cheap, and robust because the program stays in memory even when power is shut off. However, the program must store temporary data, including readings from the sensors, in writable memory--RAM. And so must the stack, as it is a writable object. In this situation, there is a true physical separation between program code and data. Any attempt to write data to the code segment, stored in a read-only medium, will fail. This was at least one early motivation for enforcing a strict separation between code and data.

The only real nomenclature change for contemporary systems is that the data segment, which is often the largest block of memory required, is called the heap. When malloc divvies up memory and returns it to your program, this memory comes off the heap. The compiled code still resides in its own section of memory, separate from the heap. The function of the stack has not changed in 30 years, either--its function is to pass arguments to subroutines, store the return address for subroutine calls, and store temporary values, such as the registers just before a CPU interrupt occurs. Often the stack and heap share the same memory space but, starting at opposite ends, grow toward each other. The stack has a preset size limit (sometimes customizable). It is possible to overflow the stack, which is almost always fatal to the program. In fact, buffer overflow security vulnerabilities caused by destroying the stack are the subject of many other articles.

How can writing to the code segment become a problem? Consider the following short C program:

It is simple enough to follow, and the expected output might be helloworld. However, run it in the debugger (in this case, on a Solaris machine) and examine the variables a bit more closely. Upon initialization, buf = 0xffbffa60 points to some random initial memory location--&buf = 0xffbffa58. That is, the actual character pointer variable is stored at this location in memory, which must be somewhere in the heap. The initial value of buf is not so random after all, but being in debug mode, this is to be expected.

Watch what happens after assigning buf the value hello, however. Now printing the contents of buf reveals that buf = 0x00020898--not in the heap, but way on the other end of memory in the code segment! This makes sense, because, as the next article in the series will show, the code segment actually stores constants such as hello. The code here made buf to point to it. So far so good; nothing bad has happened ... yet. The code then prints out buf, and again there is no trouble here.

The code then copies the next word to print into buf. Did you catch it? Granted, this certainly is not the recommended way to code, but the second word is the same length as the first, so there are no troubles in terms of writing past the end of the buffer--just remember the address buf contains. The strcpy command attempts to write 5 bytes to 0x00020898, an address in the code segment--which is potentially ROM! This is a big no-no. If this code were really stored in a ROM chip what would happen? The strcpy would effectively be a nonoperation, because writes to read-only memory make no sense, and the succeeding printf would print hello a second time.

Of course, you should always allocate string buffers in C as character arrays of a fixed size (if known) or else dynamically with malloc, never as shown in the previous example. That code is merely to illustrate how a seemingly correct program can produce unexpected results. A corrected program might be:

top and ulimit

The most universal method for checking memory usage in Unix is probably top. This useful utility is available on most operating systems, and displays several important statistics about each process running on the machine. The memory-related ones are specifically RSS/RES and SIZE (though these names may vary slightly). The RSS column stands for resident set size, and indicates the amount of physical RAM being used by the program at a given time. SIZE, on the other hand, gives the total RAM usage, including code, heap, and stack. When the numbers differ, the missing data is actually stored in swap memory, on disk. This is usually memory that has gone unaccessed for a while and that will not consume physical memory. Normally the OS decides automatically which objects to swap in and out of virtual memory. As mentioned earlier, freed memory does not return to the heap until a program terminates, and as a result, memory use indicated by SIZE will never decrease throughout the extent of a program. It only ever goes one way--up.

A handy method for controlling memory use by processes under Unix is the ulimit command. With it you can set the maximum size of a process's data segment, RSS, open files, stack size, and more. You can specify hard limits (which you cannot change once set) and soft limits (adjustable up to a maximum value determined by the hard limit). Limiting memory usage can help avoid runaway processes from eating up memory, and it is often a good safeguard to have while debugging a program.

It may be useful to increase the stack size if running a highly recursive algorithm, because each recursive call pushes more data onto the stack. Another good time to use ulimit is when running a web server, where a CGI process may run hundreds of times, once for each connection allowed by the server. In this case, it is wise to limit per-process memory usage to avoid accidentally exhausting the memory of the server. Set the limit to the amount of RAM in the server divided by maximum number of connections permitted by the server, leaving some RAM for other processes too. If this limit is less than the peak RSS of an active process, reduce the number of connections permitted to the web server accordingly.

A useful way to reduce the memory footprint of a CGI is to use a technique such as streaming or chunking. The exact implementation may vary depending on the task, but essentially this entails breaking the problem into smaller pieces, working on one piece at a time, and freeing each piece before the next is started. In this way, you only ever need enough memory to hold a single piece. A simple example would be a CGI that needs to read a 100MB file from disk and return it to the user. Instead of reading it all in at once, then writing it to stdout using a full 100MB of RAM, the program can read 1MB at a time, write this to stdout, and free it before reading the next chunk. This limits the memory usage to only 1MB. The size of the chunks is usually adjustable, giving complete and flexible control over how much memory each process will require, while not limiting the size of data that the program can manage.

Summary

This article has discussed the fundamentals of memory management in PCs, and how they have changed over the past 30 years since the inception of the microcomputer. While memory capacity has increased drastically, the need to manage its use efficiently has not changed. We have simply started applying computers to much bigger problems to take advantage of all the available RAM.

To understand how to detect and remedy memory management issues, it is necessary to understand the storage of code and data in memory. The next article in the series will cover the storage and representation of variables in memory and will show how to use this knowledge to identify certain errors that are extremely difficult to debug without such knowledge. It will also cover swap memory, shared memory, and memory leaks in detail.

Howard Feldman
is a research scientist at the Chemical Computing Group in Montreal, Quebec.