We've all been taught that when malloc returns 0, it means the machine ran out of memory. This case should be detected and "handled" by our application in some graceful manner. But what does "handled" mean here? How does an application recover from an out of memory (OOM) condition? And what about the increased code complexity of checking all those malloc return values and passing them around?

In this article I want to discuss the common policies for handling OOM conditions in C code. There is no single right approach, so I will review the code of several popular applications and libraries to find out how they do it, and to gain useful insights for my own programming.

Note that I focus on desktop & server applications here, not embedded applications, which deserve an article of their own.

The policies

Casting minor variations aside, it's safe to say there are three major policies for handling OOM:

recovery

The recovery policy is the least commonly used because it's the most difficult to implement, and is highly domain-specific. This policy dictates that an application has to gracefully recover from an OOM condition. By "gracefully recover", we usually mean one or more of:

Release some resources and try again

Save the user's work and exit

Clean up temporary resources and exit

Recovery is hard. To be certain that your application recovers correctly, you must be sure that the steps it takes don't require any more dynamic memory allocation. This sometimes isn't feasible and is always difficult to implement correctly. Since C has no exceptions, memory allocation errors must be carefully propagated to the point where they can be recovered from, and this sometimes means passing them up through multiple levels of function calls.
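To make the propagation cost concrete, here is a minimal sketch of an allocation failure travelling up two levels of calls. All names in it (record, record_new, pair_init, limited_alloc) are hypothetical and invented for illustration. The allocator is passed in as a function pointer only so that an OOM condition can be simulated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook; must be malloc-compatible, since
 * the blocks it returns are released with free(). */
typedef void *(*alloc_fn)(size_t);

typedef struct {
    char *name;
} record;

void record_free(record *r)
{
    if (r != NULL) {
        free(r->name);
        free(r);
    }
}

/* Level 1: returns NULL on OOM and leaves nothing allocated. */
record *record_new(const char *name, alloc_fn alloc)
{
    assert(name != NULL);
    record *r = alloc(sizeof *r);
    if (r == NULL)
        return NULL;
    size_t len = strlen(name) + 1;
    r->name = alloc(len);
    if (r->name == NULL) {
        free(r);                 /* undo the partial allocation */
        return NULL;
    }
    memcpy(r->name, name, len);
    return r;
}

/* Level 2: builds two records. On failure it frees whatever it
 * already built and passes the error up as -1. */
int pair_init(record *out[2], const char *a, const char *b, alloc_fn alloc)
{
    out[0] = record_new(a, alloc);
    if (out[0] == NULL)
        return -1;
    out[1] = record_new(b, alloc);
    if (out[1] == NULL) {
        record_free(out[0]);
        out[0] = NULL;
        return -1;
    }
    return 0;
}

/* Simulated memory pressure: succeeds allocs_left times, then fails. */
int allocs_left = 0;

void *limited_alloc(size_t n)
{
    if (allocs_left <= 0)
        return NULL;
    allocs_left--;
    return malloc(n);
}
```

Even in this tiny example, every level needs its own check and its own cleanup path, which is where the extra complexity of the recovery policy comes from.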

abort

The abort policy is simple and familiar: when no memory is available, print a polite error message and exit (abort) the application. This is the most commonly used policy - most command-line tools and desktop applications use it.

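As a matter of fact, this policy is so common that many Unix programs use the gnulib library function xmalloc instead of malloc. It behaves like malloc, except that it terminates the program with an error message when the allocation fails.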
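The idea behind such a wrapper can be sketched as follows. This is not gnulib's actual code: the names xmalloc_sketch and xstrdup_sketch, and the error message, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical abort-policy wrapper: never returns NULL for n != 0. */
void *xmalloc_sketch(size_t n)
{
    void *p = malloc(n);
    /* malloc(0) may legitimately return NULL, so NULL is treated
     * as OOM only for a non-zero request. */
    if (p == NULL && n != 0) {
        fprintf(stderr, "memory exhausted\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* A helper built on the wrapper: no NULL check is needed here,
 * and callers don't need one either. */
char *xstrdup_sketch(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s) + 1;
    char *copy = xmalloc_sketch(len);
    memcpy(copy, s, len);
    return copy;
}
```

The payoff shows in xstrdup_sketch: code built on top of the wrapper has no error path for allocation at all.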

segfault

The segfault policy is the most simplistic of all: don't check the return value of malloc at all. In case of OOM, a NULL pointer will get dereferenced, so the program will die in a segmentation fault.

If there are proponents of this policy, they'd probably say: "Why abort with an error message, when a segmentation fault would do? With a segfault, we can at least inspect the core dump and find out where the fault was".

Examples - libraries

In this section, I present the OOM policies of a couple of well-known libraries.

Glib

Glib is a cross platform utility library in C, used most notably for GTK+. At first sight, Glib's approach to memory allocation is flexible. It provides two functions (with several variations):

g_malloc: attempts to allocate memory and exits with an error if the allocation fails, using g_error[1]. This is the abort policy.

g_try_malloc: attempts to allocate memory and just returns NULL if that fails, without aborting.

This way, Glib leaves the programmer the choice - you can choose the policy. However, the story doesn't end here. What does Glib use for its own utilities? Let's check g_array for instance. Allocation of a new array is done by means of calling g_array_maybe_expand that uses g_realloc, which is implemented with the same abort policy as g_malloc - it aborts when the memory can't be allocated.

Curiously, Glib isn't consistent with this policy. Many modules use g_malloc, but a couple (such as the gfileutils module) use g_try_malloc and notify the caller on memory allocation errors.

So what do we have here? It seems that one of the most popular C libraries out there uses the abort policy for memory allocations. Take that into account when writing applications that make use of Glib - if you're planning some kind of graceful OOM recovery, you're out of luck.

SQLite

SQLite is an extremely popular and successful embedded database [2]. It is a good example to discuss, since high reliability is one of its declared goals.

SQLite's memory management scheme is very intricate. The user has several options for handling memory allocation:

A normal malloc-like scheme can be used

Allocation can be done from a static buffer that's pre-allocated at initialization

A debugging memory allocator can be used to debug memory problems (leaks, out-of-bounds conditions, and so on)

Finally, the user can provide his own allocation scheme

I'll examine the default allocation configuration, which is a normal system malloc. The SQLite wrapper for it is sqlite3MemMalloc, defined in mem1.c.

malloc is used to obtain the memory. Moreover, the size of the allocation is saved right in front of the block. This is a common idiom for allocators that can report the size of an allocated block when passed a pointer to it [3].

The result of malloc is passed back to the caller, so an allocation failure surfaces as a NULL return value. Hence, SQLite leaves it to the user to handle an OOM condition. This is obviously the recovery policy.
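The size-prefix idiom mentioned above can be illustrated with a minimal sketch. This is not SQLite's code: the names block_header, sized_malloc, sized_size and sized_free are hypothetical, and the header layout is an assumption chosen to keep the returned pointer conservatively aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of every block. The extra union members
 * only pad the header so the user pointer stays suitably aligned. */
typedef union {
    size_t size;
    long double pad_ld;
    long long pad_ll;
    void *pad_ptr;
} block_header;

/* Returns NULL on failure; OOM is reported to the caller, not handled. */
void *sized_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(block_header))
        return NULL;                       /* request would overflow */
    block_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;                          /* user memory starts after the header */
}

/* Reports the size that was requested for a block from sized_malloc. */
size_t sized_size(const void *p)
{
    assert(p != NULL);
    return ((const block_header *)p - 1)->size;
}

void sized_free(void *p)
{
    if (p != NULL)
        free((block_header *)p - 1);       /* step back to the real block start */
}
```

Note that the wrapper itself has no OOM policy. It only forwards the failure, which is what makes recovery by the caller possible.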

Examples - applications

In this section, I present the OOM handling of a few relatively popular applications.

Git

Distributed version control is all the rage nowadays, and Linus Torvalds' Git is one of the most popular tools used in that domain.

When it runs out of memory, Git attempts to free resources and retries the allocation. This is an example of the recovery policy. If the allocation doesn't succeed even after releasing the resources, Git aborts.
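The "release, retry, then abort" pattern described above can be sketched like this. It is not Git's code: the cache, release_cache, retry_alloc and pressured_alloc are hypothetical names, and the allocator is injected through a function pointer only so that memory pressure can be simulated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical releasable resource: a cache the program can live without. */
static void *cache_block = NULL;

/* Frees the cache; returns nonzero if anything was actually released. */
int release_cache(void)
{
    if (cache_block == NULL)
        return 0;
    free(cache_block);
    cache_block = NULL;
    return 1;
}

typedef void *(*alloc_fn)(size_t);

/* Try once; on failure release resources and try again;
 * abort if the allocation still cannot be satisfied. */
void *retry_alloc(size_t n, alloc_fn alloc)
{
    assert(alloc != NULL);
    void *p = alloc(n);
    if (p == NULL && n != 0 && release_cache())
        p = alloc(n);
    if (p == NULL && n != 0) {
        fprintf(stderr, "Out of memory, allocation of %zu bytes failed\n", n);
        abort();
    }
    return p;
}

/* Simulated memory pressure: fails for as long as the cache is held. */
void *pressured_alloc(size_t n)
{
    return cache_block != NULL ? NULL : malloc(n);
}
```

In real code the first argument to retry_alloc's allocator would simply be malloc. The important design point is that the retry logic lives in one place, so the rest of the program still sees the simple "never returns NULL" contract of the abort policy.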

lighttpd

Lighttpd is a popular web server, notable for its speed and low memory footprint.

There are no OOM checks in Lighttpd - it's using the segfault policy. Following are a few samples.

Redis

Redis is a key-value database that can store lists and sets as well as strings. It runs as a daemon and communicates with clients using TCP/IP.

Redis implements its own size-aware memory allocation function, zmalloc, which returns the value of malloc without aborting automatically when it's NULL. All the internal utility modules in Redis faithfully propagate a NULL from zmalloc up to the application layer. When the application layer detects a returned NULL, it calls the oom function, which does the following:

/* Redis generally does not try to recover from out
 * of memory conditions when allocating objects or
 * strings, it is not clear if it will be possible
 * to report this condition to the client since the
 * networking layer itself is based on heap
 * allocation for send buffers, so we simply abort.
 * At least the code will be simpler to read... */
static void oom(const char *msg) {
    fprintf(stderr, "%s: Out of memory\n", msg);
    fflush(stderr);
    sleep(1);
    abort();
}

Note the comment above this function [4]. It very clearly and honestly summarizes why the abort policy is usually the most logical one for applications.

Conclusion

In this article, the various OOM policies were explained, and many examples were shown from real-world libraries and applications. It is clear that not all tools, even the commonly used ones, are perfect in terms of OOM handling. But how should I write my code?

If you're writing a library, you most certainly should use the recovery policy. Aborting or dumping core on an OOM condition is impolite at best, and renders your library unusable at worst. Even if the application that includes your library isn't some high-reliability life-support controller, it may have ideas of its own for handling OOM (such as logging it somewhere central). A good library does not impose its style and idiosyncrasies on the calling application.

This makes the code a bit more difficult to write, though not by much. Library code is usually not very deeply nested, so there isn't a lot of error propagation up the calling stack to do.

For extra points, you can allow the application to specify the allocators and error handlers your library will use. This is a good approach for ultra-flexible, customize-me-to-the-death libraries like SQLite.
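One possible shape for such an interface is sketched below, for an imaginary library called mylib. Every name here (mylib_hooks, mylib_set_hooks, mylib_make_counter, and the example hooks always_fail and count_oom) is hypothetical. This is not SQLite's API, only an illustration of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hooks the application may install. on_oom may be NULL. */
typedef struct {
    void *(*alloc)(size_t);
    void (*release)(void *);
    void (*on_oom)(size_t requested);
} mylib_hooks;

/* Defaults: plain malloc/free and no OOM notification. */
static mylib_hooks hooks = { malloc, free, NULL };

void mylib_set_hooks(const mylib_hooks *h)
{
    assert(h != NULL && h->alloc != NULL && h->release != NULL);
    hooks = *h;
}

/* Every allocation inside the library goes through this function.
 * It notifies the application on failure, then still returns NULL. */
static void *mylib_alloc(size_t n)
{
    void *p = hooks.alloc(n);
    if (p == NULL && n != 0 && hooks.on_oom != NULL)
        hooks.on_oom(n);
    return p;
}

/* Public API: returns NULL on OOM instead of aborting. */
int *mylib_make_counter(void)
{
    int *c = mylib_alloc(sizeof *c);
    if (c != NULL)
        *c = 0;
    return c;
}

void mylib_free_counter(int *c)
{
    hooks.release(c);
}

/* Example application-side hooks: an allocator that always fails,
 * and a handler that merely counts OOM events. */
size_t oom_events = 0;

void *always_fail(size_t n)
{
    (void)n;
    return NULL;
}

void count_oom(size_t requested)
{
    (void)requested;
    oom_events++;
}
```

With this arrangement the library keeps the recovery policy (it always reports failure to its caller), while the application decides whether an OOM event is logged, counted, or turned into an abort inside its own handler.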

If you're writing an application, you have more choices. I'll be bold and say that if your application needs to be so reliable that it must recover from OOM in a graceful manner, you are probably a programmer too advanced to benefit from this article. Anyway, recovery techniques are out of scope here.

Otherwise, IMHO the abort policy is the best approach. Route your allocations through a wrapper that aborts on OOM - this will save you a lot of error checking code in your main logic. The wrapper does more: it provides a viable path to scale up in the future, if required. Perhaps when your application grows more complex you'll want some kind of gentle recovery like Git does - if all the allocations in your application go through a wrapper, the change will be very easy to implement.

[1] A convenience function/macro to log an error message. Error messages are always fatal, resulting in a call to abort() to terminate the application. This function will result in a core dump; don't use it for errors you expect. Using this function indicates a bug in your program, i.e. an assertion failure.

[2] Embedded in the sense that it can be embedded into other applications. Just link to the 500K DLL and use the convenient and powerful API - and you have a fast and robust database engine in your application.