Dynamic Memory Allocation Performance

In many real systems, the sizes of memory regions and objects are not known statically; they depend mainly on the data set at run time. For example, a compiler maintains symbol tables and type information that are constructed dynamically while reading the source program, and the source program can be of any size. In fact, very few large programs (if any) avoid dynamic memory allocation entirely. Dynamically created data structures such as trees, linked lists, and hash tables (which can be implemented as arrays of linked lists) are key to the construction of many large software systems.

(Please note: a HW implementation may also use dynamic memory allocation, but memory usage has to be calculated beforehand, because supporting truly unbounded dynamic memory allocation is not possible in HW. HW-based dynamic memory allocation will be explained in another article; please see the Linked List Implementation in ASIC series for detailed implementation details.)

This article is a performance comparison of various allocation schemes. The aim is not to provide a comprehensive review of dynamic memory allocation, but to make the reader aware of the performance loss that can result from it. The performance of the following schemes is evaluated.

Memory allocation provides a way to dynamically create buffers and arrays. Dynamic means that the space is allocated in memory as the program is executing. On many occasions the sizes of objects will not be known until run time. For instance, the length of a string a user inputs will not be known prior to execution. The size of an array may depend on parameters unknown until program execution. Certain data structures such as linked lists utilize dynamic memory allocation.

In many cases, large amounts of dynamically allocated memory are consumed by interconnected objects which are not themselves very large. The time spent allocating objects can be minimized, but not eliminated. A significant amount of processing time can also be consumed traversing the dynamic data structures and returning them to the system.

Each memory allocation/de-allocation request in software is costly and significantly impacts performance. Many methods have been proposed to reduce this performance loss. One technique is to use a pool-based memory allocator; the pool allocator can be made part of the library itself to maintain transparency. The compiler can also help by rescheduling and lumping allocation and de-allocation requests together. A few high-performance compilers perform these optimizations.

A pool based memory allocator allocates large blocks of memory and then allocates smaller objects from these blocks. When it is time to recover memory, the entire pool is deallocated at once. This usually involves returning only a few large memory blocks to the system. This greatly reduces the time consumed by memory recovery.

malloc – The standard library function commonly used to allocate memory. malloc returns a void pointer to the allocated buffer; this pointer is cast into the proper type to access the data stored in the buffer (the cast is required in C++, while in C the conversion from void * is implicit).

free – When using dynamically allocated memory, it is necessary for the programmer to free the memory after its use.

C++ Memory Manipulation Functions

new/delete are the standard C++ operators for dynamic memory allocation. They are very easy to use and provide better readability than their C counterparts (malloc/free), but the performance loss can be significant when the C++ operators are used; a comparison is shown later.

Simple Record Collapsing Scheme

This mechanism collapses similar data types into one big array, which in turn saves allocation/de-allocation calls to the operating system. It has to be planned at the time of software architecture development. A typical program is shown and used to demonstrate the performance gain. This is actually the most efficient of all the schemes, and it really emphasizes the fact that the performance gain resulting from good architecture normally outweighs lower-level tweaks. However, it may not be feasible to use this scheme in real systems where a significant amount of development is already done, and sometimes the application itself requires more flexibility.

Library Function for Pool Based Memory Allocation Scheme

This mechanism is independent of data type, and a simple function library implementation is provided to show the performance gain/loss. The basic idea is to allocate a big chunk of memory in a function that is equivalent to malloc (malloceq) but is functionally a superset of it. Whenever malloceq is called, memory is assigned from the already allocated memory pool; when no space is left in the pool, a new pool is allocated. The aim is to explain the pool-based memory allocation scheme by example, so an example function library (malloceq/freeeq) is implemented in ANSI C and its usage is shown in this article. Also note that implementing this at a lower library level would result in a better performance gain.

Poor performance. Deallocating a large data set one element at a time can be very time consuming and can have a large impact on program performance. This can be avoided by using a block-based memory allocator, where the elements are allocated from large memory blocks.

Memory leaks. A memory leak takes place when memory is allocated but never deallocated. Tools like Purify or BoundsChecker can be used to track down and fix this issue.

Pointers to deallocated memory. When memory is deallocated there may still be pointers in use to the deallocated memory. This may cause unexpected behavior at run time.

Even carefully crafted code written by experienced software engineers tends to suffer from problems with memory leaks and references to deallocated memory.