Investigating Memory Leaks with DTrace

This article shows a real case of using the DTrace framework to detect undeleted objects in a C++ application running on Solaris 10. In this context, undeleted objects are temporary business objects that are explicitly created with the new() operator but never destroyed. This behavior, comparable in its effects to a so-called memory leak [1], can significantly increase memory usage and cause paging activity on the system, or even make object creation fail in applications that create objects iteratively.

Since these business objects remain undeleted because of incorrect cache management in the application rather than because of bad pointers, specialized memory-leak tracking tools, which look for inconsistencies between allocated memory chunks and the pointers to them, do not detect this type of undeleted object. For instance, the Oracle Solaris Discovery tool [2] and the Oracle Solaris libumem audit facility [3], as well as Rational Purify and gdb, are ineffective in this situation [4].

A new tool based on DTrace and Perl scripts was developed to address this specific need. It can be used with any program whose iterative object creation and deletion pattern is similar to the case described below. The tool requires no binary change and is easy to use. It proved its efficiency at a customer site on a pre-production system, finding the leak in a couple of minutes where the traditional methods had failed after days of investigation.

The principle

Rather than building a fully automated tool, which would be highly dependent on the application and complex to write, we chose a simpler implementation that gave results quickly. The choice was therefore to write generic scripts (independent of the application), with part of the program logic passed as arguments, followed by a final manual analysis of the selected user stacks.

The principle of our method is to record the object creations and deletions within the program into a file [5] and then post-process this file to detect the undeleted objects, based on data specific to the program logic.

- The program itself implements the following iterative process:

The program is launched

An action (A1) (an import or any other process) is started. This initial action allocates temporary memory for itself and permanent memory for objects that will exist throughout the life of the process (the cache initialization, for example). Since the latter objects could show up as false positives, A1 is excluded from the scope of the analysis.

A second action (A2) is started. It is identical to A1 except that it allocates memory for itself only and frees the temporary objects created in A1.

A third action (A3), identical to A2, is started. Like A2, it allocates memory for itself and frees the temporary objects of the previous step (here, A2).

Since actions A2 and A3 are identical and use the same iterative object creation-deletion mechanism, objects created in A2 but not freed after A3 are the potential memory leaks we are looking for. This is illustrated in the figure below, where the letters 'b' and 'e' mark the boundaries of the area in which to search for memory leaks.

- The recording step is based on a DTrace script (watch-memory-usage.d) and contains no business logic. It merely traces the new() and delete() operators and, when the probes fire, records the user stack, a timestamp and tags (iterator ids), as described in the next section.

- The detection step is a postmortem process based on a Perl script (findleaks.pl), also independent of the application, which analyzes the output file of the DTrace script and looks for objects allocated in A2 and not freed after A3 within a given search area.
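The principle can be illustrated with a small simulation. The sketch below is written in Python rather than in the D and Perl of the actual scripts, and every name in it (`simulate`, `suspects`, the event layout) is hypothetical. It only models a program in which each action frees the temporaries of the previous action but forgets one of them, and then looks for objects created in A2 that are still alive after A3.

```python
import itertools

def simulate(actions=3, temps_per_action=3):
    """Model the iterative program and return the recorded events.

    Each event is (seq_id, action, kind, address). Every action frees the
    temporaries of the previous one, except one forgotten object per
    action (the simulated leak).
    """
    seq = itertools.count(1)
    next_addr = itertools.count(0x1000, 0x10)
    events, previous = [], []
    for action in range(1, actions + 1):
        # Free the previous action's temporaries, forgetting the first one.
        for addr in previous[1:]:
            events.append((next(seq), action, "delete", addr))
        previous = []
        for _ in range(temps_per_action):
            addr = next(next_addr)
            events.append((next(seq), action, "new", addr))
            previous.append(addr)
    return events

def suspects(events, begin_id, end_id):
    """Addresses allocated in [begin_id, end_id] and never deleted afterwards."""
    live = {}  # address -> seq_id of the allocation that is still alive
    for seq_id, _action, kind, addr in sorted(events):
        if kind == "new":
            live[addr] = seq_id
        else:
            live.pop(addr, None)
    return sorted(a for a, s in live.items() if begin_id <= s <= end_id)

events = simulate()
a2_ids = [s for s, action, _kind, _addr in events if action == 2]
leaks = suspects(events, min(a2_ids), max(a2_ids))
print([hex(a) for a in leaks])
```

In this toy run the object forgotten during A1 and the temporaries of A3 are also alive at the end, but they fall outside the A2 range and are therefore not reported, which mirrors the reason for restricting the search area.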

The full sequence of commands is:

% a.out & // start the program

% sudo watch-memory-usage.d pid > leaks.txt

Launch action 1, then 2, then 3. Wait a while between actions and note the launch time of each.

Stop dtrace with CTRL-C after action 3.

% cat leaks.txt | c++filt > leaks-dem.txt // demangle the output file

Locate in the file the time range of action 2 (between the launch times of actions 2 and 3) and retrieve the corresponding sequence ids (tags)

% findleaks.pl -f leaks-dem.txt -b begin_id -e end_id

As noted before, the business logic is introduced manually into the Perl script through the arguments -b and -e, which delimit the search area. The actual implementation differs slightly from this sequence, whose main purpose is to detail the necessary steps of the process.

The DTrace script

The DTrace framework [6] provides a set of kernel modules called providers, each of which dynamically performs a particular kind of instrumentation of the kernel or the application. The pid provider, which traces function entry and return in user programs, is the most appropriate provider for tracing the new() and delete() operators [7]. Since DTrace instruments the executable program, in which the C++ function names are mangled, those mangled names must be used in the probe specifications.

The arguments to entry probes are the values of the arguments to the traced function. The arguments to return probes are the offset in the function of the return instruction (arg0) and the return value (arg1).

Whenever an object is created, the script records its size (arg0 on entry) and address (arg1 on return), the user stack (ustack()), a timestamp and an iterator id. When the object is deleted, the script records its address (arg0 on entry) and the other parameters. Finally, the aggregating array [8] @mem[object's address], set to 1 when the object is created and to 0 when it is deleted, is printed when the script ends. This array is used to find the undeleted objects in the postmortem analysis.
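The pairing of entry and return probes can be modelled outside DTrace. The following Python sketch is not the D script, and its class, method and field names are all hypothetical. It only mimics the bookkeeping described above: the size seen on new() entry is kept per thread until the matching return supplies the address, and a flag per address plays the role of @mem. Timestamps are left out for brevity.

```python
import itertools

class Recorder:
    """Toy model of the recording logic described above (not the D script)."""

    def __init__(self):
        self.seq = itertools.count(1)
        self.pending_size = {}   # thread id -> size seen on new() entry
        self.records = []        # (seq_id, kind, address, size, stack)
        self.mem = {}            # address -> 1 while allocated, 0 once deleted

    def new_entry(self, tid, size):
        # On entry only the requested size (arg0) is known.
        self.pending_size[tid] = size

    def new_return(self, tid, address, stack):
        # On return the address (arg1) is known; pair it with the saved size.
        size = self.pending_size.pop(tid)
        self.records.append((next(self.seq), "new", address, size, stack))
        self.mem[address] = 1

    def delete_entry(self, address, stack):
        # delete() receives the address as its argument (arg0).
        self.records.append((next(self.seq), "delete", address, 0, stack))
        self.mem[address] = 0

rec = Recorder()
rec.new_entry(tid=1, size=64)
rec.new_return(tid=1, address=0x8001000, stack=("Order::Order", "main"))
rec.new_entry(tid=1, size=32)
rec.new_return(tid=1, address=0x8001040, stack=("Item::Item", "main"))
rec.delete_entry(address=0x8001040, stack=("Item::~Item", "main"))

still_allocated = [hex(a) for a, flag in rec.mem.items() if flag == 1]
print(still_allocated)
```

Keeping the pending size per thread matters because several threads may be inside new() at the same time, so a single shared variable could pair a size with the wrong address.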

Finally, the output file must be demangled for the post-processing phase as shown above.

The Perl script

The leaks-dem.txt file holds the demangled raw data from the DTrace script. It contains all the information needed to sort out the memory leaks but no program logic information, such as the timestamps at which the three actions started. Parsing the file (the main function of the Perl script [9]) and using the hand-noted times at which the actions were started make it possible to retrieve the sequence ids of these actions (the id is the first field of each new record in leaks-dem.txt). In our case, ids 2968 and 3511 correspond to the boundaries of action A2. The objects searched for satisfy the following conditions:

@mem[object_address] = 1
2968 ≤ object_id ≤ 3511
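Turning the hand-noted launch times into sequence ids is a simple range lookup. Here is a minimal Python sketch, assuming the records have already been parsed into (sequence id, timestamp) pairs sorted by time. The record layout, the time unit and the function name `id_range` are assumptions made for illustration, not the format of leaks-dem.txt.

```python
from bisect import bisect_left

def id_range(records, start_time, next_start_time):
    """Return (begin_id, end_id) of the records in [start_time, next_start_time).

    `records` is a list of (seq_id, timestamp) pairs sorted by timestamp.
    """
    times = [t for _seq, t in records]
    lo = bisect_left(times, start_time)
    hi = bisect_left(times, next_start_time)
    if lo == hi:
        raise ValueError("no record in the requested time range")
    return records[lo][0], records[hi - 1][0]

# Made-up records: (sequence id, seconds since the trace started).
records = [(1, 0.5), (2, 1.0), (3, 12.0), (4, 12.4), (5, 13.1), (6, 25.0), (7, 25.2)]
begin_id, end_id = id_range(records, start_time=10.0, next_start_time=24.0)
print(begin_id, end_id)
```

The two ids returned would then be the values passed as -b and -e.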

The Perl script command line is:

% findleaks.pl -f leaks-dem.txt -b 2968 -e 3511

and outputs a list of aggregated stacks, sorted by memory consumption and shown with their number of occurrences, which correspond to the potential leaks.
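The aggregation step can be sketched as follows. This is a Python illustration, not the code of findleaks.pl, and the stacks and sizes are made up. It groups the leaked allocations by user stack, sums their sizes, counts the occurrences and sorts the result by total memory.

```python
from collections import defaultdict

def aggregate(leaks):
    """Group leaked allocations by stack.

    `leaks` is an iterable of (stack, size) pairs. Returns a list of
    (stack, total_bytes, count) sorted by decreasing total_bytes.
    """
    totals = defaultdict(lambda: [0, 0])  # stack -> [total_bytes, count]
    for stack, size in leaks:
        totals[stack][0] += size
        totals[stack][1] += 1
    report = [(stack, total, count) for stack, (total, count) in totals.items()]
    report.sort(key=lambda row: row[1], reverse=True)
    return report

# Made-up leaked allocations: (demangled stack as a tuple of frames, size in bytes).
leaks = [
    (("Cache::add", "Import::run"), 48),
    (("Parser::token", "Import::run"), 512),
    (("Cache::add", "Import::run"), 48),
]
for stack, total, count in aggregate(leaks):
    print(f"{total:6d} bytes in {count} allocation(s): {' <- '.join(stack)}")
```

Sorting by total bytes rather than by count puts a few large leaks ahead of many tiny ones, which matches the ordering by memory consumption described above.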

[1] A memory leak occurs when a computer program consumes memory but is unable to release it back to the operating system or to the application. However, many people refer to any unwanted increase in memory usage, caused for instance by wrong cache management, as a memory leak, though this is not strictly accurate.

[2] Oracle Discovery tool is a new tool for memory checking, available in Solaris Studio 12 update 2.

[3] Libumem is a library, first introduced in Solaris 9 Update 3, used to detect memory management bugs in applications. See http://blogs.sun.com/dlutz/entry/memory_leak_detection_with_libumem

[4] Actually, Discovery reports memory blocks allocated on the heap but not released at program exit.

[5] Recording into a file makes it possible to overcome the memory size limit of the script.

The scripts provided were developed for the Solaris 10 OS and might not work on another platform (Solaris 11 or Mac OS X, for example). They can however be adapted easily by looking at the system call probes and their arguments, and at ustack().

Hello again,
This is Vijay.
Thanks for the findleaks script anyway. It was really good to know.
I thought of changing this so that we get all the memory leaks instead of only the leaks within a range.
So I thought of introducing one more perl script which will give the complete leaks in a simpler manner.

The logic is to store the addresses as keys and the complete callstack as the value. If an address already exists, simply replace the callstack. This makes sure the callstack of the last allocation is preserved for any address.

So after the report is encountered, I take only the callstacks of those addresses where the counter is not 0 (> 0).
This way I can output only those memory addresses and their corresponding stacks wherever leaks have occurred.
Below is the script I have written.

# Here we have all interesting events
# We have to sort new, and then to keep only the last for each address
# We store for each address the new() information
my %last_new_at;
for my $seq_id (sort {$a <=> $b} keys %event_at) {
    my $address = $event_at{$seq_id}->{address};
    $last_new_at{$address} = $event_at{$seq_id};
}

# Here we have in %last_new_at the latest allocations
# We can get the stacks and report statistics (count and sizes)
my %leaks_for;
for my $address (keys %last_new_at) {
    next if $last_new_at{$address}->{seq_id} < $opts{b}
         or $last_new_at{$address}->{seq_id} > $opts{e};
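The logic described in this comment (keep the call stack of the last allocation for each address, then report every address whose counter is not 0) can be sketched in a few lines. The version below is in Python, not Perl, and its event layout and function name are assumptions made for illustration.

```python
def all_leaks(events):
    """Return {address: stack} for every address still allocated at the end.

    `events` is an iterable of (seq_id, kind, address, stack) tuples, where
    kind is "new" or "delete".
    """
    last_stack = {}  # address -> stack of the most recent new()
    counter = {}     # address -> 1 if currently allocated, else 0
    for _seq_id, kind, address, stack in sorted(events, key=lambda e: e[0]):
        if kind == "new":
            last_stack[address] = stack  # overwrite: keep the last allocation
            counter[address] = 1
        else:
            counter[address] = 0
    return {a: last_stack[a] for a, c in counter.items() if c != 0 and a in last_stack}

events = [
    (1, "new", 0x1000, "Cache::add"),
    (2, "delete", 0x1000, "Cache::evict"),
    (3, "new", 0x1000, "Parser::token"),   # same address reused
    (4, "new", 0x2000, "Cache::add"),
    (5, "delete", 0x2000, "Cache::evict"),
]
print(all_leaks(events))
```

Without the -b and -e range, long-lived objects such as the cache built during the first action would also be reported, so the result needs more manual filtering.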
