
With a tingly feeling in my belly, I’m happy to announce heaptrack, a heap memory profiler for Linux. Over the last couple of months I’ve worked on this new tool in my free time. What started as a “what if” experiment quickly became such a promising tool that I couldn’t stop working on it, at the cost of neglecting my physics master’s thesis (who needs that anyways, eh?). In the following, I’ll show you how to use this tool, and why you should start using it.

A faster Massif?

Massif, from the Valgrind suite, is an invaluable tool for me. Paired with my Massif-Visualizer, I found and fixed many problems in applications that led to excessive heap memory consumption. There are some issues with Massif though:

It is relatively slow. Especially on multi-threaded applications the overhead is large, as Valgrind serializes the code execution. In the end, this sometimes prevents one from using Massif altogether, as running an application for hours is impractical. I know that we at KDAB sometimes had to resort to over-night or even over-weekend Massif sessions in the hope of analyzing elusive heap memory consumption issues.

It is not easy to use. Sure, running valgrind --tool=massif <your app> is simple, but most of the time, the resulting data will be too coarse. Frequently, one has to play around to find the correct parameters to pass to --depth, --detailed-freq and --max-snapshots. Paired with the above, this is cumbersome. Oh and don’t forget to pass --smc-check=all-non-file when your application uses a JIT engine internally. Forget that, and your Massif session will abort eventually.

The output is only written at the end. When you try to debug an issue that takes a long time to show up, it would be useful to regularly inspect the current Massif data. Maybe the problem is already apparent and we can stop the debug session? With Massif, this is not an option, as it only writes the output data at the end, when the debuggee stops.

With these issues in mind, I often wondered whether there isn’t a better alternative. To track the heap memory consumption, all we need to intercept are the calls to the allocation functions like malloc and free. The rest of what Valgrind does is not required, so shouldn’t it be possible to write a custom tracker with the help of the LD_PRELOAD trick which solves the issues above? But we need to get backtraces, and quickly, as malloc & friends are called extremely often. How is that possible?

The Shoulders of Giants

For a long time, I did not know any solution to the backtrace problem. But early this year, a colleague of mine told me that vogl also uses the LD_PRELOAD trick to overload the OpenGL functions and has the ability to grab backtraces. Apparently it was also quite efficient, so I had a look at what it’s doing and indeed, there I found my holy grail: libunwind, paired with a patched libbacktrace and some (to me) esoteric Linux C APIs. Combined, this makes it possible to efficiently grab backtraces with libunwind and delay the DWARF debug symbol interpretation until a later time. Without the example code in vogl, I’d never have come up with this - so many thanks to Valve for releasing the source code on GitHub!

Introducing heaptrack

From here on, the rest was mostly plumbing. heaptrack consists of five parts:

libheaptrack_preload.so: The shared library that is injected into the debuggee application using the LD_PRELOAD trick. It overloads malloc & friends, grabs a backtrace of raw instruction pointers with unw_backtrace and writes it all to the file specified via the DUMP_HEAPTRACK_OUTPUT environment variable. Additionally, dlopen and dlclose are overloaded and trigger the collection of runtime information on shared libraries, which is required to later translate the instruction pointer addresses with DWARF debug information. Finally, a timer is also started, which allows us to correlate allocations and memory consumption with real time.

libheaptrack_inject.so: Similar to the preload variant, this library is used for runtime attachment to an existing process. Frequently, I found myself wondering why an application’s heap memory consumption suddenly increases. Neither Massif nor any other tool I know of can attach at runtime, but heaptrack can now do this!

heaptrack_interpret: This process reads the output of libheaptrack.so over stdin and annotates the instruction pointer addresses with DWARF debug symbols with the help of libbacktrace. The annotated data stream is then sent to stdout. I recommend gzip‘ing it to save some disk space, as the data files can easily consume hundreds of megabytes otherwise. The resulting data file is then “final”, meaning you can transfer it to any other machine, as no further machine-dependent processing is required.

heaptrack: To simplify the process, there is a small shell script which combines the first two tools. It launches the arguments passed to it as a process with the correct LD_PRELOAD environment. The output of libheaptrack.so is directly transmitted to a heaptrack_interpret process with the help of mkfifo. And the heaptrack_interpret output finally is compressed on the fly and stored to disk. This is the tool you want to use:

$ heaptrack yourapp [your arguments...]

starting application, this might take some time...

output will be written to /home/milian/heaptrack.yourapp.12345.gz

...

Heaptrack finished! Now run the following to investigate the data:

heaptrack_print /home/milian/heaptrack.yourapp.12345.gz | less

heaptrack_print: Similar to ms_print, this process analyzes the output of heaptrack_interpret. It has many features, which I’ll outline below. You can run it at any time on the output file that heaptrack creates, and it supports transparent decompression of gzip‘ed files. The output is written directly to the CLI, which is often cumbersome to interpret. I plan to work on a proper heaptrack-visualizer in the future.

The temporary file format of the libheaptrack.so output, as well as the permanent one written by heaptrack_interpret, is currently undocumented. It’s plain text though and should be easy to decipher, especially with the source code at hand.

Note that heaptrack, contrary to Massif, does not do any aggregation of the data. It only minimizes the data files by not printing the same backtrace information repeatedly. But each individual malloc or free call, together with the function arguments, will be tracked. This allows some extremely interesting insights into the heap usage of a debuggee, as we can later analyze the data to find all of the following:

heap memory consumption: this is what Massif does, and often the most interesting

number of calls to allocation functions: usually you’d need a profiler like Valgrind’s callgrind to figure out where you frequently allocate memory. Heaptrack gives you that information as well, and much quicker. I used this data already in many places to get rid of temporary memory allocations. This is extremely worthwhile: not only are memory allocations relatively slow in themselves, your performance also benefits from “secondary” effects - when you reuse memory, the chances are much higher that it is cached already, and cache misses are often the biggest slow-down of current applications.

total amount of memory allocated, ignoring deallocations: Not so useful, but sometimes interesting and nicely accompanies the call count data to find temporary memory allocations

leaked memory: Even without the fancy analysis of Valgrind’s memcheck tool to distinguish between still reachable, possible and definitely lost memory, heaptrack can give you a quick look at what memory has not been freed when the debugee stopped.

histogram of allocation sizes over the number of calls: So far, one can only export the raw histogram data for manual plotting; see the Size Histogram section below.

…: Your ideas are welcome - I’m confident that many more insights can be found in heaptrack’s data.

NOTE: Just like other profilers and tools, heaptrack relies on the DWARF debug information in your application. If you try to analyze a stripped release build without debug symbols, you’ll have a hard time making sense of it.

Using heaptrack_print

Assume we have run heaptrack on an application and now want to evaluate the obtained data. heaptrack_print is the tool to do that, but it’s relatively cumbersome to use (plain ASCII output, not even an ncurses GUI!). Thus, I explain the output here, such that you can make sense of it. Do take a look at the --help output as well.

Calls to Allocation Functions

Enabled by default, disable via -a / --print-allocators 0.

The output below the MOST CALLS TO ALLOCATION FUNCTIONS header is a list of the top 10 locations that call memory allocation functions. By default, the backtraces are merged; e.g., code similar to this:

void asdf() { new int; }

void bar() { asdf(); }

void laaa() { bar(); asdf(); }

will produce output like this, when laaa() is called ten times from main():

MOST CALLS TO ALLOCATION FUNCTIONS

11 calls to allocation functions with 44B peak consumption from

asdf()

at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:24

in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp

10 calls with 40B peak consumption from:

bar()

at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:36

in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp

laaa()

at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:41

in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp

main

at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:103

in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp

1 calls with 4B peak consumption from:

bar()

at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:36

in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp

laaa()

at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:41

in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp

main

at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:105

in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp

...

Here, the backtraces are merged on the location of the new int allocation in asdf(), and all sub-traces are displayed beneath. Since heaptrack_print sorts the data, you can just read its output from the top to find the top 10 hotspots of allocation functions. You can disable backtrace merging with -m / --merge-backtraces 0.

Peak Memory Consumption

Enabled by default, disable with -p / --print-peaks 0.

To decrease your memory consumption, you need to decrease the peak memory consumption. Under the PEAK MEMORY CONSUMERS caption, heaptrack_print shows the top ten hotspots, sorted by the peak size in bytes. This can look e.g. like this:

PEAK MEMORY CONSUMERS

3.98MB peak memory consumed over 37473 calls from

QString::realloc(int)

in /usr/lib/libQtCore.so.4

1.04MB over 4 calls from:

QString::append(QString const&)

in /usr/lib/libQtCore.so.4

0x7fa9ce54bf73

in /usr/lib/libQtCore.so.4

0x7fa9ce54c5ee

in /usr/lib/libQtCore.so.4

QTextStream::readAll()

in /usr/lib/libQtCore.so.4

Kate::Script::readFile(QString const&, QString&)

at /ssd/milian/projects/kde4/kate/part/script/katescripthelpers.cpp:82

in /ssd/milian/projects/compiled/kde4/lib/libkatepartinterfaces.so.4

Kate::Script::require(QScriptContext*, QScriptEngine*)

at /ssd/milian/projects/kde4/kate/part/script/katescripthelpers.cpp:289

in /ssd/milian/projects/compiled/kde4/lib/libkatepartinterfaces.so.4

0x7fa9bac7d228

in /usr/lib/libQtScript.so.4

...

Massif Compatibility

heaptrack_print, since yesterday, also supports converting the heaptrack data to the Massif file format. This can then be visualized with my Massif-Visualizer. The resulting files are relatively large, as much more detailed snapshots are included. I optimized the visualizer a bit as well to speed up the evaluation of these files. It is worth it though! Since the time axis uses real time, it is much easier to correlate to the actual runtime behavior of your application (note: you can configure Massif to also use “real time”, but due to its high overhead, the results are still confusing and not much different from the instruction count). The higher level of detail also makes it simpler to interpret the results. Note, though, that the converter currently has no code to ensure the peak is not missed, which can be seen in the images below. I plan to add this eventually.

Comparison of heaptrack and Massif on the same workload shows the much higher level of detail. Overall, the results are compatible, but note that heaptrack uses real time whereas Massif defaults to instruction count for the time axis. Also, the Massif file generated by heaptrack currently misses the peak, which is accurately tracked by Massif itself.

Memory Leaks

Disabled by default, enable with -l / --print-leaks 1.

The leaks reported by heaptrack are simply all calls to malloc & friends which were never freed afterwards. It is not possible to do a “still reachable” or “possibly lost” analysis as Valgrind’s memcheck tool does. Still, it is often quite helpful. Note though that heaptrack does not support suppression files, which is crucial here: otherwise you’ll often see reports of leaks inside libc and other external libraries which are often intentional.

Size Histogram

Disabled by default, enable by passing an output file to -H, --print-histogram.

The size histogram gives an insight into whether you could potentially benefit from a pool allocator or a similar optimization technique. heaptrack_print just writes the raw data to the output file you specify. With octave or gnuplot, you can then evaluate this manually, yielding a graph such as the following:

Note how many allocations below 8 bytes are done by this application. All of these waste memory, as the value itself could easily be stored in the space required for a single pointer on a 64-bit machine. For those interested: most of these allocations come from small strings, since Qt’s QString class has no small-string optimization (yet - it is planned for Qt 6). In the future, the heaptrack data could be analyzed such that it directly points you to the culprits of such memory waste.

Try it out

So far, I developed this tool mostly to scratch my own itch. I demoed it to some colleagues, but until yesterday, some essential features were missing. Now, I think, it is ready for a wider audience. If you are interested, try it out - I’m interested in your feedback:

git clone git://anongit.kde.org/heaptrack

cd heaptrack

mkdir build

cd build

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

make install

This should be all that is required to get heaptrack up and running. It depends on Boost (for heaptrack_print and heaptrack_interpret) and a recent libunwind (for libheaptrack.so). If in doubt, compile libunwind from source as well, as I fixed one significant performance issue there on my platform. Thus, if heaptrack is extremely slow, please try to update libunwind first. Also note that I have another patch for libunwind in the pipeline to increase the DWARF cache, which improves the runtime performance of heaptrack even further.

Furthermore I took the liberty of leveraging C++11 features wherever I needed it. You will need a recent compiler to build heaptrack. CMake should tell you if your compiler is too old.

Also note again that this tool currently only works on Linux. With some work, it might be possible to port it to other Unixoid platforms. Personally, I won’t spend time on this, as it is not worth it for me. I develop cross-platform Qt applications, and can thus easily investigate the memory consumption on a Linux host.

Platform-wise, I built and tested this code only on x86-64. I hope it also works fine on 32-bit x86, as well as ARM, but I’ll have to test that.

A note on Performance

CPU Overhead

I do not have any reliable benchmark, but I still want to share some rough estimates of the overhead of heaptrack and compare it with Massif. In heaptrack’s source tree, you can find e.g. tests/threaded.cpp, which allocates and frees memory repeatedly and in parallel with multiple threads. With perf stat, we can estimate the worst-case overhead of heaptrack with this test:

Baseline

$ perf stat -r 5 ./tests/threaded

Performance counter stats for './tests/threaded' (5 runs):

147.544073 task-clock (msec) # 1.736 CPUs utilized ( +- 6.18% )

563 context-switches # 0.004 M/sec ( +- 6.12% )

735 cpu-migrations # 0.005 M/sec ( +- 0.78% )

910 page-faults # 0.006 M/sec ( +- 4.38% )

235,074,081 cycles # 1.593 GHz ( +- 10.63% ) [71.15%]

<not supported> stalled-cycles-frontend

<not supported> stalled-cycles-backend

156,034,336 instructions # 0.66 insns per cycle ( +- 6.64% ) [91.21%]

35,155,936 branches # 238.274 M/sec ( +- 3.27% ) [90.29%]

366,564 branch-misses # 1.04% of all branches ( +- 8.44% ) [86.60%]

0.084972509 seconds time elapsed ( +- 6.20% )

Averaged over five runs, this test finishes in less than 100ms and roughly 150 million instructions are executed.

With heaptrack, the test application runs considerably slower. According to perf stat, it is approximately 12 times slower. Furthermore, we are now executing ca. 5.4 billion instructions, have many more page faults, etc.

With Massif, the situation is even worse. It serializes all threads, as evidenced by the task-clock report which shows that only one CPU is utilized. Overall, it is roughly 2.5 times slower than heaptrack and also executes nearly twice as many instructions.

This result is quite promising for heaptrack. In many other tests, the test applications also feel much more responsive when running under heaptrack compared to Massif. But YMMV - take this with a grain of salt.

Memory Overhead

Also be aware that heaptrack not only slows down your application, but also adds a considerable memory overhead, both in-process (libheaptrack.so) and out-of-process (heaptrack_interpret). In a non-scientific measurement of the memory consumption of kwrite showing a medium-sized text file, I obtained the following numbers for the total memory used after the file is loaded:

Baseline: 26.1MB

heaptrack: 39.3MB + 19.2MB = 58.5MB

Massif: 264.9MB

So again, heaptrack seems to be significantly leaner compared to Massif, but YMMV.

What’s left to do?

I will probably not spend much more time on heaptrack in the coming months, but rather hope to finally be able to concentrate on finishing my studies. Mid-term next year, after a long vacation, I plan to start working on the following (if no one beats me to it by then):

do a proper release: I plan to move this tool to KDE’s extragear and undergo a code review. Once that is settled, I will release a first version and hope for packagers to distribute it.

heaptrack-gui: Generating massif.out files and looking at them in my Massif-Visualizer is nice, but inefficient and only shows a fraction of the data we have available. Thus, a proper GUI application is required to show all of the data in a heaptrack output file. Additionally, it could visualize the data as it comes in, giving you the ability to track the heap behavior of your application in real-time!

public API: heaptrack does not support custom allocators yet. To support this, a simple API could be added similar to Valgrind’s Client Request API.

I/O profiling etc.: The technique used for heap profiling can also be used to profile I/O, mutex lock contention and more.

Note that stack memory consumption cannot be profiled in this way. Use Massif if you need to look at that.

Reinventing the Wheel

Initially, I thought heaptrack was unique in what it does. Over time, I realized that this is not quite the case. Google’s gperftools has a similar tool, and there is libmemusage.so and many others like it. Thankfully, none of them gives as much data as heaptrack while still being efficient. So my time was not wasted. And I learned a ton in the process. I invite everyone to inspect my code and give suggestions. It is so far only ~1.6 kloc of code, but probably a bit lacking on the documentation side. I’ll improve this over time, I hope.

I also tried to implement this tool with perf probe, but could not get it to work reliably. The perf script support is still lacking the ability to run native code, which is crucial here to get high performance. Additionally, perf requires root access in order to use user-space probes on e.g. malloc and friends in libc.so. This is not practical - heaptrack and LD_PRELOAD work as-is just fine.

Thanks

To wrap up this lengthy blog post, I want to express my deepest gratitude again to all those who made this tool possible. In no particular order:

Julian Seward and the Valgrind team: This tool suite will always remain invaluable to me. Without it, I’d never have been able to cross-check the results of heaptrack reliably. While I’ll probably use Massif less and less, I will still use it to verify that the results obtained by heaptrack are correct. And the error-checking tools in the Valgrind suite, like memcheck, helgrind or drd, are still unmatched in their quality.

Michael Sartain, Peter Lohrman and Valve: Without the code in vogl, I’d still be out on the hunt for an efficient scheme to obtain backtraces, or would be clueless how to translate the instruction pointer addresses with DWARF debug symbols.

Arun Sharma, Lassi Tuura and the libunwind team: The core tool to actually get the backtrace. Many thanks for this fast, easy-to-use library!

GCC team: Not only an excellent compiler, but also the source of the libbacktrace library, which does the heavy ELF/DWARF lifting to translate raw instruction pointer addresses.

My colleagues at KDAB: Fruitful discussions with them led to the solution of many of my problems over the last months. And thanks for the pre-alpha testing!

Comments

I’ve been observing some “<unresolved function> in ??” entries. With Valgrind, these functions seem to be resolved (i.e., I know which they are). And then some of these functions are shown quite differently in Massif-Visualizer (interleaving within each other)?

Is there a way to show even more details than are displayed now (e.g. caller and callee information, i.e. who has called what)?

Can you report this issue about the unresolved functions on the bug tracker please? https://bugs.kde.org/enter_bug.cgi?product=Heaptrack If you want me to fix it eventually, you will have to provide me a way to reproduce the issue though. Also, if possible, attach the heaptrack and massif reports so I can see what you are seeing.

Current builds of the GUI from the 1.0/master branches show you all that is implemented so far. Adding lists of callers and callees to the caller/callee model is on my TODO, but not yet done.

4 0x00238ff4 in ?? () from /lib/tls/libc.so.6

5 0x00000000 in ?? ()

Please report a bug on bugs.kde.org (use the heaptrack component). Make sure to supply a reproducible test case for me (e.g. your application). Also get a backtrace when it hangs by attaching GDB to your application and obtaining a backtrace then.

I have now extended the README.md documentation and made the dependency on KChart optional. Additionally, I’ll try to make an official first release this week, which should bring heaptrack into various distributions official packages then.

Milian,
First, thanks for the great work on this and massif-visualizer.

Not sure if it’s obvious to everyone what the dependent packages and libraries are, and where to get them, but it took me on and off most of yesterday to figure out how to build the gui. But now that the gui is built, I must say it’s worth the time, especially since my program has significant amount of heap usage.

WARNING - personal rant below.
<rant>
I didn’t know what ECM was (confusion with the ecm CD image compression tool); I had to track down the Qt and KF dependencies one by one through the Ubuntu apt repos; I missed the piece of information about KChart being called kdiagram now and it not being part of KDE office, and KDE office not being part of the Ubuntu repos, even for Kubuntu.
</rant>

I would like to request either
1. an Ubuntu package for heaptrack or a PPA, or
2. clear, step by step installation instruction and list of prerequisites.

A longer term “nice to have” would be to remove the kdiagram dependency - it seems a heavy requirement to have to install significant KF5 and KDE modules just to get charting, especially for a GNOME user…

In the interest of saving future users’ time, below are the dependencies I found when trying to compile this on Ubuntu 16.04. It might be useful for an Ubuntu 16.04 installation instruction.

This sounds like you are trying to attach to an application that does not link against libdl. I just found https://github.com/gaffe23/linux-inject, which uses the internal __libc_dlopen_mode symbol instead. That sounds like a good alternative to my current approach. I have just pushed a commit that fixes this - can you try again?

How does tcmalloc intercept operator new? In my tests with the system malloc, operator new still calls malloc internally. If that is not happening for tcmalloc, something else must be found to make it work.

I suggest you set a breakpoint before an operator new in a tcmalloc application and then step to follow along the instructions that get executed. That should give us a picture to figure out what would be necessary here.

I haven’t tried it, but I don’t think you need to do anything special as long as you still use malloc + free. If you on the other hand use some kind of tcmalloc special API directly, then heaptrack won’t intercept these calls.

I really like your visualization with heaptrack_gui. heaptrack traces the whole amount of memory, but I have allocations and deallocations in a short time interval, and heaptrack_gui did not visualize them as precisely as would be necessary in my application.
Could you provide me with a solution for this problem?

Can you clarify your problem, maybe by pasting an example source file that reproduces your issue to a source code hoster? Then please explain me what you are seeing and how you would like to see it instead.

Do you want to zoom in (and filter on) a selectable time frame? This is not yet implemented, but high up on my todo list. Please stay tuned.

Then, when you say “in a short time interval” - do you have trouble with the 1ms time resolution in heaptrack? Would it be enough to be able to zoom in and then show the evolution of the heap memory in that time frame assuming monotonic increasing time? This should be doable, but we first need the normal zoom/filter functionality above implemented. Once we have that, adding such a feature could be doable, but I don’t want to output a timestamp together with every allocation/deallocation, as that would drastically increase the size of the output file and also increases the heaptrack overhead considerably.

We could also think about making the time resolution configurable, which sounds like a good idea in general… I’ll think about that

Thank you for your answer.
Please see the attached code example. It underlines the existing problem.
I want to see the first peak of the consumed memory, like I can see if I uncomment the breaks. It seems to be similar to your question: I have problems with the 1ms time resolution of heaptrack_gui.
Is it possible to modify the resolution of heaptrack_gui? I think heaptrack tracks the whole memory usage, independent of the time interval of the allocations in this example.

OK, got it. Resolving this graphically is not implemented yet, but doable. As I said above: while all allocations are traced, they are not directly correlated to a timestamp; rather, a timestamp is output regularly (every 1ms). The allocations in that region are aggregated and only the peak is reported. In your case you’ll see the two allocations of 512 bytes, but the graphs don’t show the saw-tooth pattern for that, as the time resolution is not fine-grained enough. Personally, I did not need such high-resolution graphs yet, which is why I haven’t implemented this - the total peak is much more interesting to me, and that is accurately reflected in the graphs.

Stay tuned, eventually we can get there. If you really, really need this now, then you’ll have to hack libheaptrack.cpp to call writeTimestamp() from handle{Malloc,Free}() before they write their output, but you’ll also need to change writeTimestamp() to increase the time resolution to nanoseconds or similar. You should then be able to load the generated (and much bigger) output file into heaptrack_gui, I think. Note that the time resolution of course needs to be changed there as well, see e.g. TimeAxis::customizedLabel, which currently divides by 1000 to convert ms to s.

Sounds like a temporary network glitch, maybe in one of the KDE git mirrors. Please try again. I just tested it and it does work for me right now… Worst case, you can also refer to the GitHub mirror at https://github.com/kde/heaptrack

I really enjoy using heaptrack, thanks a lot for it! I am currently looking at “memory leak” problems where the memory is actually wrapped safely in shared_ptrs but somehow the shared_ptrs never go out of scope and thus the memory is not freed. Is it possible to find out which reference is still around using heaptrack somehow? By exchanging ref/deref functions?

But I’m afraid to tell you that currently I have no plan for how to extend heaptrack to track reference counting bugs. I completely agree that it would be really nice to have, but I think it would need a separate tool, or a separate view in the GUI, to display these issues. And of course first we must get the data… Overloading C++ symbols via LD_PRELOAD is possible, albeit ugly, and also only works for non-inline exported symbols afaik. None of this is probably true for the shared_ptr refing code… What would be possible, of course, is to custom-patch the STL header to call an extern C function for every ref/deref, then hijack that in heaptrack. It’s doable, but lots of work of course.

If you are interested, I’d certainly be available to mentor you in writing a reftrack tool which leverages a large part of the existing heaptrack code to grab backtraces etc.

I could probably use template specialization or the fact that all shared pointers shared_ptr<foo> are actually typedef as fooPtr in this code to trigger some special function as well. But there is no easy way to request a stacktrace dump from heaptrack I guess?

I have a program that allocates space for large data sets processes them and deletes them. It also keeps some small amount of data in memory about files that have been processed.
What I think is happening is that some of the meta data is being allocated at the top of the heap which means that when the large data sets are deleted the heap cannot shrink.

Does HeapTrack (or any other heap analysis tool) have the ability to help me find which allocations are at the top of the heap and preventing freed memory being returned to the system?

One could add such features, but as of now, no - heaptrack won’t easily answer this question. I do hope to add this capability in the future, but it may be a bit problematic to access the data about loaded pages efficiently, maybe parsing /proc/<pid>/smaps for every timestamp or some such. Anyhow, some notes on your problem, or how I understand it:

a) Most malloc implementations will use separate pages for large allocations and keep small allocations out of such regions. Thus, once the large area is freed, it should be handed back to the kernel. If I understood you correctly, you have a pool allocator allocating large chunks, right? Once you free that large chunk, it should thus be released again.
b) If you have small allocations, on the other hand, and lots of them, you can easily run into https://sourceware.org/bugzilla/show_bug.cgi?id=14827. Can you try adding a call to malloc_trim(0) to your code - does that have a significant effect?
c) If none of this helps, have a look at the output from malloc_info, it will output metrics about the different arenas, and what it thinks is fragmented etc. pp. That should answer your question.

Could you elaborate on how to compile the GUI part? I tried to compile it on openSUSE Leap 42.1 and failed miserably while compiling the kdiagram, kcoreaddons, ki18n, etc. dependencies.
Also, would it be possible to compile the GUI part on Ubuntu LTS 14.04? Thanks a lot.

Hm, I’ve never seen this issue. My only guess would be that you are mixing ABIs - could you compare the output of file on libheaptrack_preload.so.1.0.0 and on your application binary? Maybe it’s a mixture of 32-bit and 64-bit?

Is the --massif option to heaptrack_print supposed to be so slow? I have a 7.2 MB heaptrack .gz output file, and so far heaptrack_print has been chugging on it for 17 minutes and counting. (Judging from the file offset, it seems to be about 2/3 of the way there.)

heaptrack_gui was much faster, yes. Unfortunately, it needed something like 400 MB of KDE dependencies just to show a graph and a table view :-) And even though load time is no longer measured in hours, it’s still measured in minutes. (I had a run that took 20 minutes or so.)

Perhaps it would be faster if the heaptrack format was in binary instead of requiring repeated regex application for parsing?

Can you tell me what dependencies make up the 400 MB? I wouldn’t expect such a large size impact for the few frameworks I use. Maybe you installed the full KF5/Plasma environment? That is not required; only these KF5 packages plus their dependencies and devel packages are needed:

CoreAddons I18n ItemModels ThreadWeaver ConfigWidgets

For the charts you additionally need KDiagram/KChart which also only has minimal Qt 5 dependencies. On Arch, this amounts to less than 200MB, including Qt 5 and its dependencies. Excluding Qt 5, it’s just ~20MB.

Regarding regex parsing: I’m not doing any regular expression parsing anywhere. Why did you think I do that?

KDiagram does not have minimal dependencies at all, really. I installed all the -dev packages I needed to compile it (on Debian); that’s where the 400 MB number comes from. The KF5 packages alone are 40 MB. Then you need Boost, which is 230 MB including all the dependencies I didn’t already have… the list goes on. Of course, you are free to use whatever dependencies you want (it is your software, after all), but it feels like overkill for what the GUI actually seems to be doing.

The reason why I think you’re doing regex parsing is that regex functions showed up really high when I profiled heaptrack_gui to figure out what was taking so much time. (This was during one of the really long loads; I don’t have that data around anymore.) Perhaps it’s an indirect call somehow?

KDiagram only requires Qt 5 and extra-cmake-modules. It does not require boost. Heaptrack itself does use some boost in a few places, outside of the GUI part that uses Qt 5.

Anyhow, if you have suggestions on how to improve the situation then I’m all ears. But currently, the file size of build dependencies is really of no concern to me.

Regarding regex hotspot: Please show me a callgraph that you got from your profiler. Also feel free to grep both heaptrack and kdiagram for regular expression classes (QRegExp, QRegularExpression) - they are not used. And I’ve profiled heaptrack_gui a lot in order to make it faster - never have I seen regular expressions pop up anywhere, let alone as a hotspot.

Wait, the default is with no optimization? You need to add an incantation (-DCMAKE_BUILD_TYPE=RelWithDebInfo) to get optimization, and it will hide the compile line from you by default unless you give it VERBOSE=1? No wonder there’s cmake hate in the comments…
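For reference, an out-of-source optimized build along the lines the comment describes might look like this (directory layout and paths are assumptions; RelWithDebInfo enables optimization while keeping debug info):

```shell
# Sketch of an optimized CMake build; adjust paths to your checkout.
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
# VERBOSE=1 makes the generated Makefiles echo the full compile commands.
make VERBOSE=1
```

Without an explicit CMAKE_BUILD_TYPE, CMake indeed defaults to an empty build type, i.e. no optimization flags.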

Also, to offset some of the negativity in the comments beneath mine: thanks for making a great tool! Yeah, it’s a little tricky to build the GUI, but that should improve with time. I might just use massif until then, but I’d never have known about massif either if it weren’t for you.

Note that you can also generate flame graphs with heaptrack_print. Alternatively, you can push the heaptrack.FOO.PID.gz files to another machine with a more modern Linux distribution, where you have access to the required packages to build heaptrack_gui.
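As a sketch of the flame graph route (flag names may differ between heaptrack versions - check heaptrack_print --help; the data file name and flamegraph.pl location are assumptions):

```shell
# Export collapsed stack data from a heaptrack recording...
heaptrack_print --print-flamegraph stacks.txt heaptrack.myapp.1234.gz
# ...then render it with Brendan Gregg's flamegraph.pl:
flamegraph.pl stacks.txt > heap.svg
```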

I don’t know how you managed to build this thing. You require .cmake files for Qt 5, but Debian does not provide them, not even with the qt5-dev package (and not with cmake-data either). CMake is a terrible idea. It’s even worse than SCons. Thanks to you, I know never to use it in any of my projects.

You just had to use the absolute latest version of Qt instead of just going with Qt 4. Now, instead of just being able to run “cmake”, I have to spend hours fighting with Aptitude to get Qt 5 to install without deleting half my system first. Thanks a lot.

Exactly, I had to use Qt 5, which has been available for three years now. I also had to use it because it’s more fun than Qt 4. And I do this in my free time, after all. You are welcome - glad that you like it as much as I do!

And I never was able to build your project, because the file “FindQt5.cmake” apparently only exists on your hard drive and nowhere else, and your project can’t build without it. I came back here because now I’m trying to explain why CMake sucks balls on Reddit.

The solution for that was adding, in {heaptrackRoot}/build/CMakeFiles/heaptrack_print.dir/link.txt, the build components {boostRoot}/libs/iostreams/src/zlib.cpp and {boostRoot}/libs/iostreams/src/gzip.cpp. Linking the zlib library via -lz will also be necessary.

I also had a lot of undefined references in the Boost libraries and zlib at the step where heaptrack_print is built. My problem was an ABI incompatibility between gcc 5.2 and clang 3.6 on Ubuntu 15.10, as clang was set as the default compiler. With gcc as default, set with

This is odd, I explicitly require the boost iostreams component and link against it. And for me libboost_iostreams.so.1.57.0 links dynamically to libz.so.1 already. What distro do you use? What does ldd say for you?

Regarding your PS: no, stack profiling is out of scope for heaptrack. I see no way to achieve this with the current approach. If you need that, use Massif from the Valgrind suite.

This is really cool, Milian. I’ve been using gperftools for this work right now. Modified it to do snapshots when resident size goes up by a specified amount and then take diffs of those - been using this on our TF2 dedicated servers. I will definitely keep an eye on your work with heaptrack though. Thanks much!
-Mike (Mike Sartain from Valve)

FYI: jemalloc has a built-in statistical heap profiler that adds very little overhead and works well with programs with gigantic heaps. It generates pprof files, so you can use the same viewer tool as gperftools. It’s fast enough and low-overhead enough that it can be turned on in production.

Walking the call stack to capture a backtrace is typically quite computationally intensive. Therefore it is infeasible to use precise leak checking for long-lived, heavily loaded applications. Statistical sampling of allocations makes it possible to keep the computational overhead low, yet get a general idea of how the application utilizes memory.

So they use sampling to speed up the process. heaptrack could do the same, to speed it up even further. But, imo, its overhead is so low that we don’t need this. Getting the raw backtrace with libunwind is pretty fast. And since we do the DWARF annotation in a separate process, potentially at a different time, and also delay the actual interpretation of the data, the runtime overhead is small.

I’d be interested to see a comparison between such a sampling based method and heaptrack. Similar to perf or VTune, I assume that the sampling method will also find the hotspots. But it cannot give accurate heap memory measurements, nor hard numbers on the allocation calls or the like. Since with heaptrack we really get all data about heap allocations, we can do all sorts of analyses afterwards, and I’m not sure you can do all of that with the results one obtains from sampling.

W00t… I have long been googling “linux heap profiler” to find a tool to investigate ever-growing memory usage in KDE components, but Massif was not able to give me the data I needed (or I was too dumb to interpret it).