Hunting for Memory Leaks in Python Applications

We use Python a fair bit at Zendesk for building machine learning (ML) products. A common performance issue we have encountered with machine learning applications is memory leaks and spikes. The Python code is usually executed within containers via distributed processing frameworks such as Hadoop, Spark and AWS Batch. Each container is allocated a fixed amount of memory. Once the code execution exceeds the specified memory limit, the container terminates with an out-of-memory error.

A quick fix is to increase the memory allocation. However, this can waste resources and affect the stability of the products due to unpredictable memory spikes. Causes of memory leaks can include lingering large objects that are never released, reference cycles within the code, and underlying libraries or C extensions leaking memory.

The memory usage of an application can be tracked across time with the memory-profiler package's mprof command. The option include-children will include the memory usage of any child processes spawned by the parent process. Graph A shows an iterative model training process which causes the memory to increase in cycles as batches of training data are processed. The objects are released once garbage collection kicks in.

If the memory usage is constantly growing, there is a potential memory leak. Here’s a dummy sample script to illustrate this.

B. Memory footprints increasing across time

A debugger breakpoint can be set once memory usage exceeds a certain threshold using the option pdb-mmem, which is handy for troubleshooting.
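As a concrete illustration, both options belong to the memory-profiler package (the script name train_model.py below is hypothetical):

```shell
pip install memory-profiler

# Record memory usage of the script and any child processes it spawns
mprof run --include-children python train_model.py

# Plot the recorded memory usage across time
mprof plot

# Drop into pdb once memory usage exceeds 1024 MB
python -m memory_profiler --pdb-mmem=1024 train_model.py
```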

Memory Dump at a Point in Time

It is important to understand the expected number of large objects in the program and whether they should be duplicated and/or transformed into different formats.

To further analyse the objects in memory, a heap dump can be created at particular points in the program with muppy (part of the Pympler library).

```python
import pandas as pd
from pympler import muppy

all_objects = muppy.get_objects()

# Get references to certain types of objects, such as DataFrames
dataframes = [ao for ao in all_objects if isinstance(ao, pd.DataFrame)]

for d in dataframes:
    print(d.columns.values)
    print(len(d))
```

Example of summary of memory heap dump

Another useful memory profiling library is objgraph which can generate object graphs to inspect the lineage of objects.
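objgraph must be installed separately, but the underlying idea can be sketched with the standard library's gc module, which objgraph builds on. The Node class and holder dict below are purely illustrative:

```python
import gc

class Node:
    """A tiny object whose references we want to trace."""
    def __init__(self, payload):
        self.payload = payload

leaked = Node("big payload")
holder = {"cache": leaked}  # a reference that keeps `leaked` alive

# gc.get_referrers returns every object that refers to `leaked`;
# objgraph's back-reference graphs are built from the same information.
referrers = gc.get_referrers(leaked)
print(any(r is holder for r in referrers))  # True
```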

Useful Pointers

Strive for a quick feedback loop

A useful approach is creating a small “test case” which runs only the leaking code in question. Consider using a randomly sampled subset of the data if the complete input takes a long time to run.
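For instance, a reproducible subset can be drawn with a fixed seed; the records list below is a stand-in for real input data:

```python
import random

# Stand-in for a large input dataset
records = [{"id": i, "text": f"row {i}"} for i in range(100_000)]

rng = random.Random(42)  # fixed seed keeps the test case reproducible
subset = rng.sample(records, k=1_000)

# Run only the suspect code path against the smaller input
print(len(subset))  # 1000
```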

Run memory-intensive tasks in a separate process

Python does not necessarily release memory immediately back to the operating system. To ensure memory is released after a piece of code has executed, it needs to run in a separate process. This page provides more details on Python garbage collection.
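A sketch using the standard library's multiprocessing module; the task below is a stand-in for real work, and the "fork" start method keeps the example self-contained on Unix:

```python
import multiprocessing as mp

def memory_intensive_task(n, queue):
    data = list(range(n))  # large allocation lives only in the worker process
    queue.put(sum(data))   # send back only the small result

# "fork" avoids the need for a __main__ guard on Unix; use the default
# start method with a guard on other platforms.
ctx = mp.get_context("fork")
queue = ctx.Queue()
worker = ctx.Process(target=memory_intensive_task, args=(1_000_000, queue))
worker.start()
result = queue.get()
worker.join()  # once the worker exits, its memory is returned to the OS
print(result)
```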

A debugger can add references to objects

If a breakpoint debugger such as pdb is used, any objects created and referenced manually from the debugger will remain in the memory profile. This can create a false impression of a memory leak, where objects appear not to be released in a timely manner.