Multiprocessing and exceptions – some batteries not included

Today I’m going to write about a not-so-minor inconvenience one faces when using the built-in multiprocessing module: the way exceptions raised in child processes are presented to the user. I will also show you how to improve it, so that when something goes wrong you don’t have to guess where the problem is.

Standalone multiprocessing

Throughout this story, we will stick to the very simple calculation shown below. Our computation code is contained in the ‘go’ function, and we want to apply it to a range of parameters, so we decided to use the facilities provided by the multiprocessing module. Unfortunately, during a long and tiring coding sprint, a bug crept into our code:
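A minimal sketch of such a setup follows. The body of `go`, its bug (a division by zero when `x == 3`) and the input `range(10)` are assumptions chosen for illustration. How much of the worker’s traceback the parent prints depends on the Python version: newer Python 3 releases attach the worker traceback to the re-raised exception as a chained `RemoteTraceback`.

```python
import traceback
from multiprocessing import Pool


def go(x):
    # Hypothetical bug: for x == 3 the denominator becomes zero.
    return 10 / (x - 3)


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        try:
            pool.map(go, range(10))
        except ZeroDivisionError:
            # The exception raised in a worker is re-raised here, in the parent.
            traceback.print_exc()
```

The `try`/`except` only makes the script finish cleanly; the interesting part is what `print_exc()` shows about where inside `go` the error happened.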

From such a traceback we can find out the type of the exception and the target function of the Pool.map call. In the case of our ‘go’ function, guessing where the problem lies is fairly simple: the target function is short, with only a single place this exception may come from. In real life the target function will usually be far more complicated and may call other functions from external modules, so a traceback similar to the one above doesn’t help at all. Was it our code that threw the exception? numpy? scikit-learn? Happy guessing: not knowing which line of our code caused it makes our life miserable. At this point we have two options: launch a proper Python debugger, or try to obtain the traceback as it would be presented to us if the code were run without multiprocessing.

Since a traceback is often enough to understand the problem, this time we will leave the debugger at rest and try to obtain a more informative printout.

The traceback module

To improve our situation we will use the traceback module to, ehm… obtain a traceback. To make our solution reusable, we will put it into a decorator:
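A minimal sketch of such a decorator, assuming the hypothetical name `print_child_traceback` and a hypothetical buggy `go` that divides by zero for `x == 3`. The decorator catches the exception inside the worker, where the full traceback is still available, prints it with `traceback.format_exc()`, and re-raises it so that `Pool.map` still fails in the parent.

```python
import functools
import sys
import traceback
from multiprocessing import Pool


def print_child_traceback(func):
    """Hypothetical decorator: print the worker-side traceback, then re-raise."""
    @functools.wraps(func)  # keeps __name__/__qualname__ so pickle can find go
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # format_exc() formats the exception currently being handled,
            # including the exact line inside func that raised it.
            print(traceback.format_exc(), file=sys.stderr, flush=True)
            raise
    return wrapper


@print_child_traceback
def go(x):
    # Hypothetical bug: for x == 3 the denominator becomes zero.
    return 10 / (x - 3)


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        try:
            pool.map(go, range(10))
        except ZeroDivisionError:
            print("pool.map failed; the worker printed its traceback to stderr")
```

Writing to `sys.stderr` with `flush=True` is a deliberate choice in this sketch: output left in a worker’s stdout buffer could otherwise be lost when the pool terminates its workers on exit.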

(This is actually repeated a couple of times, since our exception is thrown inside more than one process.) In the above output you can see exactly where (at which line number) the problem comes from.

It is worth noting that the functools.wraps helper decorator is crucial in our case. Without it, the __name__ attribute of the decorated function is lost (i.e. set to ‘wrapper’), which then makes the pickle module fail. The latter is used by the multiprocessing module to serialize the function executed inside child processes. You can verify this by getting rid of functools and then setting the __name__ of the resulting decorated function manually (on Python 3, pickle looks functions up by __qualname__, so set that attribute as well).
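To check the claim about functools.wraps, here is a small sketch with three hypothetical functions: one decorated without `functools.wraps`, one decorated with it, and one whose names are restored by hand as suggested above. Pickle serializes a function by reference (its `__module__` plus `__qualname__`), so it only succeeds when that lookup leads back to the same object.

```python
import functools
import pickle


def no_wraps(func):
    # Hypothetical decorator without functools.wraps.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def with_wraps(func):
    # Same decorator, but copying __name__, __qualname__, __doc__, etc.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@no_wraps
def plain(x):
    return x + 1


@with_wraps
def wrapped(x):
    return x + 1


def manual(x):
    return x + 1


# The manual check from the text: decorate without wraps, then restore
# the names by hand (Python 3's pickle needs __qualname__ as well).
manual = no_wraps(manual)
manual.__name__ = manual.__qualname__ = "manual"


def can_pickle(obj):
    """Return True if pickle can serialize obj by reference."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    print(plain.__name__, can_pickle(plain))      # wrapper False
    print(wrapped.__name__, can_pickle(wrapped))  # wrapped True
    print(manual.__name__, can_pickle(manual))    # manual True
```

Depending on the Python version, the failure for `plain` surfaces as either `AttributeError` ("Can't pickle local object") or `pickle.PicklingError`, which is why the sketch catches both.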

So at this point we are able to get a proper traceback, which may well be enough. But there is also another possibility I would like to explore.

The fun way

Some time ago I discovered a little gem: the joblib package. To get it, run ‘pip install joblib’ inside your virtualenv. Among other things, it offers an alternative to the multiprocessing module for parallel computations similar to ours. With joblib, we can rewrite our code in the following way:
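A sketch of the joblib version, assuming joblib is installed and reusing a hypothetical `go` that divides by zero for `x == 3`. `Parallel` and `delayed` are joblib’s public API; the `run` helper and `n_jobs=2` are illustrative choices. How much detail the error report contains depends on the joblib version, so newer releases may format worker errors differently from the annotated listing described below.

```python
import traceback

from joblib import Parallel, delayed


def go(x):
    # Hypothetical bug: for x == 3 the denominator becomes zero.
    return 10 / (x - 3)


def run(values, n_jobs=2):
    # delayed(go)(x) records the call go(x) without running it;
    # Parallel evaluates the recorded calls in worker processes
    # and returns the results as a list, in input order.
    return Parallel(n_jobs=n_jobs)(delayed(go)(x) for x in values)


if __name__ == "__main__":
    print(run([4, 5, 8]))  # [10.0, 5.0, 2.0]
    try:
        run(range(10))
    except ZeroDivisionError:
        # joblib re-raises the worker's error in the parent process;
        # print_exc() shows the traceback plus any worker details attached.
        traceback.print_exc()
```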

As you can see, we got a code listing with the line causing the exception marked. Below that line, you can also see the values of local variables at the point the exception was thrown. You may also notice that the arguments with which the ‘go’ function was called are printed as well. That is a lot of useful information, which in many cases will let us understand the problem immediately. Neat!

Wrap up

We have seen that, under normal conditions, the multiprocessing module won’t give us the usual amount of information about an exception thrown inside a child process. This is slightly surprising, as one could expect (following the “batteries included” philosophy) that it would be presented in exactly the same way as when the multiprocessing module is not used. To get this information you can use the traceback module or, in some cases, go for joblib. Note that joblib offers far more than nice printouts when things go wrong.
