September 11, 2007

Benchmarking Parallel Python

This post is Bruce Eckel’s follow-up to his previous post, which covered, among other things, concurrency within Python. Basically, CPython has the Global Interpreter Lock (GIL), which lets only one thread execute Python bytecode at a time and so makes life very awkward for those wanting to run Python on more than one processor.

Anyhow, in this post Bruce points to Parallel Python, an add-on module, as a potential solution. I had a look at it and thought it was pretty cool. However, bearing in mind Guido van Rossum’s post about the performance implications of removing the GIL the last time it was attempted, I thought I’d benchmark it and see whether it actually provides a speed-up.

The following stats are for calculating the sum of primes below every multiple of 10000 between 10^5 and 10^6 (including the lower bound and excluding the upper). The first set uses only one worker thread[0] on my Core Duo laptop and the second set uses two (as I have two processors).

It should be noted that the code snippet being used is provided as an example on the Parallel Python website, so it is probably one of their best cases. Regardless, I think the numbers are helpful.
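For anyone who wants to try the same kind of workload, here is a rough sketch of it. It is not the Parallel Python example itself: it assumes the standard library's `multiprocessing.Pool` as a stand-in for PP's job server, and the names `isprime`, `sum_primes`, `INPUTS` and `benchmark` are just my own labels for illustration. Timings from it will not match the numbers in this post.

```python
import math
import time
from multiprocessing import Pool

# Every multiple of 10000 from 10^5 (inclusive) up to 10^6 (exclusive).
INPUTS = range(100000, 1000000, 10000)


def isprime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True


def sum_primes(n):
    """Return the sum of all primes strictly below n."""
    return sum(x for x in range(2, n) if isprime(x))


def benchmark(inputs=INPUTS, worker_counts=(1, 2)):
    """Time the whole workload once per worker count.

    Uses worker processes (not threads), so the GIL does not serialise them.
    Returns a dict mapping worker count to wall-clock seconds.
    """
    timings = {}
    for workers in worker_counts:
        start = time.perf_counter()
        with Pool(processes=workers) as pool:
            pool.map(sum_primes, inputs)
        timings[workers] = time.perf_counter() - start
    return timings
```

To run it, call `benchmark()` from under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that start workers by spawning a fresh interpreter. Expect the full range to take a while in pure Python.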

Two Processors

It can be seen that running two worker threads increases the actual CPU time used by around 30 seconds, but the fact that two processors are being used leads to a total speed-up factor of 1.918709304, which is pretty impressive given that the ideal for two processors is 2.
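To put that factor in context, here is a small sketch of the arithmetic. It assumes the usual definitions (speed-up is the one-worker wall-clock time divided by the two-worker wall-clock time, and efficiency is speed-up divided by the number of workers); the function name is my own.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of the ideal linear speed-up actually achieved."""
    return speedup / workers


speedup = 1.918709304  # the factor reported above
efficiency = parallel_efficiency(speedup, workers=2)
print(f"{efficiency:.1%}")  # prints 95.9%
```

In other words, the second processor is doing roughly 96% of what a perfectly scaling second processor would.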

—

[0] I’m not sure of the internals, so I don’t know if it is technically a thread. Regardless, only one calculation will happen at a time.

3 comments by 1 or more people

Dror Levin

Parallel Python uses processes and IPC, not threads, precisely because of the GIL.
This is nothing new; there isn’t even a comparison to threads, which would show that the speed remains about the same.

11 Sep 2007, 17:36

I wasn’t really intending to compare this to threads or other ways of parallelising in Python. I just wanted to look at the numbers for this way of doing it (as Bruce Eckel seemed interested) and thought that other people might be interested in seeing those numbers as well.

Another thing to note is that Parallel Python supports running jobs on other machines as well as locally. Obviously this isn’t tested above, but it is a reason to use PP rather than threads if you might want to expand to a number of machines…

Trackbacks

Having had it pointed out to me that my last benchmarking post is fairly useless without a comparison to threading by a couple of people, I now have such a comparison. The numbers for PP are those used in the last blog post. For threads I initially tried …