Saturday, January 16, 2010

Profiling adventures [and cython]: basic profiling

In the previous blog post, I introduced a simple Tkinter-based viewer for the Mandelbrot set. As I mentioned at that time, the viewer was really too slow to be usable. In this post, I will start do some basic profiling and start looking for some strategies designed to make it faster.

The first rule for making an application faster is to do a proper profile rather than guessing. I make use of the profiler module, focusing on the main method (draw_fractal()) which I wish to make faster, and paying a closer look only at the most time-consuming functions/methods.

The profile run will call draw_fractal() once, when app is created, with the number of iterations for mendel() set at 20 (the default) and then call it again 10 times with a larger number of iterations. Running the profiler adds some overhead. Based on the previous run with no profiles, I would have expected a profiler run to take approximately 75 seconds: a little over 4 seconds for the initial set up and slightly more than 7 seconds for each of the subsequent runs. Instead, what I observe is that

Clearly, running the profiler adds some overhead. I should also add that there are variations from run to run done with the profiler, caused by background activities. As a consequence, I normally run the profiler 3 times and focus on the fastest of the three runs; however I will not bother to do this here: I simply want to start by establishing some rough baseline to identify the main contributors to the relative lack of speed of this program.

It appears clear that the largest contributor to the overall execution time is mandel(). Going down the lists of functions that contribute significantly to the overall time, I notice quite a few calls to Tkinter function/methods. So as to reduce the time to take a given profile, and to focus on mandel(), I will temporarily eliminate some Tkinter calls by changing draw_pixel() as follows.

def draw_pixel(self, x, y):
'''Simulates drawing a given pixel in black by drawing a black line
of length equal to one pixel.'''
return
#self.canvas.create_line(x, y, x+1, y, fill="black")

Also, since I want to establish a rough baseline, I should probably see what happens when I increase the number of iterations from 100 to 1000 for mandel(), which is what I expect to have to use in many cases to get accurate pictures. I do this first using Python 2.5

Ouch! Close to 10 minutes of running time. However, it is clear that I have accomplished my goal of reducing the importance of Tkinter calls so that I can focus on my own code. Let's repeat this profiling test using Python 3.1.

We note that the total time taken is significantly less. Doing a comparison function by function, two significant differences appear: The built-in function abs is 7% slower with Python 3.1, which is a bit disappointing. On the other hand, range no longer appears as a function in Python 3.1; this appears to be the main contributor to the significant decrease in time when using Python 3.1 as compared with Python 2.5. This is easily understood: range in Python 3.1 does not create a list like it did in Python 2.x; it is rather like the old xrange. This suggest that I modify mandel1a.py to be as follows:

# mandel1b.py
import sys
if sys.version_info < (3,):
range = xrange
def mandel(c):
'''determines if a point is in the Mandelbrot set based on deciding if,
after 20 iterations, the absolute value of the resulting number is
greater or equal to 2.'''
z = 0
for iter in range(0, 20):
z = z**2 + c
if abs(z) >= 2:
return False
return abs(z) < 2

I also do a similar change to viewer1.py. From now on, except where otherwise noted, I will focus on using only Python 2.5. So, after doing this change, we can run the profiler one more time with the same number of iterations. The result is as follows:

This is now approximately the same as what we had for Python 3.1 as expected. Moving down on the list of time-consuming functions, we note that abs appears to be another function we should look at. Let's first reduce the number of iterations inside mandel to 100, so that a profiling run does not take as long but that proper attention is still focuses on mandel as well as abs. Here's the result from a typical run to use as a new baseline:

Since a fair bit of time is spent inside abs(), perhIaps could speed things up by using another method. The way that we approximate the Mandlebrot set is by iterating over a number of time and checking if the absolute value of the complex number is greater than 2; if it is, then it can be proven that subsequent iterations will yield larger and larger values which means that the number we are considering is not in the Mandelbrot set. Since taking an absolute value of a complex number involves taking a square root, perhaps we can speed things up by not taking the square root. Let's implement this and try it out.

# mandel1c.py
import sys
if sys.version_info < (3,):
range = xrange
def mandel(c, max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
z = 0
for iter in range(0, max_iterations):
z = z**2 + c
z_sq = z.real**2 + z.imag**2
if z_sq >= 4:
return False
return z_sq < 4

The result is worse than before, even though the total number of function calls has almost been cut in half! Actually, this should not come as a total surprise: abs is a Python built-in function, which has been already optimized in C. Extracting the real and imaginary parts explictly like we have done is bound to be a time-consuming operation when performed in pure Python as opposed to C. At this point, we might be tempted to convert complex numbers everywhere into pairs of real numbers so as to reduce the overhead of dealing with complex numbers ... but this would not have any significant effect on the overall time. [Those curious may want to try ... I've done it and it's not worth reporting in details.]
Clearly, I need a different strategy if we are to reduce significantly the execution time. It is time to introduce cython. However, this will have to wait until the next blog post!

3 comments:

I'm very interested in this series of posts, having done my own Python mandelbrot viewer (with the core computation in C, plus lots of algorithmic tweaking): http://nedbatchelder.com/code/aptus/

I'm looking forward to the Cython results!

I wonder why you are doing one final limit comparison even when you max out on iterations? In your last code example, line 17 doesn't need a comparison, it should just return True. (If you do keep the comparison, it should be against 4, not 2).

Ned: I make a reference to Aptus in the next post (currently in draft form). Regarding cython speed versus your code: it is not as fast, as far as I can tell. However, in Aptus you seemed to use some clever algorithm whereas I just does something very straigthforward.

As for the last line comparison: you are right; it is a typo. I still want to do one more comparison (instead of returning True) just in case ... This will be changed when I add a coloring scheme.