Difficult Conversations

Sunday, 14 October 2007

I have been working on some code that was given to me to optimise. The problem was that with a few cases it took about 30 seconds but with 20 times that it was taking 80 minutes and using up all available memory. I hadn't worked on the code before and it was basically a Python script with one big loop of about 400 lines with 2 more nested loops. The tabs were set to 2 spaces which was difficult to read. There were two little helper functions; everything else was global. There were no functions worth running the profiler against. The heritage language before Python was a Fortran derived language.

So I did what I usually do with code that isn't mine. I started cleaning it up, extracting functions, getting rid of globals, renaming variables. The basic extractions I performed were:

Everything from the global namespace into a main function.

Contents of each loop into its own function.

Any branch with more than a few statements in it into its own function

This broke the code up into manageable chunks. I was able to spot some unnoticed duplication and remove it. I got a grasp on the meaning of the chunks and made them more Pythonic. I started feeling I could breath a little more.

I found the calls that were being made multiple times in the innermost loop and cached them so they were made once. This fixed it so that it took about 40 seconds. Not bad. So I passed the code along to my colleague.

The thing is though, I felt really dissatisfied with the code. I'd extracted functions and simply passed them the parameters they needed. The functions that had been in the innermost loops took a large amount of parameters, and they were carried through from all the outer loops. Many of the functions took more than about 8 parameters. It was easy enough extracting them because I was just making it work. Reading the code afterwards was convincing because I would read the function name and the parameters it used to get a sense of what the function was doing and the dependencies were explicit, that's a good thing, right? The thing that made my heart sink was the thought of my colleague having to use these functions with 8 or 10 parameters. It just wasn't right.

I realised that I'd just done divide and conquer but without stepping up to abstraction.

So I started looking at the data that was being passed into functions. Some was constant over the loop, so instead of passing it all to the function I made a class containing it and passed that in instead. So each function took a 'static' parameter and variable parameters for the loop.

Then I looked in the function and found what the 'loop static' parameters were being used for. If they were used for a branching decision then I made that decision in the constructor of the class instead and created a function object to perform the correct branch operation. The function object was passed any extra needed parameters (usually the loop varying variables).

The effect of applying these two rules was to simplify the logic of lower functions by removing branches that didn't need to be evaluated over and over and remove the unneeded parameters. It was far more readable.

I also started to replace some of the conditionals that used the looping variables. The looping variable became another class which was initialised with a function object. I didn't take this all the way through the code, but if I had I would have moved all the conditionals to the outer loops where the code talked to the system. This is the effect talked about by Kevin Rutherford and, as Michael Feathers comments, 'there is something sublime' about it.