When we first announced the series on the engineering philosophy behind CULA, we said that code is at the heart of a software product, but that there many other factors to consider when creating software. Since we've covered many of these other factors since then, it's time to actually talk about some code related issues. For the next few posts, we'll be talking specifically about the programming practices we follow when writing code for CULA.

One of the most important techniques we use is called Generic Programming. Generic programming is a facility provided by some programming languages that allows you to avoid duplicating code. For those of your familiar with C++, you might know this feature as 'templates'. Rather than talking about it further, I think an example would show this concept most clearly.

Without generic programming techniques, you might find code that looks like this:

Which group of code do you prefer? Each function is virtually identical, with the only exception that the data type is different. Although these contrived examples are short, you can imagine just much code might be duplicated for routines that stretch into thousands of lines.

Why is duplicate code such a problem? In short, when writing code, any time you want to make a change to a function you have to make these changes for as many different variants of a function as you have. If you have four variants, that is four variants to edit. That's four times as much work and four opportunities to introduce errors. I can't count the number of times I've made a mistake while these kinds of edits. Worst of all, errors introduced in this way can be especially hard to debug because they could introduce small and hard-to-find differences in behavior between the variants. (Hopefully you've implemented a good revision control strategy to help you if you find yourself in this situation ... maybe we'll talk revision control in a later post).

While the functions above are completely contrived, my example of four different variants isn't just made up; it's the actual complexity of a package like CULA. For those of you familiar with CULA's data types, there are four to consider: single, double, single-complex, and double-complex. That's four different variants of functions to handle (e.g. SGETRF, DGETRF, CGETRF, ZGETRF). While there are in fact differences between these routines, they are typically very few compared to the length of the code. For example, here is a visual diff between two variants of the GETRF function in the Netlib LAPACK package. Although these files are actually 5000 characters each, they only differ by 50 characters, or 1%.

You'd be surprised with how many people in industry still program like this. Many of them do so because they first learned to program in a language without generic programming support and have never learned what generic programming is or how to use it. Others have an awareness of generic programming but have misconceptions as to its utility or its effects on your compiled code. Or have you ever looked through some code to see two separate linked list implementations to hold pointers to two different structs?

When designing CULA, we use generic programming to its full potential. The cool thing about mathy code in different precisions is that there are rarely differences between single/double cases, and even the real/complex differeces are sometimes small or absent. We even write our CUDA kernels this way Although at the surface it appears as if we implement different routines (culaSgetrf, culaDgetrf, culaCgetrf, culaZgetrf), each of our routines is actually designed in a generic way under the hood. This allows us to better manage our code and also has the benefit of allowing us to rapidly implement each variant in parallel.