notes.public

It’s a common pattern in object-oriented languages to use subclassing, inheritance and method overriding to define and implement interfaces. For example, one of my old projects, EasyCapViewer, gradually grew five or six drivers for different versions of the video capture hardware it supported. A similar project, Macam, had hundreds of drivers for almost every possible USB webcam. Both of these programs defined a semi-abstract base class with methods intended to be overridden by concrete driver implementations, as well as convenience implementations that drivers could call back up to for shared behavior.
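To make the shape of this pattern concrete, here is a minimal sketch of it in C, where a table of function pointers stands in for overridable methods and a plain function stands in for the base class's convenience implementation. All names here (`struct driver`, `base_set_brightness`, the fake driver) are hypothetical and are not taken from EasyCapViewer or Macam.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "semi-abstract base class" emulated in C.
   None of these names come from EasyCapViewer or Macam. */
struct driver;

struct driver_ops {
    /* "Abstract" method: every concrete driver must override it. */
    int (*read_frame)(struct driver *d, unsigned char *buf, size_t len);
    /* "Virtual" method: NULL means "use the base-class default". */
    int (*set_brightness)(struct driver *d, int value);
};

struct driver {
    const struct driver_ops *ops;
    int brightness; /* state owned by the "base class" */
};

/* Base-class convenience implementation that drivers can call back up to. */
int base_set_brightness(struct driver *d, int value) {
    if (value < 0 || value > 255) return -1;
    d->brightness = value;
    return 0;
}

/* Public entry points dispatch through the table, falling back to defaults. */
int driver_read_frame(struct driver *d, unsigned char *buf, size_t len) {
    assert(d && d->ops && d->ops->read_frame);
    return d->ops->read_frame(d, buf, len);
}

int driver_set_brightness(struct driver *d, int value) {
    assert(d && d->ops);
    if (d->ops->set_brightness) return d->ops->set_brightness(d, value);
    return base_set_brightness(d, value);
}

/* A fake concrete driver: overrides read_frame, and extends
   set_brightness by calling back up to the base implementation. */
static int fake_read_frame(struct driver *d, unsigned char *buf, size_t len) {
    (void)d;
    memset(buf, 0x80, len); /* pretend every pixel is mid-gray */
    return (int)len;
}

static int fake_set_brightness(struct driver *d, int value) {
    /* Pretend this hardware only supports even values, then defer to the base. */
    return base_set_brightness(d, value & ~1);
}

const struct driver_ops fake_driver_ops = { fake_read_frame, fake_set_brightness };
```

Notice that the interface (`struct driver_ops`) and the shared code (`base_set_brightness` plus the fallback in `driver_set_brightness`) live side by side, so moving more common code into the base changes what every driver inherits.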

While working on EasyCapViewer, I learned what a bad idea that design is. The problem is that you are mixing the interface with a shoddy differential compression scheme, in which each subclass records only how it differs from the base class. The more implementations you have, the more redundancy you will have, and the more that redundancy hurts when you find a bug or have to change something. As you add back-ends, you frequently discover code that is common amongst two or more of them. The obvious thing to do is push that code up into the common base class, which might mean adding a new method or changing the behavior of an existing one.

However, the biggest piece of redundancy is the interface itself, which every back-end is tied to (somehow) because every back-end has to implement it. It can’t be factored out, and every time it changes, most or all of the implementations need to be updated. So it’s crucial that this interface be ironed out and hammered down as early as possible.

So there is a fundamental conflict. On one hand, you want to tune the interface to “compress” the implementations, but the appropriate compression depends on the implementations themselves, which will change and grow over time. On the other hand, the interface must not change, because changing it requires changing all of the implementations. (In lower circles of hell, changes to the interface will cause changes to the appropriate compressions, and vice versa. You can get caught in a loop until things finally stabilize at a new local optimum.)

This is, in a word, bad.

What I recommend, and what I did in my more recent project libkvstore (which supports “drivers” for several different storage engines), is to separate the interface from the code deduplication system. The interface is defined mostly up front, based on the nature of the thing being abstracted and what it needs to do. (This is extremely difficult and vastly under-appreciated, and it’s a topic I’m still not qualified to write much about.)
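As an illustration of an interface that carries no shared code, here is a minimal C sketch of a key-value store vtable with one toy in-memory back-end. The names (`kv_store_vtbl`, `kv_put`, `kv_get`, `mem_store`) are hypothetical and this is not libkvstore's actual API; the point is only that the public wrappers do pure dispatch and hide no default behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface, NOT libkvstore's real API. It lists only the
   operations the abstraction needs; no shared code is attached to it. */
struct kv_store;

struct kv_store_vtbl {
    int (*put)(struct kv_store *s, const char *key, const char *val);
    const char *(*get)(struct kv_store *s, const char *key);
};

struct kv_store {
    const struct kv_store_vtbl *vtbl;
};

/* Public wrappers: pure dispatch, no hidden defaults. */
int kv_put(struct kv_store *s, const char *key, const char *val) {
    assert(s && s->vtbl && s->vtbl->put);
    return s->vtbl->put(s, key, val);
}

const char *kv_get(struct kv_store *s, const char *key) {
    assert(s && s->vtbl && s->vtbl->get);
    return s->vtbl->get(s, key);
}

/* Toy back-end: a fixed-size array. The base struct is the first member,
   so a struct kv_store * can be converted back to a struct mem_store *.
   Keys and values are stored by pointer; the caller must keep them alive. */
#define MEM_SLOTS 8
struct mem_store {
    struct kv_store base;
    const char *keys[MEM_SLOTS];
    const char *vals[MEM_SLOTS];
    size_t count;
};

static int mem_put(struct kv_store *s, const char *key, const char *val) {
    struct mem_store *m = (struct mem_store *)s;
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->keys[i], key) == 0) { m->vals[i] = val; return 0; }
    }
    if (m->count == MEM_SLOTS) return -1; /* store is full */
    m->keys[m->count] = key;
    m->vals[m->count] = val;
    m->count++;
    return 0;
}

static const char *mem_get(struct kv_store *s, const char *key) {
    struct mem_store *m = (struct mem_store *)s;
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->keys[i], key) == 0) return m->vals[i];
    }
    return NULL; /* not found */
}

const struct kv_store_vtbl mem_store_vtbl = { mem_put, mem_get };
```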

Then, as implementations are added and redundancy is noticed, “implementation helper” functions are added. These helpers are not publicly exposed, and might not have any formal basis or rationale for existing. They simply soak up whatever duplication is found. As new implementations are added and the apparent duplications change, new helpers can be added that are subsets or supersets of old helpers (and helpers might call each other behind the scenes). Helper functions are a lot like a compression dictionary. They just represent duplication, not necessarily any real structure or meaning.
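A minimal sketch of what such helpers might look like, assuming two hypothetical back-ends (a log store and a B-tree) that were both found to prefix keys with a length byte. The helper names (`kvh_encode_key`, `kvh_encode_pair`) and the encoding itself are made up for illustration; in a real project they would sit in a private header that back-ends include, never in the public API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical implementation helpers: private, with no formal rationale
   beyond "these lines showed up in more than one back-end". */

/* Helper 1: length-prefixed key encoding, duplicated in two back-ends.
   Returns bytes written, or 0 on failure. */
static size_t kvh_encode_key(const char *key, unsigned char *out, size_t cap) {
    assert(key && out);
    size_t len = strlen(key);
    if (len > 255 || len + 1 > cap) return 0;
    out[0] = (unsigned char)len;
    memcpy(out + 1, key, len);
    return len + 1;
}

/* Helper 2: a later superset that also appends the value. It simply
   calls helper 1 behind the scenes. */
static size_t kvh_encode_pair(const char *key, const char *val,
                              unsigned char *out, size_t cap) {
    size_t n = kvh_encode_key(key, out, cap);
    if (n == 0) return 0;
    size_t vlen = strlen(val);
    if (n + vlen > cap) return 0;
    memcpy(out + n, val, vlen);
    return n + vlen;
}

/* Back-end code keeps its own specifics and delegates the shared part. */
size_t logstore_format_record(const char *key, const char *val,
                              unsigned char *out, size_t cap) {
    if (cap < 1) return 0;
    out[0] = 'P'; /* back-end specific: record type tag */
    size_t n = kvh_encode_pair(key, val, out + 1, cap - 1);
    return n ? n + 1 : 0;
}

size_t btree_format_search_key(const char *key, unsigned char *out, size_t cap) {
    return kvh_encode_key(key, out, cap);
}
```

If a third back-end later needs only part of this encoding, a new, narrower helper can be added without touching the public interface at all.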

There is one class of helper that I’m still on the fence about. In a language like C, with textual (rather than semantic) macros, you can abstract out even the duplication of function declarations. That lets you standardize argument names and even make certain changes to function signatures easily. On the other hand, it makes the code ugly, harder to read, and harder for some tools (like syntax highlighters) to parse. It’s easy to dismiss this idea out of hand, but when you have hundreds of back-ends, every little bit of deduplication might be worth it. That said, automated refactoring tools can make the same kind of sweeping signature changes in a way that might be more socially acceptable. (I optimistically used this technique for file type converters in StrongLink, even though it currently only has two.)
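Here is a small sketch of the idea, assuming a hypothetical `CONVERTER` macro that fixes one signature and one set of argument names for every converter. It is an illustration only, not StrongLink's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: one place defines the signature and argument names
   shared by every converter. Not StrongLink's real macro. */
#define CONVERTER(name) \
    int name##_convert(char const *const src, size_t const srclen, \
                       char *const dst, size_t const dstcap)

/* Declarations for each back-end come from the macro... */
CONVERTER(plain);
CONVERTER(upper);

/* ...and so do the definitions, so the argument names are always the same. */
CONVERTER(plain) {
    assert(src && dst);
    if (srclen + 1 > dstcap) return -1; /* leave room for the terminator */
    memcpy(dst, src, srclen);
    dst[srclen] = '\0';
    return (int)srclen;
}

CONVERTER(upper) {
    assert(src && dst);
    if (srclen + 1 > dstcap) return -1;
    for (size_t i = 0; i < srclen; i++) {
        dst[i] = (char)toupper((unsigned char)src[i]);
    }
    dst[srclen] = '\0';
    return (int)srclen;
}
```

Adding a parameter to every converter becomes a one-line change to the macro (plus the bodies that use it), but the downside is also visible: searching for `upper_convert` will not find its definition, and many editors will not highlight it correctly.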

You might have noticed that I’ve recommended against using most of the standard features of object orientation here. It turns out that inheritance and overriding conflate two different things, interface definition and code deduplication, in a way that causes problems. In particular, avoid using differential compression to guide your interface designs. That said, inheritance and overriding are appropriate tools in some cases, especially when you have a strong theoretical basis for what each class’s responsibilities and inheritance relationships are.