I have several classes that all inherit from a generic base class. The base class contains a collection of several objects of type T.

Each child class needs to be able to calculate interpolated values from the collection of objects, but since the child classes use different types, the calculation varies a tiny bit from class to class.

So far I have copy/pasted my code from class to class and made minor modifications to each. But now I am trying to remove the duplicated code and replace it with one generic interpolation method in my base class. However that is proving to be very difficult, and all the solutions I have thought of seem way too complex.

I am starting to think the DRY principle does not apply as much in this kind of situation, but that sounds like blasphemy. How much complexity is too much when trying to remove code duplication?

You're getting some good general answers. Editing to include an example function might help us determine if you're taking it too far in this particular instance.
– Karl Bielefeldt, Feb 2 '12 at 22:03

This isn't really an answer, more of an observation: If you can't easily explain what a factored-out base class does, it might be best not to have one. Another way of looking at it is (I assume you are familiar with SOLID?) 'does any likely consumer of this functionality require Liskov substitution'? If there is no likely business case for a generalised consumer of interpolation functionality, a base class is of no value.
– Tom W, Mar 29 '12 at 20:18

First thing is to collect the triplet X, Y, Z into a Position type, and add interpolation to that type as a member or maybe a static method: Position interpolate(Position other, double ratio).
– kevin cline, Sep 22 '14 at 19:13
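The Position idea from the comment above might be sketched as follows (a minimal illustration in Java; the class name and method signature come from the comment, everything else is an assumption):

```java
// Hypothetical Position type suggested in the comment above:
// bundle X, Y, Z together and give the type its own interpolation.
public final class Position {
    public final double x, y, z;

    public Position(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Linear interpolation between this position and another.
    // ratio = 0.0 returns this position, ratio = 1.0 returns other.
    public Position interpolate(Position other, double ratio) {
        return new Position(
            x + (other.x - x) * ratio,
            y + (other.y - y) * ratio,
            z + (other.z - z) * ratio);
    }
}
```

With interpolation living on the value type, the subclasses no longer need to know how a triplet is blended, only which elements of their collection to blend.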

9 Answers

In a way, you answered your own question with that remark in the last paragraph:

I am starting to think the DRY principle does not apply as much in this kind of situation, but that sounds like blasphemy.

Whenever you find some practice impractical for solving your problem, don't try to apply it religiously (the word "blasphemy" is a warning sign here). Most practices have their whens and whys, and even if they cover 99% of all possible cases, there's still that 1% where you may need a different approach.

Specifically, with regard to DRY, I have also found that sometimes it is actually better to have several pieces of duplicated but simple code than one giant monstrosity that makes you feel sick every time you look at it.

That being said, the existence of these edge cases should not be used as an excuse for sloppy copy-and-paste coding or a complete lack of reusable modules. Simply put, if you have no idea how to write code that is both generic and readable for some problem in some language, then it's probably less bad to have some redundancy. Think of whoever has to maintain the code: would they live more easily with redundancy or with obfuscation?

Some more specific advice about your particular example: you said these calculations were similar yet slightly different. You might try breaking your calculation formula into smaller subformulas, and then have each of your slightly different calculations call these helper functions for the sub-calculations. You'd avoid the situation where every calculation depends on some over-generalized code, and you'd still have some level of reuse.
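As a sketch of that "shared subformulas" idea, the common arithmetic can live in helpers while each variant keeps its own small formula (Java; all class and method names here are illustrative, not from the question):

```java
// Shared subformulas: small, reusable pieces of the calculation.
final class InterpolationHelpers {
    private InterpolationHelpers() {}

    // Fraction of the way from x0 to x1 that x represents.
    static double ratio(double x, double x0, double x1) {
        return (x - x0) / (x1 - x0);
    }

    // Plain linear blend between two values.
    static double lerp(double y0, double y1, double t) {
        return y0 + (y1 - y0) * t;
    }
}

// One subclass keeps its own straightforward calculation...
class TemperatureSeries {
    double valueAt(double x, double x0, double y0, double x1, double y1) {
        double t = InterpolationHelpers.ratio(x, x0, x1);
        return InterpolationHelpers.lerp(y0, y1, t);
    }
}

// ...while another keeps its slight variation (here, interpolating
// in log space), still reusing the same helpers.
class PressureSeries {
    double valueAt(double x, double x0, double y0, double x1, double y1) {
        double t = InterpolationHelpers.ratio(x, x0, x1);
        return Math.exp(InterpolationHelpers.lerp(Math.log(y0), Math.log(y1), t));
    }
}
```

The duplication left over is only the one-line difference that genuinely varies; the arithmetic that would otherwise be copy/pasted lives in one place.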

Another point about "similar yet slightly different": just because two things look similar in code doesn't mean they are similar in the business domain. It depends on the situation, of course, but sometimes it's a good idea to keep things separate, because even though they look the same, they may be based on different business decisions or requirements. So you might want to treat them as vastly different calculations even if, code-wise, they look alike. (Not a rule or anything, just something to keep in mind when deciding whether things should be combined or refactored.)
– Svish, Feb 3 '12 at 11:20

@Svish Interesting point. Never thought about it that way.
– Phil, Feb 3 '12 at 15:42

I believe that almost all repetition of more than a couple lines of code can be factored out in some way or another, and almost always should be.

However, this refactoring is easier in some languages than in others. It is quite easy in languages like Lisp, Ruby, Python, Groovy, JavaScript, and Lua; usually not too difficult in C++ using templates; more painful in C, where the only tool may be preprocessor macros; and often painful in Java, where it is sometimes simply impossible, e.g. trying to write generic code to handle multiple built-in numeric types.

In more expressive languages, there is no question: refactor anything more than a couple of lines of code. With less expressive languages you have to balance the pain of the refactoring against the length and stability of the repeated code. If the repeated code is long, or liable to change frequently, I tend to refactor even if the resulting code is somewhat difficult to read.

I will accept repeated code only if it is short, stable, and refactoring is just too ugly. Basically I factor out almost all duplication unless I am writing Java.

It's impossible to give a specific recommendation for your case because you haven't posted the code, or even indicated which language you are using.

When you say the base class has to perform an algorithm, but the algorithm varies for each subclass, this sounds like a perfect candidate for the Template Method pattern.

With this, the base class performs the algorithm, and when it comes to the variation for each subclass, it defers to an abstract method that the subclass is responsible for implementing. Think of the way an ASP.NET page defers to your code to implement Page_Load, for example.
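A minimal sketch of the Template Method pattern in this setting (Java rather than C#, since the question's language isn't stated; the type names, the index/blend split, and the linear-interpolation hook are all assumptions):

```java
import java.util.List;

// The base class owns the overall algorithm; subclasses fill in
// only the step that varies. All names here are illustrative.
abstract class Series<T> {
    protected final List<T> items;

    protected Series(List<T> items) {
        this.items = items;
    }

    // The template method: the fixed skeleton of the algorithm.
    public final double interpolate(double position) {
        int i = lowerIndex(position);
        double t = position - i;
        // The varying step is deferred to the subclass.
        return blend(items.get(i), items.get(i + 1), t);
    }

    private int lowerIndex(double position) {
        int i = (int) Math.floor(position);
        return Math.max(0, Math.min(i, items.size() - 2));
    }

    // The "hook" each subclass must implement for its own T.
    protected abstract double blend(T a, T b, double t);
}

// One concrete subclass supplying only the type-specific step.
class DoubleSeries extends Series<Double> {
    DoubleSeries(List<Double> items) { super(items); }

    @Override
    protected double blend(Double a, Double b, double t) {
        return a + (b - a) * t;
    }
}
```

Each child class then contains only its tiny variation (the blend), while the shared index arithmetic and traversal live once in the base class.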

In my opinion, you are correct, in a sense, that DRY can be taken too far. If two similar pieces of code are likely to evolve in very different directions then you can cause yourself problems by trying not to repeat yourself initially.

However, you are also quite right to be wary of such blasphemous thoughts. Try very hard to think through your options before you decide to leave it alone.

DRY is a guideline to follow, not an unbreakable rule. At some point you need to decide that it's not worth having X levels of inheritance and Y type parameters in every class just to be able to say there is no repetition in the code. A couple of good questions to ask are: will it take me longer to extract these similar methods and implement them as one than it would to search through all of them should the need to change arise? Is there potential for a change that would undo my work extracting these methods in the first place? Am I at the point where the additional abstraction makes understanding where this code lives, or what it does, a challenge?

If you can answer yes to any of those questions, then you have a strong case for leaving the potentially duplicated code alone.

You've gotta ask yourself, "why should I refactor it?" In your case, when you have "similar but different" code, if you make a change to one algorithm you need to make sure that you also reflect that change in the other spots. This is normally a recipe for disaster; invariably someone will miss one spot and introduce a bug.

In this case, refactoring the algorithms into one giant one will make the result too complicated and too difficult to maintain in the future. So if you can't factor out the common stuff sensibly, a simple, slightly repetitive implementation may be the lesser evil.

In deciding whether it's better to have one larger method or two smaller methods with overlapping functionality, the $50,000 question is whether the overlapping portion of the behavior might change, and whether any such change should be applied to both smaller methods equally. If the answer to the first question is yes but the answer to the second is no, then the methods should remain separate. If the answer to both questions is yes, then something must be done to ensure that every version of the code remains in sync; in many cases, the easiest way to do that is to have only one version.

There are a few places where Microsoft seems to have gone against DRY principles. For example, Microsoft has explicitly discouraged having methods accept a parameter which would indicate whether a failure should throw an exception. While it's true that a "failures throw exceptions" parameter is ugly in a method's "general usage" API, such parameters can be very helpful in cases where a Try/Do method needs to be composed of other Try/Do methods.

If an outer method is supposed to throw an exception when a failure occurs, then any inner method call which fails should throw an exception which the outer method can let propagate. If the outer method isn't supposed to throw an exception, then the inner one isn't either. If a parameter is used to distinguish between try/do, then the outer method can simply pass it to the inner method. Otherwise, it will be necessary for the outer method to call "try" methods when it's supposed to behave as "try", and "do" methods when it's supposed to behave as "do".
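The composition argument above can be sketched as follows (Java; the parse example and every name in it are illustrative assumptions, not from any Microsoft API):

```java
import java.util.OptionalDouble;

// Sketch: an inner step whose failure behavior is controlled by a
// flag, so an outer method can simply forward its own flag down
// instead of maintaining separate Try/Do variants of every helper.
final class Parsers {
    private Parsers() {}

    // Inner step: either throws on bad input or signals failure
    // quietly, depending on the caller's flag.
    static OptionalDouble parseNumber(String s, boolean throwOnFailure) {
        try {
            return OptionalDouble.of(Double.parseDouble(s.trim()));
        } catch (NumberFormatException e) {
            if (throwOnFailure) {
                throw new IllegalArgumentException("not a number: " + s, e);
            }
            return OptionalDouble.empty();
        }
    }

    // Outer step composes the inner one by passing the same flag
    // along, so its own try/do behavior falls out automatically.
    static OptionalDouble parseSum(String a, String b, boolean throwOnFailure) {
        OptionalDouble x = parseNumber(a, throwOnFailure);
        OptionalDouble y = parseNumber(b, throwOnFailure);
        if (x.isEmpty() || y.isEmpty()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(x.getAsDouble() + y.getAsDouble());
    }
}
```

With the flag, one implementation of parseSum serves both the "try" caller (flag false, failure reported as an empty result) and the "do" caller (flag true, failure propagated as an exception).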