Sunday, February 28, 2010

Here is a great technique to adopt when you must initialize a variable to one of two or more values. It's a simple technique that's easy to apply even when you're feeling lazy, but I'm surprised at how few people are aware of the advantages.

The scenario is simple enough. A few months ago I wrote a little parser that turns strings such as "2010-02.1m" into corresponding typesafe immutable objects with a start date and end date. While revisiting it, I made a small addition that had the fortunate side effect of revealing a boneheaded bug I had baked into the original version.

Quick: can you spot the bug in this code, which compiles with no warnings and no errors?
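What follows is only a sketch of the kind of code in question, not the original parser. The class name PeriodEndSketch, the method periodEnd, its parameters, and the "y" period type are all assumptions made for illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical stand-in for the parser's date arithmetic; all names are assumed.
class PeriodEndSketch {
    /** Returns the end date of a period that begins at start. */
    static Calendar periodEnd(Calendar start, int count, String periodType) {
        Calendar end = (Calendar) start.clone(); // leave the caller's start date untouched
        if ("m".equals(periodType)) {
            end.add(Calendar.MONTH, count);      // "1m" covers one month
        } else if ("y".equals(periodType)) {
            end.add(Calendar.YEAR, count);       // "1y" covers one year (assumed type)
        }
        return end;
    }

    public static void main(String[] args) {
        // "2010-02.1m": one month starting 1 February 2010
        Calendar start = new GregorianCalendar(2010, Calendar.FEBRUARY, 1);
        Calendar end = periodEnd(start, 1, "m");
        System.out.println(end.get(Calendar.YEAR) + "-" + (end.get(Calendar.MONTH) + 1));
    }
}
```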

It's missing the else part that throws an exception for unknown period types. Did you see it, or were you too distracted by the calendar manipulation to notice the obvious problem?

This code fragment comes from an accounting-related application, and sooner or later someone is going to have to add "q" for Quarter. If you gave the above code a periodType of "q", it would happily return an incorrect result rather than throw an exception. Besides, we're essentially building a parser for a little language here. It ought to throw an informative exception when you give it invalid input.

It was easy to justify the need for the catch-all else clause here, but truthfully, it's simply part of the idiom, and it's a waste of time to contemplate whether the catch-all is likely to actually do anything in practice. You should just get in the habit of always including it, and consider its omission in existing code to be an error. Unfortunately, even though it's such an important thing to remember, the compiler didn't issue any warnings or errors about its absence.

The good news is, your Java compiler can enforce the need for an else block that throws an exception (or returns normally, but your muscle memory for this idiom should have you typing "throw new …" before you've even thought about it).

By the way, even if you did "immediately" spot the missing else, I bet it still took you more than 0.0001 seconds. It would have been more efficient, and safer, to let your Java compiler pick it up.

Let the compiler help you

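So how do you enlist javac's help? The change that gave javac the chance to throw my previous mistake in my face was a new local variable, monthsCovered, declared without an initial value and assigned only inside the existing branches. (Sure, it sounds irritating, but it's actually a very good thing.)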

Now, the code won't compile: referencing monthsCovered is an error because it might not be initialized. If I had initialized monthsCovered to 0 (or any other value), the compiler would not have found the missing else for me. This is the crux of what I'm saying: don't initialize your local variables to some garbage value. Leave them blank, and the compiler will help you to write better code.
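To make the idea concrete, here is a sketch of the blank-variable version with the catch-all else restored so that it compiles. Only the variable name monthsCovered comes from the text. The class and method names, the "y" type, and the month counts are assumptions for illustration.

```java
// Hypothetical helper; only the variable name monthsCovered comes from the post.
class MonthsCoveredBlank {
    static int monthsCovered(String periodType) {
        int monthsCovered; // deliberately blank: no placeholder such as 0
        if ("m".equals(periodType)) {
            monthsCovered = 1;
        } else if ("y".equals(periodType)) {
            monthsCovered = 12;
        } else {
            // Delete this branch and javac rejects the return below with
            // "variable monthsCovered might not have been initialized".
            throw new IllegalArgumentException("Unknown period type: " + periodType);
        }
        return monthsCovered;
    }

    public static void main(String[] args) {
        System.out.println(monthsCovered("m"));
    }
}
```

Had the declaration read `int monthsCovered = 0;`, the same method would compile with or without the else, and an unknown type would quietly yield 0.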

But you can do even better

In most cases such as this, you can gain even more safety by marking the uninitialized variable final. This causes the compiler to guarantee that the variable has been assigned exactly once before it is first referenced.

Marking your uninitialized local variables final makes it virtually impossible for a future maintainer[1] to accidentally introduce subtle bugs into your multi-path initialization code, and this is the code where bugs experience long incubation periods before they rear their ugly heads[2].
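A sketch of the same method with a blank final local, under the same assumptions as before (hypothetical class and method names, an assumed "y" type, assumed month counts):

```java
// Hypothetical helper; only the variable name monthsCovered comes from the post.
class MonthsCoveredFinal {
    static int monthsCovered(String periodType) {
        final int monthsCovered; // blank final: one assignment on every path, no more
        if ("m".equals(periodType)) {
            monthsCovered = 1;
        } else if ("y".equals(periodType)) {
            monthsCovered = 12;
        } else {
            throw new IllegalArgumentException("Unknown period type: " + periodType);
        }
        // A stray reassignment here, such as monthsCovered = 0;, would be rejected
        // with "variable monthsCovered might already have been assigned".
        return monthsCovered;
    }

    public static void main(String[] args) {
        System.out.println(monthsCovered("y"));
    }
}
```

With final in place, javac complains both when a path forgets to assign the variable and when a later edit assigns it a second time.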

This third revision is robust code that's relatively safe in the hands of future maintainers. In terms of word count, it's a small change from the first example, but it's code I'd be far more comfortable having inside any system I'm responsible for.

The next step

Is it possible to make this type of code even more robust in the face of future modification? Yes! The typesafe enum pattern, popularized by Josh Bloch, is undoubtedly safer than what I've recommended above. It even became a language feature in Java 5, which eliminates much of the boilerplate in simple applications of the pattern. However, unlike leaving your variables uninitialized and marking them final, applying the typesafe enum pattern to this type of problem is a big tradeoff in terms of the lines of code you will write to solve the same problem. Is the tradeoff worthwhile? In many cases, I think it is.
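As a rough preview only, a Java 5 enum for this problem might look something like the sketch below. Every name in it (PeriodType, fromCode, the constants, the "y" code) is an assumption for illustration, not the design the parser actually uses.

```java
// Hypothetical enum; the codes and month counts are assumed for illustration.
enum PeriodType {
    MONTH("m", 1),
    YEAR("y", 12);

    private final String code;
    private final int monthsCovered;

    PeriodType(String code, int monthsCovered) {
        this.code = code;
        this.monthsCovered = monthsCovered;
    }

    int monthsCovered() {
        return monthsCovered;
    }

    /** Maps a code such as "m" to its constant, or throws for unknown codes. */
    static PeriodType fromCode(String code) {
        for (PeriodType type : values()) {
            if (type.code.equals(code)) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown period type: " + code);
    }
}
```

Each constant must supply its month count to the constructor, so adding a quarter later cannot leave that value out. The cost is visibly more code than the if/else chain.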

My next post will go through the process of applying a typesafe enum to this same problem, and I will give you my insights on when I would go the extra mile versus sticking with the more succinct solution you see above. Ultimately, though, it's up to you to decide for yourself on a case-by-case basis.

[1] And let's be honest: this hypothetical future maintainer will be you, six weeks from now.

[2] This will happen either in production, or, more likely, during a demo.