Sunday, February 18, 2007

Are bugs really important?

Reddit recently highlighted this post from an engineer who interned at Google and Microsoft before taking a job at Yahoo!. The stories are interesting, but this tidbit from an initial interview with Microsoft caught my eye:

I did one screening interview as a freshman on campus where I was rejected without mercy. Apparently the answer to “Can you tell me what was the most difficult bug you faced while programming and what you did to resolve it?” isn’t “My programs don’t have bugs.”

There's a huge body of lore around Microsoft interview questions. This is a common one, not used exclusively at Microsoft.

As I read that post, I started wondering what my answer would be. The longer I thought about it, the more surprised I was that nothing came to mind. This could mean a few things: that I'm not a good candidate for working at Microsoft, that I haven't been programming very long, or that I'm not a very good programmer.

Anyone who knows me knows that I would never want to work at Microsoft, so let's ignore that point. I've been slinging code for about 20 years, professionally for about 15 years, so my lack of a response doesn't necessarily imply a lack of experience. On most projects, I'm one of the developers that's brought in to solve the hard problems, so I don't think a lack of response is an indication that I'm just pulling a paycheck and not doing any real work.

The point of a question like this is to see how well and how deeply a candidate relates to his code. It's an obvious trap to elicit a response like "my programs don't have bugs", which is merely an indication that the interviewee (a) hasn't been programming very long and (b) hasn't worked on any project larger than a homework assignment. But if it elicits an animated and involved response from the interviewee, it shows he has a deep and detailed knowledge about his code and the platform he uses to develop software.

Unfortunately, this particular question places the focus on bugs. I wonder if this offers some insight into software development practices at Microsoft. Could it really be that their software development regimen is simply a never-ending bug squashing expedition into an ever-increasing blob of code? It's amusing to think so, but I doubt it.

The more I thought about it, the more I realized that I don't think about my career as moving from one bug to another. Over the years, I've picked up a variety of tools and techniques for isolating and fixing bugs. Some are mundane, like peppering print statements or invoking a source level debugger like perl -d or gdb. Some are more complex, like using test fixtures, constructing test data, or gathering diagnostics with a profiler. The skills are important; the bugs aren't.

But I occasionally do remember the bug. One that does come to mind involves a grammar I was writing from an incomplete and contradictory spec. The operator precedence was specified one way, but the expectation was the inverse. I implemented the precedence as specified, which did not result in the desired behavior. This led to many hours of meetings over a few weeks to determine that the spec was, in fact, wrong. The solution was to twiddle a couple of lines of code, maybe 5 minutes to fix. Thirty minutes if you include the obligatory stop at the coffee shop to blow off some steam.

What matters for a professional software developer is preventing bugs from occurring in the first place, or quickly isolating bugs when they do occur. On that same project, I needed to test the grammar, so I wrote a test harness to run the parser, capture its output, and compare the output to an expected value. I wrote another program to generate these inputs via brute force: running through all combinations of terms and operators, hunting for both successful parses and expected failures. In the end, I had a few thousand test cases that acted as a pretty complete set of regression tests.
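To give a flavor of the brute-force approach, here is a minimal sketch in Haskell. It is not the original harness: the terms, the operators, and every name in it (wellFormed, malformed, testCases, runCases) are made up for illustration, and it only checks whether a parse succeeds or fails rather than comparing full output against an expected value.

```haskell
-- Hypothetical vocabulary; the real grammar's terms and operators are not
-- given in this post.
terms :: [String]
terms = ["a", "b"]

operators :: [String]
operators = ["+", "*"]

-- Every well-formed input with exactly n terms: term (operator term)*
wellFormed :: Int -> [String]
wellFormed n
  | n <= 0    = []
  | n == 1    = terms
  | otherwise =
      [ unwords [prefix, op, t]
      | prefix <- wellFormed (n - 1)
      , op     <- operators
      , t      <- terms
      ]

-- Inputs that should be rejected: a well-formed input plus a dangling operator.
malformed :: Int -> [String]
malformed n = [ unwords [e, op] | e <- wellFormed n, op <- operators ]

data Expectation = ShouldParse | ShouldFail
  deriving (Eq, Show)

-- All cases, successes and expected failures, for 1..maxTerms terms.
testCases :: Int -> [(String, Expectation)]
testCases maxTerms =
     [ (s, ShouldParse) | n <- [1 .. maxTerms], s <- wellFormed n ]
  ++ [ (s, ShouldFail)  | n <- [1 .. maxTerms], s <- malformed n ]

-- The harness: run a parser over every case and return the cases where the
-- outcome differs from the expectation. An empty result means all passed.
runCases :: (String -> Maybe a) -> [(String, Expectation)] -> [(String, Expectation)]
runCases parse = filter mismatch
  where
    mismatch (input, expected) = outcome (parse input) /= expected
    outcome (Just _) = ShouldParse
    outcome Nothing  = ShouldFail
```

With two terms and two operators, testCases 3 already yields 126 cases, and the count grows quickly from there. Any parser of type String -> Maybe a can be passed to runCases.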

Since I started working in Haskell, I find that I think even less about the bugs.

In many cases, a bug is a symptom that you're working on a large system, and the assumptions made in one part of the system violate the assumptions made in another part. A classic case is a memory leak, where the library assumes the caller manages a block of memory, but the caller fails to call free() when necessary. Or the opposite situation, where the library manages memory, and the caller calls free() needlessly.

However, in a Haskell program, these kinds of problems simply don't happen, or at least don't happen with nearly the same frequency. Each function acts as its own little world. The net result is that when debugging a particular function, you can ignore the rest of the universe and focus on one small piece of the puzzle in isolation.

Another advantage to Haskell is that many bugs don't happen because the type checker catches them. For example, if I want a single value out of a list, then I can't use map, which returns a list. Instead, I need to use a fold, or post-process the result of map with a function like head or last. In either case, the type checker will prevent me from running a program with this kind of bug.
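A small sketch of what that looks like in practice. The function names (doubled, total, lastDoubled, wrong) are hypothetical, and the rejected definition is left commented out so the rest of the file still compiles.

```haskell
import Data.List (foldl')

-- map always hands back a list: map :: (a -> b) -> [a] -> [b]
doubled :: [Int] -> [Int]
doubled = map (* 2)

-- A fold collapses the list down to a single value.
total :: [Int] -> Int
total = foldl' (+) 0

-- Post-processing the result of map also yields a single value. Note that
-- `last` fails on an empty list, so this sketch returns a Maybe instead.
lastDoubled :: [Int] -> Maybe Int
lastDoubled xs = case doubled xs of
  [] -> Nothing
  ys -> Just (last ys)

-- The type checker rejects this one: the signature promises an Int,
-- but map produces an [Int].
-- wrong :: [Int] -> Int
-- wrong = map (* 2)
```

Uncommenting wrong produces a type error at compile time, so the mistake never makes it into a running program.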

All of which makes it easier to focus on the problems to be solved, and the tools needed to solve these problems, and ignore mundane details like bugs.

8 comments:

Burag said...

I don't think that's the case. Caring about bugs is important, not because one's career should be made of fixing bugs, but because you learn from bugs. You get to see:

- the logic/implementation error
- the effect of a bug on a large system (don't screw up like him)
- how to fix it

Essentially, by fixing one bug you have just added three lines to your dictionary of experience.

Also, I truly don't believe that the language you work in makes it less susceptible to bugs. It all has to do with the methodologies one uses and the care s/he puts into that piece of code.

You're making a pretty interesting statement about the low number of bugs you generate. Does this cost anything in terms of productivity? If the opposite (negative cost), why isn't Haskell (or similar programming languages) used more widely? Might there be scalability issues? Don't take these questions the wrong way; I'm working on the answers to questions like this myself. I'm just hoping you have thought some thoughts that I have not!

You're making a pretty interesting statement about the low number of bugs you generate.

I don't think so. I'm making a point that bugs aren't the milestones I think about. There are still plenty of bugs in my code, but once they're fixed, I move on.

If the opposite (negative cost), why isn't Haskell (or similar programming languages) used more widely?

In general, Haskell has a pretty steep learning curve. Figure that on average it takes about 6-12 months working on a project to really understand the language and the type system. In my experience, if you have a background in functional programming, and you can really benefit from some features or libraries only available in Haskell, there may be a net positive benefit.

Haskell won't be used widely until (a) it's easier to learn the language, (b) there are more features only available in Haskell, or (c) both.

Also, I truly don't believe that the language you work in makes it less susceptible to bugs.

Sorry, but the language matters a lot. Java forces coercion to/from Object, and Ruby's duck typing leads to bugs that simply can't happen in a (decent) statically typed language. In both situations, you can find code trying to invoke undefined methods at runtime.

At the same time, Ruby and Java offer automatic memory management, so an entire class of errors endemic to C programs doesn't happen anymore in those two languages.

Considering Ruby doesn't really have the concept of compile time, you can hardly complain that it doesn't catch bugs at compile time like statically typed languages do.

Errors related to memory management are really rather easy to avoid in C and especially in C++. Most memory management errors come from not forcing your client code to manage the memory you are using. And even Java has the dreaded NullPointerException.

Catching bugs in languages like Ruby and Python simply requires a different methodology from statically preventing the misuse of code in C or Haskell. And even then, if you do not make use of a statically typed language's ability to detect errors at compile time (e.g. by using void*), you still have to do your error checking at run time. Writing your code to use static error checking can be tricky, and is simply another methodology.

The language matters in terms of what you have to do to debug the problems, but not so much in the quantity of bugs you encounter.

Adam, you said, "Another advantage to Haskell is that many bugs don't happen because the type checker catches them." You also note that the functional nature of Haskell isolates the code better than [most] stateful programming. I should think this would reduce the number of bugs, but since I have not written any Haskell myself (only read about it), I am asking you. Might there be some classes of bugs in Haskell that don't exist in non-functional languages? I know there's a problem when you discover you need state deep in a function and I know that figuring out type checking errors can be a pain.

I'll extend your note on bugs being invalid assumptions by proposing that all bugs are due to invalid assumptions. I've written several paragraphs on the issue; perhaps it might be of interest to you? I have been interested in how to write high quality code efficiently and have decided that a key piece of the puzzle is to explicitly state assumptions, preferably in ways the compiler can catch, then in exception throws right at the source of the problem (like throwing a NullArgumentException, where the culprit is just one stack frame away). Nobody has challenged me on this topic or been interested in discussing it; perhaps you would be?

Might there be some classes of bugs in Haskell that don't exist in non-functional languages?

Surely. Two that come to mind are time and space leaks.

Because pure functions are referentially transparent (the same arguments always produce the same result), the result of calculating a value for a function may or may not be preserved. There are cases where a function is called to perform the same (pure) calculation over and over again, even though the value cannot possibly change. This is a time leak: a bug where you're spending too much CPU time doing the same thing over and over again.

A space leak is the opposite kind of problem. This occurs when the result of a function is preserved, even when it is no longer needed.

In both cases, the solution tends to be to rewrite code so that it doesn't trigger these pathological behaviors.
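A rough sketch of each, with its rewrite. These are textbook-style illustrations rather than code from a real project, all of the names are hypothetical, and how badly the leaky versions behave depends on the compiler and its optimization settings.

```haskell
import Data.List (foldl')

-- Time leak: the same subproblems are recomputed over and over.
fibNaive :: Int -> Integer
fibNaive n
  | n < 2     = toInteger n
  | otherwise = fibNaive (n - 1) + fibNaive (n - 2)

-- Rewrite: build the sequence once as a lazy list, so each value is
-- computed a single time and then reused.
fibShared :: Int -> Integer
fibShared n = fibs !! n
  where
    fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)

-- Space leak: the list is walked twice (sum, then length), so all of it
-- has to stay in memory after the first pass is done.
-- An empty list gives NaN (0 / 0) here.
meanTwoPass :: [Double] -> Double
meanTwoPass xs = sum xs / fromIntegral (length xs)

-- Rewrite: one pass with a strict accumulator, so elements that have
-- already been consumed can be reclaimed as the fold goes along.
meanOnePass :: [Double] -> Double
meanOnePass xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0) xs

    step :: (Double, Int) -> Double -> (Double, Int)
    step (s, c) x =
      let s' = s + x
          c' = c + 1
      in s' `seq` c' `seq` (s', c')
```

Both pairs compute the same answers; only the time and memory needed to get there differ.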

I know there's a problem when you discover you need state deep in a function and I know that figuring out type checking errors can be a pain.

Introducing state deep within a computation isn't generally a hard problem. Introducing I/O can be.

have decided that a key piece of the puzzle is to explicitly state assumptions, preferably in ways the compiler can catch

Haskell does just this, which filters out a vast number of bugs. Nicer still, using HM, it *infers* many of those assumptions without needing you to explicitly state them, saving considerable code clutter.

I'm still new to Haskell myself, but having programmed extensively in C/C++/Java/Pascal and Scheme, I can say that getting the code to compile and run is, in comparison, a royal pain, but a good one. It's like someone wrote an extensive unit test suite for you and the compiler won't let your code compile until it will pass the suite. Often the compiler will even infer logical assumptions you never even thought of, which is a very, very good thing. Once my code finally compiles, I am frequently amazed that it almost always does exactly what I planned the first time, bug free.

The downsides are the time and space bugs. These exist in C++ and Java too, but much less frequently, and are an artifact of the very concepts that allow Haskell to make such wide-reaching assumptions in the first place. On the up side, they don't occur too frequently, and they do not change the correctness of the program, just the speed of execution. I'd rather have the right answer take a pathologically long time to resolve than get back the wrong answer.

I look forward to being more proficient in Haskell, as I think the learning curve is very much worth the provable correctness of the code.