Thursday, June 18, 2009

Not all Code is Created Equal (Nor should it be)

Recently I built a stand-alone component that extends the ASP.NET FileUpload control to save to the Amazon S3 web service with method calls as simple as FileUpload’s SaveAs(). As I was writing the code for this component, I slipped into a different mode of development, thinking about all the extremely rare, 1-in-100,000 weird edge cases that people using this component might run into. The primary reason was that this component was intended to be released into the wild, and I wanted to release it and forget it. At this point, you may be saying, hmmm…this guy is a hack if he doesn’t always consider the “weird edge cases” that might occur. I beg to differ: writing really solid, truly 100% industrial-strength code is extremely difficult and time consuming. Getting your code to 99.99% perfection probably takes a couple of orders of magnitude more time and effort than getting it to 99%. The final usage and intended audience for the S3FileUpload component is completely different than the majority of the code we write for our business apps.

In most business applications, our software architectures and canned components should be mature enough that our role is to write some sort of interface (UI or service) to populate our business objects, validate that the data in those objects is correct, possibly perform some business logic on that data, and persist it to the database. Of course there are all sorts of newfangled patterns that can be followed to make this happen, but in essence that is, for the most part, what they are doing.

Once we accept that the majority of the code we write is never going to be used in the navigational systems of the space shuttle, or even the base class library for the .NET Framework, we can get on with the business of establishing the right level of software quality for the problem we are attempting to solve. If you’re thinking at this point, “OK, good, the way I’m putting all my business logic in the UI is an acceptable level of quality,” this article assumes a baseline of experience, knowledge and quality practices, so come back after you have a few more years and/or systems under your belt.

There are different concepts that need to be evaluated when choosing the “level of software quality”:

What happens when your bugs sneak past QA?

Are you building some data entry screens that get used by two people once a quarter to update some values? If this works correctly 95% of the time, you may have a failure once every few years or so.

How easy is it for your users to know that there is a failure? Does it crash? Does it save the wrong value?

What happens when a wrong value gets saved? Do you just need to re-enter the value and all is well?

Do you have to run a simple process that reprocesses data that takes two minutes? Does it take two hours?

Does this wrong value impact the pay of one or two employees? Maybe it impacts one or two thousand? How much work is it to issue new paychecks?

Is this wrong value impacting the commission charged on selling securities where one or two days using the wrong value costs the company millions of dollars?

Obviously making sure that a single value is entered, validated and saved correctly sounds like a simple thing, right? You can always get that one right.

How much work is it to craft a 100% perfect solution?

For that value that is entered by two people once a quarter, do you just have a target range that you need to validate against?

Do you need to download values from a government site, make sure that happens properly and use those values to validate?

Do you need to download values, apply some corporate whiz-bang business logic and then do validation?

Do you need to download values, apply biz-logic and get sign-off from the CFO?

Do you need to download values, apply biz-logic and get sign-off from the CFO and then perform some workflow with the board of directors before this value becomes valid?

As you can probably gather, if the impact of the failure is minimal, but the effort and its corresponding cost of establishing bulletproof software are not, the decision on the investment in time (and money) is not an easy one. It’s in our nature as software developers to try to achieve perfection (well, at least it should be). But in some cases the cost of perfection just isn’t justified. I’m not sure there is an empirical formula here, but I think there are things that can be done.

Some strategies for dealing with being lazy (or using your time wisely)

If you have a spec you are working from, understand not only the words that make up that spec, but also the spirit in which it was created. That is, try to read between the lines and understand what the person really wants to get out of the software and what it needs to do. Writing a spec is not easy, even for us analytical types (programmers); for the creative types it is even more difficult.

Discuss the importance of the features with the business stakeholders. Don’t get too technical, but try to explain your perceived effort and get an understanding of the impact of failure.

If you decide that the failure isn’t critical and the time might be better spent doing other work or adding other features, whatever you do, don’t fail silently. Fail quickly and fail in an obvious way. Whatever you do, don’t do something like: try {…} catch {};
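To make the contrast concrete, here is a minimal C# sketch of the swallow-everything anti-pattern next to a fail-fast version. The names (SaveRate, _repository, SaveQuarterlyRate) are made up for illustration, not from any real API:

```csharp
// BAD: the exception is swallowed, the bad value is silently dropped,
// and nobody finds out until the quarterly numbers are wrong.
public void SaveRateSilently(decimal rate)
{
    try { _repository.SaveQuarterlyRate(rate); }
    catch { } // failure disappears here
}

// BETTER: fail quickly and obviously. Validate up front, and let
// unexpected exceptions bubble up so the failure is visible.
public void SaveRate(decimal rate)
{
    if (rate < 0m || rate > 1m)
        throw new ArgumentOutOfRangeException("rate",
            "Quarterly rate must be between 0 and 1.");

    _repository.SaveQuarterlyRate(rate); // any exception here surfaces immediately
}
```

The second version costs a couple of extra lines, but a bad value now blows up at the point of entry instead of quietly corrupting downstream data.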

Make sure you are using a good error logging strategy. If you do find yourself in an unexpected state that would cause a failure, capture that state, capture the stack trace and have the logging system shoot you an email. Most of the time it’s easier to fix the problem than to find it in the first place.
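A rough sketch of that capture-log-and-notify shape might look like the following. Logger and EmailAlert are hypothetical stand-ins for whatever logging library and mail helper you actually use (log4net, System.Net.Mail, etc.), and ProcessPayrollBatch is an invented example method:

```csharp
try
{
    ProcessPayrollBatch(batch);
}
catch (Exception ex)
{
    // Capture the state that got us here along with the stack trace.
    string context = String.Format(
        "Batch {0}, {1} records, run by {2}",
        batch.Id, batch.Records.Count, Environment.UserName);

    Logger.Error(context, ex); // ex.ToString() includes the stack trace

    // Have the logging system shoot you an email so you hear about it
    // before the users do.
    EmailAlert.Send("dev-team@example.com",
        "Payroll batch failed: " + context, ex.ToString());

    throw; // rethrow so the failure stays loud
}
```

Note the bare throw at the end: the point is to record the state, not to swallow the exception.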

Write the right unit tests. I’ll admit, I’m not a big test-first or 100%-code-coverage fan, but a number of well-placed unit tests have saved my a** more than a few times. What is the right number? Sorry, I don’t have that answer; I think it is a function of the sophistication of the software and the cost of the failures.
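For instance, for that once-a-quarter value with a target range, a couple of NUnit-style tests on the validation rule are cheap insurance. RateValidator here is a made-up class standing in for whatever holds your range check:

```csharp
[TestFixture]
public class RateValidatorTests
{
    [Test]
    public void RejectsRateAboveRange()
    {
        var validator = new RateValidator(0m, 1m); // hypothetical min/max ctor
        Assert.IsFalse(validator.IsValid(1.5m));
    }

    [Test]
    public void AcceptsRateInRange()
    {
        var validator = new RateValidator(0m, 1m);
        Assert.IsTrue(validator.IsValid(0.25m));
    }
}
```

Two tests, a few minutes of work, and the one rule whose failure actually costs money is pinned down.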

Leverage your peers. If you are working on something you know you should spend considerably more time on, but it just doesn’t make sense for you to churn on it for a week or two, bring in a trusted peer and do a code or concept review; you may be too close and miss something obvious.

Just to be clear, I’m not advocating being sloppy and not thinking through the edge cases of the software you are writing, but there is a cost to the pursuit of perfection. You may win the battle (a perfectly tested and functioning admin module, which I will argue is not perfect anyway) but lose the war, and never ship the system that uses the data from that admin module.