Programming the last mile

In any programming project there comes a point where the programming ends and manual processes begin. That boundary is where problems occur, particularly for reproducibility.

Before you can build a software project, there are always things you need to know in addition to having all the source code. And usually at least one of those things isn’t documented. Statistical analyses are perhaps worse. Software projects typically yield their secrets after a moderate amount of trial and error; statistical analyses may remain inscrutable forever.

The solution to reproducibility problems is to automate more of the manual steps. It is becoming more common for programmers to realize the need for one-click builds. (See Pragmatic Project Automation for a good discussion of why and how to do this. Here’s a one-page summary of the book.) Progress is slower on the statistical side, but a few people have discovered the need for reproducible analysis.
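As a rough illustration of what a one-click analysis might look like, here is a minimal sketch in Python. Everything in it is hypothetical: the step names (`load_data`, `analyze`, `run_all`), the file names, and the toy data are assumptions made up for the example, not taken from the book or from any particular tool. The point is only that one entry point runs every step in order and records what it produced, so nothing depends on someone remembering a manual step.

```python
import hashlib
import json
import sys
from pathlib import Path


def load_data(workdir: Path) -> Path:
    """Step 1 (hypothetical): write the raw data, standing in for a download or export."""
    raw = workdir / "raw.csv"
    raw.write_text("x\n1\n2\n3\n4\n")
    return raw


def analyze(raw: Path, workdir: Path) -> Path:
    """Step 2 (hypothetical): compute a summary from the raw data."""
    # Skip the header line, then parse one number per line.
    values = [float(line) for line in raw.read_text().splitlines()[1:]]
    result = workdir / "summary.json"
    result.write_text(json.dumps({"n": len(values), "mean": sum(values) / len(values)}))
    return result


def run_all(workdir: Path) -> dict:
    """Run every step in order and record what was produced."""
    workdir.mkdir(parents=True, exist_ok=True)
    raw = load_data(workdir)
    result = analyze(raw, workdir)
    # The manifest records the interpreter version and a hash of each output,
    # so a later run can be compared against this one.
    manifest = {
        "python": sys.version.split()[0],
        "outputs": {
            p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in (raw, result)
        },
    }
    (workdir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


if __name__ == "__main__":
    print(json.dumps(run_all(Path("build")), indent=2))
```

Running the script twice should give identical output hashes. If it doesn't, some step depends on something the code doesn't capture, which is exactly the kind of hidden manual knowledge this post is about.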

It’s all a question of how much of a problem should be solved with code. Programming has to stop at some point, but we often stop too soon. We stop when it’s easier to do the remaining steps by hand, but we’re often short-sighted in our idea of “easier”. We mean easier for me to do by hand this time. We don’t think about someone else needing to do the task, or the need for someone (maybe ourselves) to do the task repeatedly. And we don’t think of the possible debugging/reverse-engineering effort in the future.

I’ve tried to come up with a name for the discipline of including more work in the programming portion of problem solving. “Extreme programming” has already been used for something else. Maybe “turnkey programming” would do; it doesn’t have much of a ring to it, but it sorta captures the idea.

5 thoughts on “Programming the last mile”

In the 1990s, some people promoted “literate programming,” which meant writing the code and the documentation simultaneously.

I assume you are talking about a “real” project that includes documentation, delivery, testing, and so on. For me, I find that these projects don’t develop linearly. As I’m working I think, “I’ve solved this step before” and then go grab code from some “throwaway” analysis which was not documented or tested. Usually I’ve solved a problem three times before it makes its way into its final shape and is delivered to a customer. (See http://blogs.sas.com/iml/index.php?/archives/36-Tricks-and-Treats.html)

That suggests the question, “When should this process begin?” If I write a small program for a blog or a “back of the envelope” computation, I don’t expend the same energy to document and test it as for a “real” project. And yet, those small programs often contain the germ of an idea or technique that does eventually become production software.

I think my style might be called “iterative programming.” Reproducibility doesn’t become an issue until the 3rd or 4th iteration.