Wednesday, August 22, 2007

A note from Housekeeping: Mrs. Overclock (a.k.a. Dr. Overclock, Medicine Woman) and I will be leaving shortly for a three week trip to Japan, where internet access (for us, anyway) will be limited. The fannish among you will correctly guess that we will be attending Nippon 2007, the 65th World Science Fiction Convention, held this year in Yokohama. We will also be spending time in Tokyo, Hiroshima, Kyoto and Osaka. Perhaps we will see you there!

It has been said that there are only seven basic plots from which all stories derive. Textbooks define seven simple machines on which all mechanical devices depend. These numbers are in dispute, of course, just as the number of fundamental particles has grown from the neutron, electron, and proton that you may have been taught in grade school. When I sat around the campfire with the tribe as a child, it was pretty much just Earth, Air, Fire and Water.

You may be surprised to find that in 1936 mathematician Alan Turing proved that all computing devices can be defined in terms of a few simple operations which he defined to be a Universal Turing Machine. For sure, the UTM is an abstract concept; programming in assembler (any assembler) would be a tremendous step up. But the basic idea is that all computing is at its heart built on a foundation of just a few very simple operations, which are combined macro-like to define more complex operations, which are in turn combined to define yet more complex operations, pretty much ad infinitum. It's macros all the way up until you finally arrive at languages like Ruby, Prolog, and Haskell.

Barry Karafin, computer scientist, professor, one-time CEO, and former head of Bell Laboratories, once said: "The half-life of any high technology is around five years". That's not true for all technologies, of course. As much as we hate to admit it, FORTRAN and COBOL have actually done pretty well for themselves and are still in use today. C has done remarkably better, serving as a portable assembler language in most of its applications. The jury is still out on C++ and Java, although I have more faith in the latter than in the former. It's not clear when or even if the Internet Protocol will be ever be replaced, but probably not in my lifetime.

While it may be fun to learn the latest hot new language (or framework, or any other technology), and hence feel hip for at least a brief while, to me that hot new language just doesn't really feel all that new. It seems more like one of those movies that try to recapture the feeling of a television show from thirty or so years ago, and frequently doing a terrible job of it. Even if a new technology is successful, you usually get the feeling that it won't be successful for all that long before it's replaced by the next big thing. The success of C, Java, and TCP/IP is that they've stuck around long enough to make becoming an expert in them worth your time.

I don't know how many basic algorithms or design patterns there are, but it's a lot more than seven. But I'm pretty sure it's not infinite either. And I know that those algorithms and patterns aren't language-specific. Sure, a pattern might be easier to implement in one language than in another, speaking as someone that has done object-oriented design by implementing the C++ virtual table in assembler. But as long as your language is Turing-equivalent, I'm positive you can implement any algorithm or design pattern in it.

That's why I find it a lot more satisfying to learn algorithms, design patterns, and basic techniques, than I do to learn a new programming language. And that's why for many high technologies, I practice Just In Time Learning. I deliberately delay in learning most new technologies until it becomes apparent that I'm actually going to need to know it. Then I ramp up quickly.

For sure, there are lots of things I've learn just for the sheer joy of it. Digital Aggregates has an Asterisk server because I got interested in the idea of an open source PBX. I have a Nokia N800 on my desk because I'm intrigued with the idea of an wireless internet appliance. I installed the CVM on it because I'm interested in using embedded Java in such devices. But I'm very careful where I spend my time in learning new things. This is partly because I am old and don't have much time left (see One Hundred Books), and partly because the value equation inside my head tells me it's just not worth the time I do have when compared to other things I could be spending my time learning. I prefer to learn high-leverage stuff.

Does this mean that I am arguing against the life-long learning that most career advisers say is absolutely necessary? Just the opposite. To practice JITL, you have to be really good at coming up to speed quickly. To do that, you must learn how to learn. And the only way to do that is by practicing learning new things. But you have to be picky about where you spend your time, because even though you are probably a lot younger than me, it is likely that your time is limited as well. (Whether the time of your children is similarly limited is a discussion we'll leave to the Singularity folks.)

Does it mean that I refuse to learn niche technologies that I need to do my current project? No, indeed. But it does mean that learning those niche technologies is made easier by that fact that I can see in them patterns and techniques in common with stuff I already know.

Fortunately, my career has been one big rummage sale of projects whose learning requirements were all over the map, ranging from writing cryptic assembler code for resource constrained digital signal processing chips to writing Java code for business process automation. Yet the same design patterns keep showing up, time and time again, regardless of the project or the language in which we implemented it. And in those rare moments in which I thought to myself "Hey... I've never see this technique before!", I had the wonderful feeling that I was adding yet another useful tool to my toolbox.

The older I get, the more interested I become in some of the really hard problems, like process issues (people are always more challenging to deal with than technology), or in the big picture areas like finance and economics. Some of the perhaps more esoteric stuff on my list to learn include rate monotonic analysis, game theory, and queueing theory. These aren't technologies, but techniques that can be applied to many areas, and for the distant foreseeable future. Many companies require their employees to take so many days of training per year, then offer the most introductory of classes in technologies that are likely to have disappeared in just a couple of years. Unless they are teaching a technology that is going to be imminently used by the employee, their training dollars would be better spent in teaching more fundamental skills and more broadly applicable tools.

Thursday, August 16, 2007

Back in January I wrote an article on the Java Business Integration (JBI) specification, and my experience with a particular implementation of it, Apache's ServiceMix Enterprise Service Bus (ESB). One aspect of JBI really threw me for a loop back when I was in the trenches using it in a product development effort: the behavior of send versus sendSync.

JBI (and ServiceMix) offers two different mechanisms for sending a message across the ESB: send and sendSync. The send call is asynchronous, meaning it's fire and forget. Once the method returns successfully to your application, the only thing you really know is that your message exchange object has entered your delivery channel. This is true even if the message requires a response; the response will arrive later asynchronously on the same delivery channel.

It's like dropping a letter in the mailbox. Although it's likely it will be delivered to the destination, there is no guarantee. Furthermore, there is no guarantee that the order in which you dropped many letters addressed to the same recipient in the same mailbox will be the order in which they will arrive and be read by the recipient. (This is the part that really threw me.)

If all you are doing is sending in product registration cards for the new desktop PC system you just bought, it's no big deal. But if you are mailing deposits and money transfers to your bank, order is suddenly important: if all the transfers arrive and are processed before the deposits, some of the withdrawals associated with the transfers may bounce. That's because the former example is stateless, while the latter example is stateful.

JBI provides a sendSync call which is synchronous, meaning your application blocks on the sendSync call until it is acknowledged by the recipient. The kind of acknowledgement depends on the message exchange pattern (MEP) you are using. For example, if you are sending an InOnly MEP, then all you get back is a indication of Done: you are notified that the recipient received your message and that it actively marked the message exchange as completed. If you are sending an InOut MEP, then you get back a message exchange containing a response from the recipient, and it is you who must then mark the message exchange as completed by sending a Done back to the recipient. You have guaranteed delivery, or at least you know if it didn't work before proceeding, and (more subtly) order is guaranteed to be preserved among successive sendSyncs that you perform to the same recipient.

This sounds simple, but in a system of any complexity at all, it may not be pretty.

The most obvious design issue is that the original sender (the service consumer in JBI parlance) is blocked waiting for the recipient (the service provider) to process the message exchange. On a loaded system, the recipient may be busy processing a long queue of pending requests from many senders. This causes the sender to block for potentially a long period of time. Since the sender may itself be acting as a service provider to other service consumers (that is, they are trying to send it requests and get a response), pending requests can back up in many components throughout the system. Wackiness may ensue. Even if the system isn't busy, handling some requests may require long latency operations like persistence handling, database queries, or remote network accesses.

The other obvious design issue is that if there are circumstances in which the recipient may itself act as a service consumer to the original sender's service provider, that is, the recipient may as part of its processing make a request of the original sender, and both components use sendSync, the system deadlocks. The original sender is waiting on its sendSync as a consumer to the recipient as a provider, and the recipient as a consumer is waiting on yet another sendSync to the original sender as a provider. Neither sendSync will ever complete, or they will timeout if they were coded with a timeout parameter.

This is not a new issue. Developers (like me) old enough to be taking their meals through a straw will recognize a similar issue in the remote procedure call (RPC) paradigm that was fashionable in the 1990s. In RPC, distributed components communicated with one another through function calls that were mapped by frameworks like CORBA, OSF DCE, or SunRPC (I've used 'em all) into network messages. Developers (like me) old enough to have one foot in the grave will remember dealing with this issue when writing bidirectional messaging frameworks using Berkeley sockets and TCP in the 1980s. I dimly recall similar issues arising when writing communications layers for PDP-11s using RS232 serial connections in the 1970s.

Geoff Towell, a colleague of mine in the JBI adventure, remarked that in his experience "systems using synchronous message passing aren't scalable." He also noted that "systems using asynchronous message passing can be difficult to write." In my experience, he was correct on both counts. The fix is the same whether you are using JBI, RPCs, sockets, or serial ports.

To insure guaranteed delivery and preserve order, you use synchronous message passing: sendSync, an RPC call with a returned value, a TCP socket, or a serial protocol that requires a reply. But the response you get back merely indicates that the recipient received your request and queued it for later processing. It says nothing about when the recipient will actually get around to processing your request. When it does, the relative roles of the components will reverse: the original recipient will act as a consumer and perform a sendSync with a new message exchange to the original sender, who is now acting as a provider and will appropriately complete the new message exchange. Hence, message passing is synchronous, but message processing is asynchronous.

We did the same thing with RPCs: the successful return of the remote procedure call merely meant that the called component received the parameters, not that it actually did anything with them. When the called component completed processing the request, it would invoke a callback function in the original calling component, doing another RPC in the opposite direction to deliver the response.

This is why the design of the sender and recipient gets ugly: they may both have to handle multiple requests concurrently. They typically do this by implementing multiple concurrent state machines, each machine implemented as an object. For each request, whether originated by the sender or received by the recipient, a new state machine object is created. Many of these objects may exist simultaneously in both the sender and the recipient as many requests are asynchronously processed. Each state machine is maintained in the recipient until a response for the request that particular state machine represents can be sent, and the recipient's state machine transitions to its final state. The original sender maintains its own state machine for each request until the corresponding response is received and processed, then that state machine also transitions to its final state.

(If you are into automata theory or formal languages, all of this will sound very familiar. The message exchange between the two components describes a protocol. Finite state machines and other automata are typically used to implement parsers for formal languages. Formal languages are described by formal grammars. The fact that you frequently use a state machine to implement a protocol is why protocols are often described in terms of a formal grammar. Such grammars are remarkably useful and should be in every developer's toolbox. But that is a topic for another article.)

There are a number of ways you might implement multiple concurrent state machines. The simplest is to have a separate thread for each request in both the sender and the recipient. This works well in systems in which the cost of context switching and maintaining a thread is zero. The fact that there are no such systems means this approach is seldom used.

(It can in fact get bad a lot faster than you might expect, since on some systems I have seen the cost of context switching increase proportional to the square of the number of active threads. I wrote a marvelous white paper on this topic, that this blog article is too small to contain.)

You may have a fixed size pool of threads in the recipient that service a common queue of incoming requests. I have seen this work well in both Java and C++ implementations in which the number of possible concurrent outstanding requests is small, the lifespan of each request is short, and concurrent pending requests are mostly independent. There are Java and C++ frameworks that provide this capability.

When I've run up against systems that have to handle 38,000 concurrent long-duration requests (and no, I didn't pick that number out of thin air), neither approach scales, and I resort to designing and coding an application-specific concurrent state machine implementation that runs inside a small number (like one) of threads. This is not as hard as it sounds.

(Dan Kegel wrote a really great article on this scalability issue in the context of server-side socket applications in systems of UNIX-like sensibilities; see The C10K Problem.)

My web services friends will no doubt be up in arms over this article, either because I'm suggesting using synchronous message passing, or because I'm suggesting using asynchronous message processing. Probably both. But my background for the past thirty years has been in building robust, scalable server-side real-time systems, and what I have described here is a design pattern I have found to work.

Update (2008-07-07)

I've recently been reading about continuations, which are a mechanism to, very roughly speaking, pick up a prior computation where it left off. It's funny: being the wizened old man that I am, I always thought of continuations as a form of checkpointing, from my time in the deep past with IBM mainframes and Cray supercomputers. It wasn't until recently that I realized that continuations serve essentially the same purpose for web servers as the state machine architecture I describe here and have implemented on several real-time systems over the years. For that matter, checkpoints served a similar purpose on mainframes and supercomputers.

I suspect that the motivation for all of these mechanisms was slightly different. Checkpoints existed because hardware was slow and expensive, it wasn't uncommon for the system to crash, and when it did you wanted to pick up work were it left off. The state machine architecture I describe here was mostly done for scalability, handling tens of thousands of simultaneous connections with just a few threads. Continuations seem to be mostly motivated not just by robustness and scalability, but the state-less nature of RESTful HTTP operations.

Maybe as I read more it will become more clear to me that continuations are completely different. In the spirit of The Sapir-Whorf Hypothesis, I should probably learn a programming language or framework that supports continuations natively.

Tuesday, August 14, 2007

"If all you have is a hammer, everything looks like a nail." - Benard Baruch

I was teaching a class in embedded development at a client's site the other week. We were discussing approaches to fixing an actual bug in their firmware code base. This client uses the IAR Embedded Workbench, an IDE that provides all the usual tools: an editor, a build environment supporting the ARM-based tool chain, a JTAG interface, a down loader, a debugger, etc. When I've done product development for this client, I've used this tool, and while its debugger is the best thing since sliced bread, I found its editor to be a little weak. I preferred instead to do my editing using Eclipse with the CDT and Subversion plug-ins. The IAR EWB and Eclipse played well together, in my experience, allowing me to leverage the best of both worlds.

We discussed for a few minutes the merits of using the debugger to test our hypothesis of what the bug was, where to put in breakpoints, what variables we needed to look at. "Or," someone said, "we could just change the code and see what happens."

I had one of those brief moments of cognitive dissonance where I wondered "Who the heck said that?" and then realized "Oh, it was me!" We were all victims of tool economics.

Back in the 1970s, when I got paid to write code in FORTRAN, COBOL and assembly language, we didn't so much have a tool chain as a tool string or maybe a tool thread. We typed our code onto punch cards using a keypunch, which we frequently had to stand in line for. We submitted them to the operator at the I/O window of the computer room. Our stack of punch cards were queued up with a lot of other similar stacks by the card reader. The operator fed them in, where our job was queued on disk by the mainframe operating system. (This was OS/MFT running HASP on a IBM 360/65, for those that remember when glaciers covered the land and we communicated mostly with grunts.) Eventually the OS would run our jobs and, if all went well, our printout would come banging out on the line printer. The operator would separate our printouts and place them with our original card deck back in the I/O window. On a really good day, you might get two or three runs in, providing you got to the computer center early and left late.

The latency involved in a single iteration of the edit-compile-test-debug cycle bred a very strong offline approach to code development: you looked at your code until your eyes bled, trying as hard as you could to fix as many bugs as possible, running what-if simulations in your head because that was a lot faster that actually running your program. You saved old punch cards, just in case you needed to put those lines back in (maybe even in a different location), and your printouts were heavily annotated with quill pen marks (okay, I exaggerate, they were felt-tip pens). Changes were meticulously planned in advance and discussed with as many other developers as you could lay your hands on. You think you multi-task now? The only way to be productive in those days was to work on many completely different software applications simultaneously, so you could pipeline all of those edit-compile-test-debug cycles. Every new printout arriving at the I/O window was an opportunity to context switch in an effort to remember what the heck you were thinking several hours ago.

Sitting here, writing this while running Eclipse on my laptop and accessing my code base wirelessly from a remote Subversion server, I have zero desire to return to those bad old days. But it does illustrate how our tool chain affects how we approach a problem. Because the developers at my client were focused on using the EWB, their problem solving approach necessarily revolved around debugging, because the debugger was the EWB's greatest strength. My problem solving approach revolved around editing and refactoring with Eclipse, trying an experiment to see what happened, then maybe using Subversion to back my changes out. I resorted to using the debugger only when I had a unit test really go sideways and trap to the RTOS, or when I just couldn't see what was going on without single stepping through the code and watching variables change. Using Eclipse as part of the tool chain made it economical to try new things and verify proposed solutions quickly.

Tool economics affects all phases of software development. It limits us in the solutions we are willing to consider, and forces us to reject some potential solutions out of hand. It colors the processes we use, and the quality we may be able to achieve within the constraints of schedule and budget.

Some of the developers at my client preferred to use the debugger to unit test their code. It was easy to love the debugger in this role, but it was a very labor intensive process. This was okay economically as long as you only had to do this once, when you originally wrote the code.

But no successful code base is static. At least two-thirds of the cost of software development is code maintenance, changing the code base after its initial release. Refactoring, something tools like Eclipse vastly simplify, becomes a way of life. Going through the debugger every time you change a particular function is expensive. I chose instead to write unit tests embedded in the production code base, typically conditionally compiled in so that the test code was present in the development environment but not shipped as part of the product. If I changed a particular function, I just reran the unit test. If the test failed and I was absolutely clueless as to why, then I turned to the debugger. I paid the price of writing the unit test once, and then used it for free henceforth. For sure there are occasionally some Heisenbergian issues with the unit test affecting the functionality of the code base, typically due to memory utilization or real-time, but those were rare.

Embedding unit tests in the code base is just a more economical approach to software development from a developer utilization viewpoint. But the real reason I tend to use that approach is that it's a rare embedded development project I work on in which I have access to a debugger as good as the one provided by the IAR EWB. I'm generally forced into more formal unit testing approach because I really don't have a choice. The fact that it's really cheaper in the long run is just gravy. This is yet another way in which tool availability forces a particular approach.

Tool economics effects how we design our applications too. I remember a job in which I found myself working in an eight million line code base whose oldest elements dated back at least to the early 1980s. Unit testing was extremely expensive. There were no formal unit tests. You tested your change by reserving a lab machine for sometime in the future, loading the application, and spending anywhere from five minutes to hours testing on real hardware and real devices, sometimes in concert with logic and protocol analyzers. I admit that many times I chose a particular design, or even an architectural direction, because it minimized the number of subsystems I would have to touch, and hence re-test. Choosing the "best" solution had nothing to do with it. Choosing a solution that met requirements while at the same time could be completed within the limitations of the schedule had everything to do with it.

You think I was wrong? The managers to which I reported would disagree with you. Admittedly this was sometimes a case of time to market (TTM) taking precedence over cost of goods sold (COGS) (see The Irresistible Force and the Unmovable Object), but that was the trade-off sanctioned by that particular development organization.

The expense of testing also effected the quality of the code base in perhaps unexpected ways. Many times I recall discussing with my fellow developers over beer or lattes that, in the adventure game that was navigating this huge code base, in the course of our work we occasionally encountered code that we knew deep in our gut simply could not work. It could never have worked. What to do? We could have fixed it, but then we would have had to test it, even if it were a trivial change, placing us behind schedule on our actual assignments, and making it more likely we would be in the next wave of layoffs. We could have written and filed a bug report (and too our credit, we typically did just that), but bug reports have a habit of ending up back on the plate of the original submitter. So when we encountered such code, and if time was short, we sometimes just backed away slowly, and pretended never to have seen it. The high cost of testing drove that dysfunctional behavior.

If you are a developer, you deserve the best tools you organization can afford. No matter whether you are in the open source world, the Microsoft camp, or the embedded domain, there are very good tools to be had. The availability of good tools, and your familiarity with them, will affect not only how you approach your job, but the quality of the product that you produce.

If you are a manager, you need to make it a priority to get your developers the best tools you can afford. Your schedule and the quality of the resulting product depends on it. Just meeting requirements is not sufficient, because a requirement must be testable. Having a requirement that says "this software must be maintainable in the face of even unanticipated changes over its entire lifespan" is not testable. Whether you appreciate it or not, your developers are making economic decisions minute by minute that affect both the TTM and the COGS of your product. And, hence, the financials of your company.

Metadata

Chip Overclock® is the pen name of John Sloan, a freelance product developer based in Denver Colorado USA who specializes in very large and very small systems: distributed, real time, high performance, embedded, concurrent, parallel, close to bare metal.