Jack's Rules of Thumb

Engineering is more practical art than science. As such, its most successful practitioners obey simple rules like these.

Performance anxiety strikes in many ways. I remember starting engineering school and being overwhelmed by the curriculum, the physics, and the vast amount of math I'd somehow have to master. It seemed impossible as a freshman. Engineering courses looked even worse. I flipped through a third-year transistor theory book and another on electromagnetics. Fear crept up my spine. How would I ever master this stuff? And even if I tricked the profs into giving me passing grades, I couldn't imagine understanding it all enough to be a practicing engineer.

I went to my dad, a mechanical engineer who designed spacecraft, for insight. After he got over the initial shock that any of his five kids would come to him for advice, he told me that engineering is a practical art. Sure, we'd use some math on the job, but academia's detailed analysis of fundamental engineering concepts was just to give us insight; it wasn't the way real engineers built things.

For example, civil engineers rarely analyze loads in small structures. Instead, they use handbooks: vast matrices of tables that show, for instance, which standard beam to select to support a floor of a particular size. No doubt an engineer could painfully re-derive such data, but in practice no one does.

That's when I learned the importance of "rules of thumb." These are the basis for the design of most real-world products. They're never cast in stone, and are always subject to exceptions and revision, both by more detailed analysis and better experience. But the rules form a reasonable first-order approximation to the truth. We know π is about three, for instance. That's a pretty good estimate for some needs. Not for all, but it's a mental guide to the magnitude of the truth.

When we learned to use slide rules, we mastered another way of approximating the truth. Since these calculating devices were so crude, all engineers learned to first run a rough calculation in their heads. If the slide rule said 314, our mental computation scaled it to 3.14, 3140, or whatever result made the most sense for the problem at hand. To this day I have an infuriating (to others at least) habit of checking numbers for sense. "The paper said she swam the English channel in three hours. It's 26 miles across, so she averaged about seven knots. Sounds awfully fast to me."

"You said the budget is $2 trillion? Divide by 280 million Americans and that suggests a total tax load of almost $7,000 for every man, woman and child. Can't be so." (Except it is!)

Since then, I've developed many rules of thumb for understanding embedded systems. Some came from my own painful experience, others from watching developers, and still more from the experience of others from whom I shamelessly steal. These rules guide me in making sense of projects, in checking to be sure we're doing the right sort of things, and in looking for problem areas in need of optimization.

So here they are, with explanations. Please send me yours; I'll share some with the Embedded Systems Programming community and steal the rest.

DI (disable interrupt) instructions slip into code in two fashions. Forward-thinking developers recognize that certain actions in a program are inherently non-reentrant. Accessing a shared variable (a global) is fraught with danger, since an interrupt may create a context switch to another task that also requires the same variable. So we often place a quick DI/EI (disable/enable interrupt) pair around the code that uses the global to inhibit such a switch. Reentrancy problems disappear when interrupts are off.
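The bracketed access might look like the following sketch. Here DI() and EI() are hypothetical stand-ins for port-specific disable/enable-interrupt intrinsics, and the flag is host-side scaffolding so the logic can be exercised off-target:

```c
/* DI() and EI() are hypothetical stand-ins for a port's
   disable/enable-interrupt intrinsics; on a host build we model
   the interrupt flag for illustration only. */
static int int_enabled = 1;            /* 1 = interrupts on         */
#define DI() (int_enabled = 0)         /* disable interrupts        */
#define EI() (int_enabled = 1)         /* re-enable interrupts      */

static volatile long ticks;            /* global shared with an ISR */

/* A read-modify-write on a shared global is non-reentrant, so the
   access is bracketed by the shortest possible DI/EI pair. */
void bump_ticks(void)
{
    DI();                 /* no context switch can intrude here */
    ticks = ticks + 1;    /* the critical region stays tiny     */
    EI();
}
```

The point is the brevity: interrupts are off for one increment, nothing more.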

All shared resources are subject to reentrancy problems. A complex peripheral might have dozens or even hundreds of registers; a context switch while setting these up can cause total brain-freeze of the device. Again, DI/EIs can preserve the integrity of the system.

But these DI/EI pairs slip into code in great numbers when there's a systemic design problem that yields lots of critical regions susceptible to reentrancy problems. You know how it is: chasing a bug, the intrepid developer uncovers a variable trashed by context switching. Pop in a quick DI/EI pair. Then there's another. And another. It's like a heroin user taking his last hit. It never ends.

Disabling interrupts tends to be A Bad Thing in general, because even in the best of cases, it'll increase system latency and probably decrease performance. Increased latency leads to missed interrupts and mismanaged devices.

It's best to avoid shared resources whenever possible. Eliminate globals. Create drivers for hardware devices. Encapsulate to excess. Use semaphores and let a well-designed RTOS manage the interrupt headaches. An occasional DI/EI pair isn't too bad, but the presence of many means we've let chaos creep into the code.

Rule of Thumb: Be wary of code with lots of DIs.
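A minimal sketch of that encapsulation, with sem_take/sem_give and the UART names as illustrative stand-ins for whatever primitives a real RTOS supplies:

```c
/* Hiding a shared resource behind a tiny driver guarded by a
   semaphore. sem_take/sem_give and the "device" here are
   illustrative stand-ins, not a real RTOS or UART. */
static int uart_sem = 1;                     /* 1 = available       */
static int  sem_take(int *s) { if (*s) { *s = 0; return 1; } return 0; }
static void sem_give(int *s) { *s = 1; }

static char tx_buf[64];                      /* "device" state      */
static int  tx_len;

/* All access funnels through the driver, so reentrancy is handled
   in one place instead of by scattered DI/EI pairs. */
int uart_write(const char *msg, int len)
{
    int i;
    if (!sem_take(&uart_sem))
        return -1;                           /* busy: caller retries */
    for (i = 0; i < len && i < (int)sizeof tx_buf; i++)
        tx_buf[i] = msg[i];
    tx_len = i;
    sem_give(&uart_sem);
    return i;
}
```

Callers never see the buffer or the semaphore; the driver owns both.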

The enable interrupt instruction, too, brings perils and opportunities. An EI located outside an interrupt service routine (ISR) often suggests problems, with the exception of the initial EI in the startup code.

Most interrupt-driven systems leave interrupts on more or less all of the time. EIs indicate someone, somewhere, turned them off, which suggests something very complex and difficult to manage is going on. When the enable is not part of a DI/EI pair (and these two instructions must be very close to each other to keep latency down and maintainability up), then the code is likely a convoluted, cryptic well; plumbing these depths will age the most eager of developers.

Leave interrupts on, for all but the briefest times and in the most compelling of needs. Don't create difficult blocks of code where they're off and reenabled in some other place.

Rule of Thumb: Be wary of solo EIs.

Follow the ISR design rules in most textbooks and you'll violate another one of my rules of thumb. The classic service routine pushes registers like mad, services the interrupting hardware, does something useful, pops ad nauseam, issues an EI to enable interrupts, and returns. Sometimes that makes a lot of sense. More often it doesn't.

One of our ISR goals should be to minimize latency (for more info check out my September 2001 column, "Shared Perceptions," on p. 97) to ensure the system does not miss interrupts. It's perfectly fine to allow another device to interrupt an ISR, or even to allow the same interrupt to do so, given enough stack space. That suggests we should create service routines that do all of the non-reentrant stuff (like servicing hardware) early, issue the EI, and continue with the reentrant activities. Then pop registers and return.
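That early-EI layout can be sketched as follows. The macro, register, and buffer are illustrative stand-ins; a real ISR would also involve the compiler's interrupt attributes and register saves:

```c
/* Sketch of the early-EI service routine shape. EI() and
   hw_data_reg are illustrative stand-ins, not a real port. */
static int int_enabled;                  /* 0 on ISR entry          */
#define EI() (int_enabled = 1)

static int hw_data_reg = 42;             /* pretend device register */
static volatile int samples[8];
static volatile int head;

void timer_isr(void)                     /* entered with ints off   */
{
    int raw = hw_data_reg;   /* 1: non-reentrant work first: read   */
                             /*    and quiet the hardware           */
    EI();                    /* 2: re-enable early to cut latency   */
    samples[head++ & 7] = raw * 10;  /* 3: reentrant processing,    */
}                                    /*    then pop and return      */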

Rule of Thumb: Check the design of any ISR that reenables interrupts immediately before returning.

What's the practical limit to an ISR's size? You'd be amazed at how many products are nothing but one giant ISR. The main loop idles until an interrupt fires off a ten-thousand-line service routine. This can work, but it leads to nightmarish debugging struggles. Few tools work well in interrupt routines. Single stepping becomes problematic. Keep ISRs small. If they need to do something complicated, spawn off a task that runs with interrupts enabled. If you're clever enough to produce very short interrupt handlers, you can generally debug them by inspection, which is a lot easier than using an ICE or BDM.
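The small-ISR pattern can be sketched like this; the names are illustrative, and a real system would pend on an RTOS semaphore rather than a polled flag:

```c
/* Sketch of the small-ISR pattern: the handler only clears the
   cause and signals a worker. All names here are illustrative. */
static volatile int rx_pending;              /* what the task pends on */
static volatile unsigned char rx_byte;
static unsigned char hw_rx_reg = 'A';        /* pretend UART data reg  */

void uart_rx_isr(void)       /* a few lines: debuggable by inspection */
{
    rx_byte = hw_rx_reg;     /* grab the data, quieting the interrupt */
    rx_pending = 1;          /* wake the worker task                  */
}

int uart_task(void)          /* runs with interrupts enabled          */
{
    if (!rx_pending)
        return 0;            /* nothing to do                         */
    rx_pending = 0;
    return rx_byte;          /* the complicated processing goes here  */
}
```

The ISR stays two lines long; everything slow or complex moves into the task, where ordinary debugging tools work.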

Rule of Thumb: Be wary of ISRs longer than half a page of code.

Nine to five

Sirens go off whenever I hear a developer say, "I can't get anything done during normal working hours." A long story about how he comes in early or stays late, or both, inevitably follows.

If you can't get your job done inside normal working hours, you're being interrupted too often. Change your environment, not your hours. Crazy time-shifting destroys important non-work relationships and crashes your personal life. Will your tombstone read "Brought the XYZ project in on time," or "Gave of himself always, loved by everyone"?

A tenet of eXtreme Programming is that we never work two more-than-40-hour workweeks in a row. There's a lot to love and hate about XP, but this rule expresses obvious truisms about people: we need lives. We get tired. Rested people are productive people.

To keep to a 40-hour workweek, we have to get interruptions under control. It takes 15 minutes, on average, for your brain to move from active perception of the busy-ness around you to being totally and productively engaged in the cyberworld of coding. Yet a mere 11 minutes passes between interruptions for the average developer. Ever wonder why firmware costs so much? E-mail, the phone, people looking for coffee filters, and your boss all clamor for attention. If you don't manage these interruptions, you can't be productive.

DeMarco and Lister claim a 300% difference in productivity between software teams that are interrupted often and those that aren't.[1] 300%! Clearly, we have to manage our interruptions; the alternative is missed schedules.

Most companies sentence developers to cubicles rather than private offices. Dilbert aptly terms cubicles "antiproductivity pods." Cubes are concentration vampires. Who can think when you can't block out the sound of your neighbor's call to his divorce lawyer?

Figure out when your brain is most effective. For me, it's first thing in the morning. Take control of these hours. Turn off the e-mail, cut the phone cord, blanket the PA system with headphones, and pull a curtain across the opening that masquerades as a door. Schedule meetings for some other time. Guard these precious hours and use them to focus on your project. It's astonishing how much work you'll accomplish.

Rule of Thumb: Developers who live in cubicles probably aren't very productive. Check how they manage interruptions.

Fear of editing

We all have a feel for another rule of thumb: a small fraction of the code causes most of the problems. Everyone has encountered that bit of code that breaks every time someone changes nothing more than a comment. Fear of editing is a symptom of this problem.

Five percent of the functions consume 80% of debugging time. I've observed that most projects wallow in the debug cycle, which often accounts for half of the entire schedule. If we can do something about those few functions that represent most of our troubles, the project will get out the door that much sooner.

Barry Boehm observed that these few functions cost four times as much as any other function.[2] That suggests it's much cheaper to toss the junk and recode than to fight the never-ending stream of bugs. Maybe we blew it when first writing the code. If we can identify the crummy routines, toss them out, and start over, we'll save big bucks.

Rule of Thumb: When the developers are afraid to change a function, it's time to rewrite that code from scratch.

Estimations

Isn't it amazing how badly we estimate schedules? Eighty percent of embedded systems are delivered late. Most pundits figure the average project consumes twice the development effort originally budgeted.

Scheduling disasters are inevitable when developers don't separate calendar time from engineering hours. When I see people nervously sliding triangles around in Microsoft Project, I know their project is doomed. Hours and dates are largely unrelated parameters.

Some data suggests the average developer is only about 55% engaged in new product work. Other routine activities, from handling paperwork to talking about Survivor, burn almost half the workweek. This is an interesting number, since it correlates so well with the observation that so many projects need twice the estimated time.
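The correlation is simple arithmetic: calendar time is engineering hours divided by the real hours of focused work per week. A sketch, where the 40-hour week and the round-up convention are my assumptions:

```c
/* Back-of-the-envelope schedule math: engineering hours become
   calendar weeks only after dividing by a utilization factor.
   The 40-hour week and the round-up are assumptions. */
int calendar_weeks(int eng_hours, double utilization)
{
    double real_hours_per_week = 40.0 * utilization;
    return (int)(eng_hours / real_hours_per_week + 0.999);  /* round up */
}
```

At 50% utilization, a 400-hour job takes 20 calendar weeks, double the naive 10.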

Rule of Thumb: Estimating dates instead of hours guarantees a late project. If the schedule hallucinates a people-utilization factor of much over 50%, the project will be behind proportionately.

I collect these rules and use them to identify poor development practices. Without them, with no rules at all, no template, no best-practices guidelines, one is forced to reinvent everything at every step of the way. That's working far too hard.

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at jack@ganssle.com.