Thursday, January 12, 2017

My favorite quote from chapter 1, page 1 of http://dl.acm.org/citation.cfm?id=2742705 includes "'If you can fix a hardware bug in firmware, it’s not a bug but a documentation issue.' —An anonymous hardware manager." I still recall being aghast upon hearing that utterance early in my career, but over time I grew to understand the wisdom. In fact, this inaugural chapter from 2015 book further elaborates on work-around's of hardware concerns that can be implemented in firmware.

"There could be confusion between the root cause and the fix
for issues. In particular, there is great pressure to resolve
hardware issues in firmware due to the order-of-magnitudes
difference in the cost of resolution. A “spin” of a chip may
take many weeks and cost millions of dollars whereas a
firmware fix may cost a few thousand dollars and a day or two
of total work. In fact, many modern chips are designed so the
firmware can configure the chips to work around issues in the
field rather than having the hardware recalled. The public is
simply told that there is a firmware issue when, in fact, the
firmware resolved what was a hardware issue. "

I also recall being given a harried call on a Friday night about a the number of 'firmware bugs' on a product. I replied to the caller with some of the sentiments above, including the caveat 'I think that we can get the firmware update out by Monday, but I don't believe we can get a new stepping of the SOC by then.' Regrettably programs don't roll up the reason for the firmware changes and people are just shocked by the cardinality.

And often these bugs are not easy to find Bohrbugs, but Heisenbugs or Hindenbugs where there is a subtle interaction between host firmware and opaque hardware state machines. A week of investigation may yield a solution wherein one line of code changing one bit in a register access yields a solution. A few months ago a firmware manager at a conference related to me that the promotion process in his company entailed upper management reviewing the Git commits of the engineers. The firmware manager had to defend the smaller number of code changes versus the OS kernel guys as each delivering similar business value. But I still recall the final quote from the manager when he smiled and said 'The coolest part is that they are reviewing code changes, right?'

Sometimes these work-around's mitigate errata in the hardware (board, silicon, etc), and sometimes the changes are for cost savings. An example of the latter I recall includes a board design where the integrated Super I/O could be decoded as I/O addresses 0x2E/0x2F or 0x4E/0x4F by application of a pull-down or pull-up resistor, respectively. The hardware design engineer omitted the resistor in order to save costs on the Bill of Materials (BOM), so the boot firmware had to probe for what port value being decoded on each machine restart. And this was a cost savings of a single surface mount resistor.

The above discourse isn't meant to be an apologist view about firmware bugs, though. In general there are many bugs based upon programming flaws, not just hardware work-around's. If one is interested in the latter, take a look athttps://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues. But hopefully this posting will provide an alternate view into firmware and firmware changes.