A blog about recent (and not-so-recent) happenings in Information Technology and what it means to you and me.

Monday, November 16, 2015

Does the Intel Pentium Bug of the 1990s Still Hold Any Lessons for Us?

Information Technology, at its core, is a forward-oriented profession. What I
mean by this is that the rate of change this profession deals with is
unprecedented. In my career so far, I have seen many paradigm shifts, including
the rise, fall and re-rise of Microsoft, the birth and dominance of Google, and
a gigantic comeback by Apple, all of which eventually impacted our lives as
professionals and as consumers of technology. In dealing with such acute
dynamism, I believe it is very easy to lose the sense of history of our
profession. I personally feel that a good sense of history for a chosen
profession helps us connect the dots and better fathom the events we experience
today. History connects things through time, and I consider knowledge of our
profession's history important in shaping its future. Most of today's
methodologies and good practices evolved by improving on what didn't work in
the past. At the very least, a sense of history gives us a sense of connection
with the past, which we should try not to lose.

I was recently reading the book "Only the Paranoid Survive", the first-person
account of Andy Grove (former CEO of Intel) on how he dealt with strategic
inflection points, i.e. the times in the life of a business when its
fundamentals are about to change. One of the narratives in the book talks about
the Pentium chip bug, and it goes as follows (quoted as it appears in the book):

"Several weeks
earlier, some of our employees had found a string of comments on the Internet
forum where people interested in Intel products congregate. The comments were
under the headings like, "Bug in the Pentium FPU." (FPU stands for
floating point unit, the part of the chip that does heavy-duty math.) They were
triggered by the observation of a math professor that something wasn't quite
right with the mathematical capabilities of the pentium chip. The professor
reported that he had encountered a division error while studying some complex
math problem.

We were
already familiar with this problem, having encountered it several
months earlier. It was due to a minor design error on the chip, which caused a
rounding error in division once every nine billion times. At first, we were
very concerned about this, so we mounted a major study to try to understand
what once every nine billion divisions would mean. We found the results
reassuring. For instance, they meant that an average spreadsheet user would run
into the problem only once every 27,000 years of spreadsheet use."
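As a quick back-of-the-envelope check of the two figures in the quote (my own sketch, not Intel's actual study): one error per nine billion divisions, hit once every 27,000 years, implies a certain amount of daily usage for the "average spreadsheet user". The 365-day year, the even spread of errors, and the "heavy user" figure below are assumptions of mine for illustration only.

```python
# Back-of-the-envelope check of the figures quoted from the book.
# Assumption: errors are evenly spread, i.e. one error per 9 billion divisions.
DIVISIONS_PER_ERROR = 9_000_000_000  # from the quote
YEARS_PER_ERROR = 27_000             # from the quote
DAYS_PER_YEAR = 365                  # assumption: leap years ignored


def implied_divisions_per_year(divisions_per_error, years_per_error):
    """Divisions per year a user must perform to hit one error in the given span."""
    return divisions_per_error / years_per_error


def years_between_errors(divisions_per_day, divisions_per_error=DIVISIONS_PER_ERROR):
    """Expected years between errors for a user doing this many divisions a day."""
    return divisions_per_error / (divisions_per_day * DAYS_PER_YEAR)


per_year = implied_divisions_per_year(DIVISIONS_PER_ERROR, YEARS_PER_ERROR)
per_day = per_year / DAYS_PER_YEAR
print(f"Implied usage: about {per_year:,.0f} divisions a year, {per_day:,.0f} a day")

# Hypothetical heavy user (my assumption, not a figure from the book).
heavy_user_years = years_between_errors(1_000_000)
print(f"At 1,000,000 divisions a day: one error every {heavy_user_years:.1f} years")
```

In other words, the reassuring 27,000-year figure rests on an assumed usage of roughly nine hundred divisions a day. Change that assumption and the estimate changes with it.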

Later in the book, Andy spends quite a few pages explaining why this bug was
critical and how it made him realize that something peculiar was happening in
the world around him. Let me summarize that point of view in the next few
points and also explain its relevance in today's world:

1. The beginning of social media as a force to reckon with

[Image: An Internet forum in the 1990s]

We pretty much take social media for granted these days. It generates a lot of
data and opinions every passing second, which is very valuable to those who
care to seek information out of it. This is especially true for anyone seeking
feedback on a newly launched product or service. Consumers, on the other hand,
often provide feedback on social media without being asked. More often than
not, it turns out to be a medium for venting about imperfections and bad
experiences. This is now. But in the 1990s, when the Pentium FPU bug occurred,
the Internet was still in its infancy and people had only just started to use
Internet forums to share opinions. Intel, at that time, was not in the business
of selling computer chips directly to consumers. It sold them via PC
manufacturers like IBM. Intel's emergence came at the cusp of the PC industry
turning from a vertical orientation to a horizontal one. Earlier, one
manufacturer like Digital used to manufacture or assemble all parts of a
computer (vertical orientation). Later, each key component became an individual
business (horizontal orientation) serving PC assemblers like IBM, Dell etc.
Andy mentions in his book that with this bug, he sensed something unusual
happening in the field: though he was not selling directly to consumers, he was
getting feedback from them directly. He inferred that if this situation wasn't
handled proactively, he could face a severe backlash. Mind you, this was the
1990s, when it was hard to imagine the power of social media. Andy took
corrective action quickly and even justified the huge cost of this bug, around
USD 475 million (a mammoth amount now, but even more so 20 years back).

This holds lessons for today's times too. Proactively dealing with feedback
received on social media is the order of the day. It is easy to manufacture
negativity, even through the bad intentions of competitors. Techniques such as
Sentiment Analysis, which help to proactively assess positive and negative
sentiment around events like product releases, make it easier to deal with
negative perceptions. From recent memory, I am reminded of the social buzz
created by the Heartbleed bug, a security vulnerability in OpenSSL, and of the
negative response on social media when news leaked out publicly about one
company's (hidden) social experiment, an A/B test in which it deliberately
subjected a certain percentage of its consumers to negative news. Even though
social media is quite useful as a channel for gathering feedback, it also makes
companies vulnerable to negative publicity when bugs catch public attention.

2. Handling strategic inflection points needs different skills

In the wake of the negative press and the crisis-like situation that the
Pentium FPU bug generated for Intel, Andy makes a very interesting observation
in his book. He says:

"A lot of
people involved in handling this stuff had only joined Intel in the last ten
years or so, during which time our business had grown steadily. Their
experience had been that working hard, putting one foot in front of the other, was
what it took to get a good outcome. Now, all of a sudden, instead of predictable
success, nothing was predictable. Our people, while they were busting their
butts, were also perturbed and even scared."

In short, the skills needed to handle peace time in business are quite
different from the ones needed during war time. People often come to work
believing workplaces to be fair, i.e. if I do "X" amount of work, I will get
the equivalent of "X" in credit. While there is nothing wrong with this
assumption in general, such thinking (from the employee's perspective) does not
take changing business situations into account. The reality of today's times is
that an effort that would have produced a great outcome (for the company and
personally) in one business situation may simply not be enough in a very
different one. This often happens through no fault of the employee, who did his
or her best given the current situation but probably lacked the situational
awareness to alter the nature of the effort. To quickly illustrate this
perspective, Nokia's example comes to mind. The story of the rise and
subsequent decline of Nokia has been widely written about. During good times
(till at least 2007), the company made a big fortune with its existing model
(phones based on Symbian OS). But when the time came to switch to a more modern
mobile OS like Android, it failed to move swiftly. I can imagine that employees
in this situation put in great effort with their key skills around Symbian OS,
but because the situation had changed, the same efforts that bore huge fruit
earlier were just not enough to reap similar or greater rewards.

3. Lessons in Defect Advocacy

To me, the most interesting part of the narrative regarding the Pentium FPU bug
was this: "an average spreadsheet user would run into the problem only once
every 27,000 years of spreadsheet use".

This was actually a known problem at Intel months before it surfaced on the
Internet forum. What might have happened is that, following the usual defect
prioritization principles, it was acknowledged but given low priority, since
the bug was expected to occur only once in a staggering 27,000 years of
spreadsheet use. Now, one may question the accuracy of this data, which is
probably a fair question, but the larger point this case teaches is that the
usual defect prioritization approach often fails to consider the macro aspects
impacting the product. Let me explain this point a little bit.

The Pentium chip was released against the backdrop of the legendary "Intel
Inside" marketing campaign. The campaign was so popular that Intel almost
became a household brand. When people started seeing the effects of this error,
they put the blame squarely on Intel and not on the computer manufacturer. The
early social media, in the form of Internet forums, gave voice to their
concerns. Had the defect prioritization decision taken into account the macro
environment that the product would operate in, the bug would probably have been
chosen to be fixed.

One of the key learnings here that is still relevant today is to take a
holistic approach towards defect advocacy. A tester advocating a defect should
relate the bug information to what is happening in the macro environment, such
as the business situation, the visibility of the bug, the users impacted and
much more. For a tester to play the role of the headlights of the product,
he/she should not just think about the internals of a bug but also associate it
with the necessary business information and related factors.
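To make the idea a little more concrete, here is a minimal sketch of how a prioritization score could fold in macro factors. Everything in it is my own assumption for illustration: the `Defect` fields, the two scoring formulas, and the sample numbers are hypothetical and do not describe how Intel (or anyone else) actually prioritized bugs.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    """Hypothetical defect record; all fields are illustrative assumptions."""
    name: str
    severity: int            # 1 (cosmetic) to 5 (wrong results or data loss)
    likelihood: float        # chance a typical user hits it, 0.0 to 1.0
    brand_visibility: float  # 0.0 (hidden component) to 1.0 (household brand)
    users_impacted: int      # users who could be affected


def classic_score(defect):
    """Usual approach: severity weighted by how often the bug is hit."""
    return defect.severity * defect.likelihood


def holistic_score(defect, installed_base):
    """Also weighs macro factors: brand visibility and share of users exposed."""
    reach = min(1.0, defect.users_impacted / installed_base)
    return defect.severity * (defect.likelihood + defect.brand_visibility * reach)


INSTALLED_BASE = 1_000_000  # made-up number of users

rare_math_error = Defect("rare division error", severity=5, likelihood=1e-6,
                         brand_visibility=0.9, users_impacted=1_000_000)
dialog_typo = Defect("typo in a settings dialog", severity=1, likelihood=0.5,
                     brand_visibility=0.1, users_impacted=200_000)

for d in (rare_math_error, dialog_typo):
    print(f"{d.name}: classic={classic_score(d):.6f}, "
          f"holistic={holistic_score(d, INSTALLED_BASE):.2f}")
```

With these made-up numbers, the rare math error ranks below the typo on the classic score but well above it once visibility and reach are considered. The exact weights matter less than the habit of putting such factors on the table when advocating for a defect.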

What else do you learn
from this case? Please do share your thoughts in the comments.