Software Certifications and Standards: What Every Device Manufacturer Should Know

The mandate for certified safe and secure software used to be the
exclusive domain of military, medical, government and other niche
areas. New regulations are beginning to play a critical role in the
viability of devices manufactured for the global market.

As with many technologies, the military was one of the first
"industries" to adopt computers widely in both its infrastructure and
its weapons systems. One can only imagine some of the spectacular
failures that led to the development of the military-specific
standards.

Regardless, the military was one of the first to propose a rigor for
the development of software used in its devices. From military
applications, it was a natural evolution for software to move into
civilian applications such as avionics. First used in communication,
diagnostics and guidance systems, software control systems have moved
into the arena of flight control, where fly-by-wire systems have now
been deployed in commercial aircraft. The European Airbus A380 is a
perfect example of an aircraft flown entirely by computer; there are no
mechanical linkages between the pilot and the flight control surfaces.

Medical devices are another area where the safety of software plays
a role in ensuring both operator and patient safety. Programmable
electronic devices are deployed in everything from portable blood
glucose monitors to implanted heart defibrillators. Increasingly,
automobile manufacturers are adding more and more computing power to
their products. The reasons range from safety concerns, to
environmental, to cost.

Engine management software cleans our exhaust and controls the
transmission to ensure optimal performance, and anti-lock braking
software maximizes stopping power. In the late 1990s, BMW replaced the
wiring harness used for controlling things like electric door locks,
mirrors and windows with a simple two-wire CAN bus and, as a result,
eliminated over 10 kg of wiring from the vehicle. Nowadays, modern
luxury vehicles contain 80 or more programmable electronic devices.

Many automotive manufacturers are toying with the idea of X-by-wire
systems (steer by wire, brake by wire). Steer by wire is attractive
from the standpoint of safety: with the steering column removed, so is
the prospect of impaling a driver who is involved in an accident.
Furthermore, the manufacturer no longer has to maintain two versions
of the vehicle, as the steering wheel and the glove box can be
interchangeable. The dealer can customize the car for driving either in
the US/Europe or in the UK/Japan/Australia.

The use of software in the aforementioned devices improves their
functionality and usefulness, but if that software fails, then in some
cases the results are catastrophic. Expensive devices may be ruined,
but worse, there is a potential for loss of life.

Notable Software Bugs
July 22, 1962 - Mariner I space probe. A bug in the flight control
software caused the Mariner I rocket to calculate an incorrect
trajectory. The rocket was destroyed by Mission Control over the
Atlantic.

1985-1987 - Therac-25 medical accelerator. A radiation therapy device
contained a bug that could lead to a race condition. When that
condition occurred, the patient received many times the recommended
dosage of radiation. The failure directly caused the deaths of five
patients and harmed many more.

January 15, 1990 - AT&T Network Outage. A bug in a new release of code
caused AT&T's switches to crash. Over 60 thousand New Yorkers were
left without phone service for nine hours.

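June 4, 1996 - Ariane 5 Flight 501. A bug in the Ariane 5 rocket's
inertial reference software (an overflow when converting a 64-bit
floating-point value to a 16-bit integer) fed bad data to the flight
computer, which swung the rocket so far off course that aerodynamic
forces ripped it apart.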

November 2000 - National Cancer Institute, Panama City. Operators found
that they could trick the software of a radiation therapy device.
Despite the legal requirement that all treatment schedules be rechecked
by hand, the device delivered twice the recommended dosage. Eight
patients died and 20 more will undoubtedly be permanently disabled.

May 2004 - Mercedes-Benz "Sensotronic" braking system. In one of the
largest recalls in automotive history, Mercedes-Benz had to recall
680,000 cars due to a failure of its Sensotronic braking system.

It is interesting to note that in every case, these system failures
occurred in devices whose designers knew in advance the possibly
devastating results that a software failure could cause, and made every
effort to prevent them. It is also interesting to note that in the case
of the National Cancer Institute in Panama, the device still caused
harm even though an operator was bound by law to recalculate the
settings by hand (and did not).

Safety Standards
From a historical perspective, there are a number of accepted if not
mandated standards that many industries must adhere to: military and
avionics, aerospace, nuclear and power plants, rail and medical. Their
standards provide guidance as to how software (if not the entire
device) is to be designed and deployed. They vary in their rigor,
guidance, application and impact on development, but their goal is the
same: to produce safe and reliable devices.

As a side note, it was pointed out to me that software safety,
software security and software reliability are not one and the same. As
a contrived and trivial example of the difference, a fire suppression
system does not have to be reliable in the sense that it always works
as one would expect; the goal of safe software is that if it fails, it
fails in a safe fashion. In the case of a fire suppression system, that
may mean that if the software fails, the fire suppression system comes
on.
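
The fail-safe idea can be sketched in a few lines of Python. This is
only an illustration, not something taken from any standard: the names
Valve, decide_valve and broken_sensor are hypothetical, and the sketch
assumes that releasing suppressant is the safe state for this
particular system.

```python
from enum import Enum


class Valve(Enum):
    CLOSED = "closed"
    OPEN = "open"  # suppressant flowing: assumed to be the safe state here


def decide_valve(read_smoke_sensor):
    """Return the valve state; any fault in the control path opens the valve."""
    try:
        smoke = read_smoke_sensor()
        if not isinstance(smoke, bool):
            # An implausible reading is treated like any other failure.
            raise ValueError("implausible sensor reading")
        return Valve.OPEN if smoke else Valve.CLOSED
    except Exception:
        # The software could not do its job, so it fails toward safety.
        return Valve.OPEN


def broken_sensor():
    """Hypothetical sensor driver that has lost contact with its hardware."""
    raise IOError("sensor bus timeout")
```

With a healthy sensor the valve follows the smoke reading. With a
failed or implausible sensor the function still returns Valve.OPEN, so
the system is unreliable at that moment but still safe.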

The two standards to be examined actually view the device as a total
system: in the first case the aircraft, and in the second a medical
device. For this paper, however, only the software aspects will be
considered.

The first is the Federal Aviation Administration's DO-178B standard.
Titled "Software Considerations in Airborne Systems and Equipment
Certification," the standard known as DO-178 was first published in
1982 by the Radio Technical Commission for Aeronautics (RTCA).

After two revisions, the current version B was released in 1992. The
standard was developed to establish guidelines on how software is
designed, maintained, implemented and used in aircraft. Basically, it
specifies that every line of code be directly traceable to a
requirement, every test case be traceable to a line of code and every
line of code have a corresponding test case.
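
A traceability check of this kind can be sketched as a small script.
The example below is a simplification made for illustration: it tracks
named code units rather than individual lines, and the function
trace_gaps, the requirement IDs and the unit and test names are all
hypothetical.

```python
def trace_gaps(requirements, code_units, tests):
    """Report breaks in the requirement -> code -> test chain.

    requirements: set of requirement IDs
    code_units:   dict mapping a code unit name to the requirement ID it implements
    tests:        dict mapping a test name to the code unit it exercises
    """
    tested_units = set(tests.values())
    return {
        # Code that exists for no stated requirement.
        "untraced_code": sorted(
            u for u, req in code_units.items() if req not in requirements
        ),
        # Code that no test exercises.
        "untested_code": sorted(u for u in code_units if u not in tested_units),
        # Tests that point at code that is not in the build.
        "orphan_tests": sorted(t for t, u in tests.items() if u not in code_units),
    }


requirements = {"REQ-1", "REQ-2"}
code_units = {"read_altitude": "REQ-1", "set_flaps": "REQ-2", "log_debug": None}
tests = {"test_read_altitude": "read_altitude", "test_old_autopilot": "autopilot"}
gaps = trace_gaps(requirements, code_units, tests)
```

In this made-up data set, log_debug has no requirement behind it,
set_flaps and log_debug have no tests, and test_old_autopilot exercises
code that no longer exists. Each of those would be a finding in a
traceability review.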

The DO-178B standard has five levels of certification, each of which
equates to the potential for harm if the system fails. The lowest is
Level E and the highest is Level A. The potential for harm and the
level of certification are:

* Level A: Where a software failure would cause and/or contribute to a
catastrophic failure of the aircraft flight control systems.
* Level B: Where a software failure would cause and/or contribute to a
hazardous/severe failure condition in the flight control systems.
* Level C: Where a software failure would cause and/or contribute to a
major failure condition in the flight control systems.
* Level D: Where a software failure would cause and/or contribute to a
minor failure condition in the flight control systems.
* Level E: Where a software failure would have no adverse effect on the
aircraft or on pilot workload.
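
The list above amounts to a lookup from the worst credible failure
condition to a software level. A minimal sketch follows; the names
LEVEL_FOR_CONDITION and required_level are hypothetical and only mirror
the wording of the list.

```python
# Failure condition -> DO-178B software level, as given in the list above.
LEVEL_FOR_CONDITION = {
    "catastrophic": "A",
    "hazardous/severe": "B",
    "major": "C",
    "minor": "D",
    "no effect": "E",
}


def required_level(failure_conditions):
    """Return the most stringent level among the conditions a failure could cause."""
    levels = [LEVEL_FOR_CONDITION[c] for c in failure_conditions]
    # "A" sorts before "E", so min() picks the strictest level.
    return min(levels)
```

A component whose failure could contribute to both a minor and a major
failure condition would be developed to the stricter of the two, Level
C in this sketch.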

As an example of the various types of applications and their
potential for causing harm, the in-flight entertainment system may be
considered Level E, while a fly-by-wire system is considered Level A.
As the potential for catastrophic failure increases, so does the amount
of diligence required to prevent it. For
all levels of the standard, almost all of the following "Certification
Artifacts" are required:

The most rigorous aspect of the DO-178B standard is its approach to
quality assurance and testing of the code. That rigor is achieved by
"Functional Analysis" of the software and by "Structural Coverage
Analysis" of the software.

The goal of functional analysis is to show a one-to-one
correspondence between the code that makes up the software and the
requirements (traceability); basically, "this code is here because of
this requirement." The functional analysis tests the software through
boundary testing and other techniques, and demonstrates that it does
what it is supposed to without undefined results.
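
Boundary testing can be illustrated with a short sketch. The
requirement used here (accept commanded flap angles from 0 to 40
degrees inclusive) is invented for the example, as are the names
boundary_values and accept_flap_angle; the point is only that tests
cluster at the edges of the valid range, where off-by-one mistakes
live.

```python
def boundary_values(low, high):
    """Values just outside, on, and just inside each edge of [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def accept_flap_angle(deg):
    """Hypothetical requirement: accept angles from 0 to 40 degrees inclusive."""
    return 0 <= deg <= 40


# Map each boundary value to whether the code under test accepts it.
cases = {v: accept_flap_angle(v) for v in boundary_values(0, 40)}
```

A reviewer would compare the result for each boundary value with what
the requirement says should happen, so that an accidental "<" in place
of "<=" shows up as a mismatch at 0 or 40.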

Structural coverage analysis is measured at three levels. Statement
coverage essentially means that each line of code has been executed at
least once. Decision coverage means that each entry and exit point has
been executed at least once and every decision has taken all of its
possible outcomes at least once. Modified Condition/Decision Coverage
(MC/DC) adds that every condition within a decision has taken all of
its possible outcomes at least once, and that each condition has been
shown to independently affect the decision's outcome.
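
The "independently affects" requirement is easier to see with a worked
example. The sketch below checks a test set against one common reading
of that requirement: for each condition, there must be a pair of tests
that differ only in that condition and produce different outcomes. The
decision a and (b or c) and the helper mcdc_covered are made up for
illustration and do not come from the standard.

```python
from itertools import combinations


def decision(a, b, c):
    """Example decision with three conditions."""
    return a and (b or c)


def mcdc_covered(decision, tests, n_conditions):
    """Return the indices of conditions shown to independently affect the outcome."""
    shown = set()
    for t1, t2 in combinations(tests, 2):
        differing = [i for i in range(n_conditions) if t1[i] != t2[i]]
        # Only one condition changed, and the outcome changed with it.
        if len(differing) == 1 and decision(*t1) != decision(*t2):
            shown.add(differing[0])
    return shown


# Two tests are enough for decision coverage (one True and one False outcome).
decision_only = [(True, True, True), (False, False, False)]

# Four tests (number of conditions + 1) demonstrate every condition here.
mcdc_set = [
    (True, True, False),   # outcome True
    (False, True, False),  # outcome False, differs from the first only in a
    (True, False, False),  # outcome False, differs from the first only in b
    (True, False, True),   # outcome True, differs from the third only in c
]
```

The two-test set exercises both outcomes of the decision yet shows
nothing about any individual condition, while the four-test set
demonstrates all three. That gap is what separates decision coverage
from MC/DC.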

The amount of structural coverage analysis depends on the level of
certification that is desired and is outlined below:

The DO-178B specification spells out what, and to a large degree,
how a flight system must be designed, implemented, tested and
maintained.

At the other extreme of specifying safety in a device is the FDA's
approach. The Food and Drug Administration's (FDA) 510(k) process
requires that manufacturers notify the FDA 90 days before they plan to
market a medical device. It is similar to the FAA's DO-178B in that its
intent is to make sure that medical devices are designed and deployed
in a manner that ensures patient and operator safety.

The FDA takes a "kinder, gentler" approach to device design. In
its guidance documents, the agency states that it wants to allow
developers to use a "Least Burdensome" approach. I am not implying that
this particular standard is more lax than the FAA's; rather, the FDA's
approach does not constrain development to be done according to a
single paradigm.

One company could use extreme programming techniques and another
could use the traditional waterfall approach. As long as both companies
adhere to the practices that they document and provide proof of due
diligence, both approaches are fine with the FDA.

Above and beyond the FDA regulations on device development, the nature
of US liability law means that it is in the best interest of a medical
device manufacturer to deliver very safe products.

Converging to Software Control
Historically, operator, plant and stakeholder safety depended on
operator training, physical barriers, mechanical interrupts and
mechanical fail safes and lockouts. As technology evolved, so did the
safety systems. Electrical interrupts and lockouts replaced mechanical
ones, and physical barriers were replaced by beams and light curtains.
The really disruptive aspects of technology occurred when plant systems
that depended on operator control and intervention started becoming
"automated." The machinery began to think for itself.

There are a multitude of reasons for using programmable logic and
electronics in industrial devices. In some cases, it is because the
plant operation becomes so fast or complicated that a human can no
longer keep up with the task. It could also be said that quality
control is better: computer-based systems don't have bad days or
end-of-shift fatigue. In reality, the reason for the explosion of
automation can be summed up in two words: cost reduction.

Digital systems are faster, more precise and, over the long haul, less
expensive than a $35 an hour laborer who has a pension. As in the BMW
example given earlier, it is much more cost effective to replace a
wiring harness or pneumatic actuators with a single wire or bus control
system. Not only does it reduce the BOM for the system, but in most
cases the labor involved in installation is lower. In large
interconnected systems such as a paper machine, the savings in material
and installation labor can make the difference between a positive ROI
and a negative ROI.

One of my first jobs as an adult was working as an industrial
electrician at a local paper mill. I pulled many a mile of cable that
year, working with hundreds of others doing the same. At the same time
the instrumentation crews bent and installed thousands of miles of
pneumatic tubing. While there is still a need for the cabling required
to power the thousands of motors that are used in a paper machine, most
of the "one switch, one control cable" and pneumatics can be replaced
with busses, each of which can support many switches and controllers.

The mill had a number of processes that were largely performed using
programmable logic elements. At that time the wisdom was that
automation required a redundant or isolated safety system. That way, if
the control portion of the bus system went nuts and started a broadcast
storm that caused a process to malfunction, the safety-related system
could still put the machine in a safe state. This "separation of church
and state" approach works pretty well, but redundancy is expensive.
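
One way to picture such an isolated safety system is as a monitor that
expects a regular heartbeat from the control side and forces a safe
stop when the heartbeat goes missing. The sketch below is a simplified
illustration, not a description of the mill's actual equipment: the
SafetyMonitor class, the timeout value and the "RUN" and "SAFE_STOP"
states are all assumptions, and time is passed in explicitly to keep
the example deterministic.

```python
class SafetyMonitor:
    """Independent watchdog: trips if the controller stops reporting in."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heartbeat = None
        self.tripped = False

    def heartbeat(self, now):
        """Called by the control system to show that it is still alive."""
        self.last_heartbeat = now

    def check(self, now):
        """Called on the safety side; returns the state the machine must be in."""
        silent = (
            self.last_heartbeat is None
            or now - self.last_heartbeat > self.timeout_s
        )
        if silent:
            # Latch the trip: a late heartbeat must not restart the machine.
            self.tripped = True
        return "SAFE_STOP" if self.tripped else "RUN"


monitor = SafetyMonitor(timeout_s=0.5)
monitor.heartbeat(now=0.0)
state_ok = monitor.check(now=0.3)       # controller still within its deadline
state_late = monitor.check(now=1.0)     # controller went silent
monitor.heartbeat(now=1.1)
state_after = monitor.check(now=1.2)    # trip stays latched until a manual reset
```

The monitor shares no logic with the control path, which is the point
of the separation: it does not need to know why the controller
misbehaved, only that it did.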

Jack Ganssle said recently that the most expensive thing in the
universe is software. That is true, but it is true only because the
next alternative (doing it purely with logic circuits) is prohibitively
expensive.

Cultural and Philosophical Differences
There are several cultural differences between the US and Europe as to
the evolution of safe software standards and the overall acceptance of
them between the two geographical regions.

Europeans in general are used to more regulation in their daily
lives and European governments tend to be more supportive of standards.
European states use standards and certifications as barriers to trade.
The European legal system is somewhat sympathetic to companies who
comply with standards groups as opposed to those who do not comply with
them.

Compliance with standards tends to protect manufacturers against
liability in the event that they produced an unsafe product.
Furthermore, European workers are motivated to adhere to safety
standards as they, as individuals, are likely to be held civilly or
criminally responsible for the products they develop. In fact, it is
the personal responsibility of the chief officers of the company to
make every effort to ensure safe products are developed.

Some European companies take this so far as to have their officers
sign a "Declaration of Conformity" attesting that the device was
produced in accordance with the applicable standards and complies with
national standards.

In the US, rightly or wrongly, acceptance of standards and common
practices, no matter how stringent, does nothing to mitigate a
manufacturer's liability in the eyes of both the law and the jury. With
the exception of those committing gross negligence (for example, an
inebriated pilot crashing a plane), an employee will not face civil or
criminal charges as a result of an unsafe product reaching the market.

So, the only reasons for US manufacturers to choose to adhere to a
standard are that they see it as a marketing tool that differentiates
them from their competitors, that it is a government regulation, or
that they fear litigation if a product harms someone.

Do not misunderstand the prior statement. Many US companies do have
internal coding, quality and safety standards that they follow; they
are motivated by the market to produce safe products so that is not the
issue. It is that there is rarely an incentive for them to join and
follow external standards groups.

The Tipping Point
As a product marketing manager, one aspect of my job is to keep a
finger on the pulse of the embedded space. I do a lot of reading, a lot
of talking and, most of all, a lot of listening. I read blogs and trade
journals, and I talk to a lot of people: customers of course, but also
lost sales and what essentially amount to cold calls at trade shows.
Since I am interested both personally and professionally in industrial
automation, as well as safety-critical applications such as avionics, I
tend to ask questions pertaining to that aspect of people's projects.

What I am finding is that strategic thinking of developers and
manufacturers of home, building and industrial automation is split
along geographical lines. My perception of this split started 24 months
ago. IEC 61508 was mentioned during a call with our German sales
office.

I had never heard of it. Neither had any of the US-based customers I
normally spoke with. DO-178B and 510(k) I was familiar with. Over the
next few months, the German office reported more and more interest in
IEC 61508. Then interest arose in France and the UK. I received two
inquiries from Japan today.

IEC 61508
A decade ago, the International Electrotechnical Commission issued the
final version of its IEC 61508 specification governing the development
of electrical/electronic/programmable electronic safety-related
systems.

The main thrust of IEC 61508 is to provide "guidance" for developing
devices that are functionally safe. In the context of IEC 61508,
functional safety is defined as: "Functional safety is part of the
overall safety that depends on a system or equipment operating
correctly in response to its inputs. Functional safety is achieved when
every specified safety function is carried out and the level of
performance required of each safety function is met."

Basically, the standard strives to ensure that safety systems
perform as specified and that, if they fail, they fail in a manner that
is safe. One thing that needs to be (re)emphasized is that when
discussing safety in this context, reliability is not implied, only
that if the system fails, it will fail safely.

In many ways, the IEC 61508 standard is very similar to the DO-178B
standard. It is very structured in its approach to developing software.
Unlike the DO-178B standard, the IEC 61508 standard does allow
certification of standalone software. Basically, it allows software
reuse without having to go through the process of recertifying code
that has been previously certified. Of course, all of the code that can
be precertified must be code that is independent of the hardware.

Even though all hardware-specific code, such as drivers, must still be
certified, the ability to pre-certify generic code has a dramatic
impact on the expense of developing safety systems. Since estimates for
developing and certifying code to these standards run upward of $100
per line of code, the ability to amortize the cost of development over
multiple projects is what makes many such systems feasible.
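
The arithmetic behind that amortization is simple enough to sketch. The
$100 per line figure is the estimate quoted in the text; the line
counts and the number of projects below are invented purely to show the
shape of the calculation, and cost_per_project is a hypothetical
helper.

```python
COST_PER_LINE = 100  # dollars per certified line, the low end of the estimate above


def cost_per_project(generic_loc, specific_loc, projects):
    """Certification cost borne by each project.

    Generic, hardware-independent code is certified once and shared;
    hardware-specific code (such as drivers) is certified for every project.
    """
    shared = generic_loc * COST_PER_LINE / projects
    per_project = specific_loc * COST_PER_LINE
    return shared + per_project


# Assumed sizes: 20,000 generic lines and 2,000 hardware-specific lines.
one_off = cost_per_project(20_000, 2_000, projects=1)
reused = cost_per_project(20_000, 2_000, projects=5)
```

Under these assumed numbers, a single project carries $2.2 million of
certification cost, while each of five projects sharing the generic
code carries $600,000.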

It also makes commercially available, pre-certified software
attractive, since a software vendor's business model is to amortize its
development costs over many, many sales. An added benefit is that
manufacturers can add features such as USB or Ethernet connectivity at
a reasonable price, where before they could not afford to certify the
extra tens of thousands of lines of additional code.

Another bright spot for manufacturers is that the standard allows
developers to partition their systems into safe and non-safe feature
sets. When partitioning is properly implemented using MMU hardware, the
standard allows developers to avoid the costly burden of validating the
application code that runs in the non-safe partition and does not
perform safety-related activities. While it is not a trivial task to
guarantee that the non-safe partition can't bring down the
safety-related partition, the benefits to the manufacturer and end
customer are immense (when the other options involve the validation
process at a cost of $100s per LOC).

Another major difference between DO-178B and IEC 61508 is that at
its highest level of safety, SIL 4, IEC 61508 is stricter in how that
safety is achieved. Just as with DO-178B, as one works through the four
levels of failure reduction, SIL 1 through SIL 4, the degree of
functional and structural analysis becomes more rigorous. Unlike
DO-178B, at its highest level, SIL 4, IEC 61508 calls for redundancy.

Not only does it call for the use of multiple (at least two)
processors, but it also calls for two or more different types of
processors (ARM vs MIPS), with the software written for each processor
by different teams. For more information on hardware redundancies, see
IEC 61508-2. For more information about using different implementation
teams, see IEC 61508-3 section 7.4.3.2 and IEC 61508-7 Appendix B 1.5
and C 3.1 through C 3.5.
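
The idea of diverse redundancy can be sketched in software terms: two
independently written implementations of the same safety function are
compared, and any disagreement is treated as a fault. The example below
is an illustration only and is not the mechanism prescribed by the
standard; the over-temperature function, the 120 degree limit and the
names channel_a, channel_b and compare are all hypothetical.

```python
def channel_a(temp_c):
    """Team A's version: direct floating-point comparison."""
    return temp_c > 120.0


def channel_b(temp_c):
    """Team B's version: integer arithmetic in tenths of a degree."""
    return round(temp_c * 10) > 1200


def compare(trip_a, trip_b):
    """Combine the two channels; disagreement is itself treated as a fault."""
    if trip_a != trip_b:
        return "SHUTDOWN"  # the channels disagree, so go to the safe state
    return "SHUTDOWN" if trip_a else "RUN"


normal = compare(channel_a(100.0), channel_b(100.0))
overheated = compare(channel_a(130.0), channel_b(130.0))
disagreement = compare(True, False)
```

Because the two channels are written differently, a coding mistake in
one is unlikely to be mirrored in the other, and the comparison turns
that mismatch into a safe shutdown rather than a silent wrong answer.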

Opportunities for Cost Reduction
Automation was first introduced to improve quality, efficiency and
productivity. However, some of those gains were offset due to the need
to develop safety systems to deal with automation.

That required redundant systems to monitor the automated systems.
With them came the added expense of isolated busses and control
systems. So expense added up, not only due to the development of the
safety system, but its manufacture and installation as well.

I think in general we can say that manufacturers are developing safe
devices, whether they adhere to a safety standard that was developed in
house or to an open standard developed by a committee. It is
infrequent that a truly catastrophic event occurs due to a software
error.

That safety record has come at a relatively high cost when comparing
features and functionality to device counterparts that occupy the
consumer space. The question arises: which way is better, a
proprietary, in-house safety standard or an open standard such as
IEC 61508?

There is some data available on this question. The quantitative
approach used by many safety standards reduces costs by preventing
either over-engineering or under-engineering. Shell Global Solutions
cut up to 20% from the cost of implementing safety systems. Extensive
investigation showed that about 65% of safety functions are
over-engineered while 10% are actually under-engineered and represent a
weak link in the overall safety management of the facility. Only 25%
didn't require changes. (exida.com)

The question "Can adherence to a safety standard save money?" is
answered positively. Now, what about the question of whether adherence
will make money? I think the answer to that question is also yes. From
my small and clearly unscientific study of our current and potential
customer base, I can conclude that if one does not begin to plan for
utilizing the design and maintenance guidelines set forth in standards
such as IEC 61508, one is effectively writing off a growing segment of
the international market. Will IEC 61508 go the way of the dodo bird
and ISO 9000? Only time will tell. Right now, it seems it is becoming
established and that momentum is growing.

Security Trends
FIPS 140-2.
The Federal Information Processing Standard (FIPS) 140-2, "Security
Requirements for Cryptographic Modules," was issued on May 25, 2001.
The standard was developed in conjunction with the NSA and is published
by the National Institute of Standards and Technology (NIST).

It describes the requirements and standards that a hardware and/or
software product must meet to be purchased for government use involving
Sensitive but Unclassified (SBU) information. The standard has been
adopted by the Canadian Communications Security Establishment (CSE) as
well as the American National Standards Institute.

In essence, FIPS 140-2 specifies the security requirements that must be
satisfied by a cryptographic module used to protect sensitive but
unclassified information. The standard covers all computer and
communication systems, providing four levels of increasing security:
Level 1, Level 2, Level 3 and Level 4. Many of the devices requiring
adherence to FIPS 140-2 are easy to identify: PCs, laptops, printers,
routers, switches, basically anything attached to the network.

Others are not identified so intuitively; things like telephones,
both traditional and IP-based, are covered. What about cell phones? It
is possible; with the advent of combining traditional cellular service
with VoIP-based services, the lines are being blurred.

HIPAA.
To improve the efficiency and effectiveness of the health care system,
the Health Insurance Portability and Accountability Act (HIPAA) of
1996, Public Law 104-191, included "Administrative Simplification"
provisions that required Health and Human Services (HHS) to adopt
national standards for electronic healthcare transactions. At the same
time, Congress recognized that advances in electronic technology could
erode the privacy of health information.

This new U.S. regulation gives patients greater access to their own
medical records and more control over how their personally identifiable
health information is used. The regulation also addresses the
obligations of healthcare providers and health plans to protect health
information.

Conclusions
There are many more software safety standards than the few that are
mentioned in this paper. However, the IEC 61508 standard seems to be
becoming a de facto standard, especially in areas where previously
there were either no standards for the industry or no regulatory
reasons to adopt one.

One of the primary reasons is that IEC 61508 is a standard that is
generic in application, but comprehensive in its approach to achieving
safety. Companies that previously utilized proprietary or in-house
standards are adopting IEC 61508 as a marketing tool to prevent them
from being shut out of markets.

Another factor that may drive North American manufacturers to adopt
IEC 61508 is the 2002 Sarbanes-Oxley Act governing the behavior of
corporate management. In the litigious society of North America, it
will only be a matter of time before some enterprising attorney
connects the Sarbanes-Oxley Act with an unfortunate software failure.

Furthermore, mandates for security in various segments of
government, healthcare and finance are forcing manufacturers of
infrastructure and office equipment either to bear the expense of
adhering to security standards or to write those markets off entirely.

Because of the rapid convergence of functionality into such things
as cell phones, it is my personal belief that not only will these sorts
of safety and security requirements thrive in the current areas of
acceptance, but they will also grow into other areas. I also believe
that it is better to adopt them now while they provide a marketable
differentiation in a product that will command a premium, rather than
wait until it is just an expected commodity feature of a product that
commands no value.

Todd Brian is a product manager
for Accelerated Technology, an Embedded Systems Division of Mentor Graphics
where he is responsible for kernels and related products.

This article is excerpted from a paper of
the same name presented at
the Embedded Systems Conference Boston 2006. Used with permission of
the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/boston/