Embedded Software Can Kill, But Are We Designing Safely?

A survey of embedded design practices leads to some disturbing inferences about safety.

If you were the user of a safety-critical embedded device and learned that the designers had not followed best practices and safety standards in the design of the device, how worried would you be? I know I would be anxious, and quite frankly, the results of the Barr Group's recent annual Embedded Systems Safety & Security Survey indicate that we all need to be concerned.

We have just finished analyzing the data from over 2,400 responses, all from engineers who are currently working on embedded device designs. With such a large sample of engineers from all over the world (46 percent from North America and 33 percent from Europe), we were excited to learn more about engineers' design philosophies and practices as they relate to safety and security. We shared many of the results last week at Embedded World in Germany, and we will share more at our upcoming free webinar on March 8, including further details such as geographic preferences and distributions by experience level.

But some troubling trends should not wait; everyone should stop and ponder them now.

Would it surprise you to know that 22 percent of our respondents are currently working on device designs that can kill? We asked respondents what the worst outcome would be if the device they are designing today were to malfunction in the field, and more than 500 said one or more people could die! Many of these respondents are in the industrial automation, medical device, automotive, and aerospace/defense industries.

It’s no surprise that these industries create safety-critical devices, but with such a large response, we wanted to know whether these designers were following safety standards and best practices for reliability and maintainability. IEC, FDA, FAA, NHTSA, SAE, IEEE, MISRA, and other agencies and professional societies work to create safety standards for engineering design. With these standards in place, my hope was that the affirmative responses would be close to 100 percent.

Unfortunately, that was not the case.

Only 67 percent are designing to relevant safety standards, while 22 percent stated that they are not, and 11 percent did not even know whether they were designing to a standard. Let’s contemplate that for a moment. If we take this at face value, approximately one out of every three safety-critical device designs has potential safety, reliability, security, or quality holes that are not being adequately addressed or vetted. This is quite disturbing.
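As a rough sanity check on the figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the survey analysis: it assumes exactly 2,400 respondents (the text says "over 2,400", so this is a lower bound) and assumes that the 67/22/11 percent split applies to the respondents whose devices could kill. The variable and function names are made up for this sketch.

```python
# Figures as quoted in the article. TOTAL_RESPONDENTS is an assumed
# lower bound, since the article only says "over 2,400 responses".
TOTAL_RESPONDENTS = 2400
PCT_CAN_KILL = 22          # share working on devices that can kill

# Assumed to describe the "can kill" group only.
PCT_FOLLOWING = 67         # designing to relevant safety standards
PCT_NOT_FOLLOWING = 22     # not designing to a standard
PCT_UNSURE = 11            # do not know


def summarize():
    """Return (can-kill head count, percent not confirmed, estimated head count)."""
    can_kill = TOTAL_RESPONDENTS * PCT_CAN_KILL // 100
    pct_not_confirmed = PCT_NOT_FOLLOWING + PCT_UNSURE
    not_confirmed = round(can_kill * pct_not_confirmed / 100)
    return can_kill, pct_not_confirmed, not_confirmed


can_kill, pct_not_confirmed, not_confirmed = summarize()
print(f"Respondents on devices that can kill: about {can_kill}")
print(f"Not confirmed as following a standard: {pct_not_confirmed}% "
      f"(about {not_confirmed} respondents)")
```

Under these assumptions the numbers are consistent with the text: 22 percent of 2,400 is 528 ("more than 500"), and 22 plus 11 percent gives 33 percent, or roughly one in three.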

Let’s go a little deeper on this. Industry safety-standards compliance can be costly and time-consuming, but what about other best practices that are part of good design, such as the use of coding standards, code reviews, and static analysis? The news here is disconcerting too. For this group of engineers designing devices that can kill, the following graphic tells a compelling story:

Figure: Coding practices among developers working on products with a deadly failure mode.

Why is the adoption of these practices not near 100 percent?

As we presented these results at Embedded World, I saw many reactions—primarily surprise and skepticism. There was concern that these results could be true and questions about whether our numbers were wrong.

The skeptics had many comments. Some wondered whether our numbers were skewed because not all of our respondents were software engineers, and so some might not know the status of best practices in software development. We do not think that is an issue because of the demographics of the data (including the fact that just six percent of respondents were involved only in hardware).

Some also wondered if there was geographic skew to the data, but again, because we had a broad response from North America, Europe, and Asia, we believe the numbers are a good approximation of engineers’ thoughts.

Some wondered if designers of non-safety-critical subsystems within a safety-critical device (e.g., the satellite radio within an automobile) might be affecting the results. But, as we have seen, with today’s interconnected devices and security challenges, even non-critical subsystems can affect other subsystems.

Our results urge us all to address this situation. Managers need to understand the importance of safety and security, and that both need to be baked into project schedules and budgets.

The fact is that we all need to acknowledge in this age of IoT that our devices are becoming more critical to the infrastructure of the world. We all must devote the time, resources, and dollars to improving reliability. If we do so, in the long run, lives will be saved (and, for those business managers out there, money will be saved too).

Andrew Girson, Barr Group co-founder and CEO, has over 20 years of experience in the embedded systems industry, first as a senior embedded software engineer and subsequently in executive roles as a CTO, VP of Sales and Marketing, and CEO. He has led multiple companies to multi-year double-digit revenue and profitability growth rates while maintaining a distinctly technical focus on high-quality embedded, wireless, and handheld systems. Girson holds BS and MS degrees in Electrical Engineering from the University of Virginia.