Toyota Underestimated 'Deadly' Risks

SAN JOSE — A software expert whose testimony led to a verdict against Toyota Motor Corp. in one of a series of runaway acceleration cases said Tuesday that the best assurance against similar "deadly" failures is stronger, smarter oversight by federal regulators.

Michael Barr, co-founder and CTO of the Barr Group, told an audience of embedded system engineers at the EE Live! conference here that as automobile manufacturers have pushed each other into a race to fit cars with complex electronic control systems, watchdogs at the National Highway Traffic Safety Administration (NHTSA) have failed to keep pace. Lacking a team of experienced experts to test and monitor today's flood of automotive software designs, NHTSA is failing in its mission to oversee "safety-critical systems."

Despite assurances by companies like Toyota that their software undergoes rigorous testing, said Barr, the rush to get cars on the road means that "You, the users, have been testing the software."

In some cases, like that of Jean Bookout, who was seriously injured when her 2005 Toyota Camry accelerated unintentionally, that sort of ad hoc consumer testing can result in catastrophe. A passenger in the Bookout car, Barbara Schwarz, was killed. After Barr testified at length for the plaintiffs -- in the only software-focused Toyota case that has been tried -- an Oklahoma City jury awarded $3 million in damages to Ms. Bookout and to Ms. Schwarz's family.

Commitment to a culture of safety

Although insisting on tighter NHTSA regulation, Barr did not absolve carmakers, whose current passion has been described as turning every new car model into a giant, apps-loaded smartphone.

Barr said that Toyota -- and by implication other auto companies eager to load their products with electronic controls -- lacks a "mature design process, done right, documented, and peer reviewed."

He called for carmakers -- regardless of the government's role -- to adopt a "company culture and an engineering culture of wanting to know what can go wrong, and wanting to fix what can go wrong, from the outset," rather than after the fact, with apologies and million-dollar settlements.

Since the problem of "unintended acceleration" in Toyotas burst into headlines after a ghastly California crash that killed Mark Saylor, a 19-year California Highway Patrol veteran, and three family members, Toyota has recalled millions of cars and paid billions in penalties and settlements. Among these was a $1.2 billion criminal fine imposed last month by the Department of Justice -- for lying to government regulators.

Using an exhaustive 56-slide PowerPoint presentation and citing his 18 months examining Toyota's automotive software "source code," Barr convinced the Oklahoma jury that Toyota had deployed dangerously flawed software in its cars. Despite Barr's findings, Toyota continues to claim that all its unintended acceleration problems were mechanical, the result of misplaced floor mats and "sticky" gas pedals.

Neither NHTSA, with its absence of software expertise, nor the NASA Engineering and Safety Center -- to which NHTSA turned to study the Toyota problem -- was able to pinpoint a software cause for unintended acceleration. Nor was either able to rule out the possibility.

The NASA researchers, who were both on a deadline and not allowed to study Toyota's source code, simply ran out of time, noted Barr.

Under court order, a team from the Barr Group was allowed into a specially built "code room" provided by Toyota. They were able to pinpoint at least one anomaly that could have caused a Toyota to accelerate on its own while disabling the brake system. Barr also found numerous Toyota violations of software design standards. Toyota, in many instances, even broke its own rules for safe design and system redundancy.
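Among the redundancy rules at issue in Barr's published testimony was the protection of critical variables against memory corruption. One standard defensive idiom in safety-critical embedded C -- offered here purely as an illustration of the kind of rule involved, not as Toyota's actual code -- is to "mirror" each critical variable with its bitwise complement and verify the pair on every read:

    #include <stdint.h>
    #include <stdbool.h>

    /* A critical value stored alongside its bitwise complement.
     * If a stray write or bit flip corrupts either copy, the pair
     * no longer matches and the fault is detected at read time. */
    typedef struct {
        uint16_t value;
        uint16_t complement;  /* always ~value when the pair is intact */
    } mirrored_u16;

    static void mirrored_write(mirrored_u16 *m, uint16_t v)
    {
        m->value = v;
        m->complement = (uint16_t)~v;
    }

    /* Returns true and stores the value in *out if the pair is intact;
     * returns false so the caller can fall back to a safe default. */
    static bool mirrored_read(const mirrored_u16 *m, uint16_t *out)
    {
        if (m->value != (uint16_t)~m->complement) {
            return false;  /* corruption detected: fail safe */
        }
        *out = m->value;
        return true;
    }

On a detected mismatch, the caller would typically force a fail-safe state -- for throttle control, a closed throttle -- rather than act on the corrupted value.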

Patriot missiles, Therac-25, and others that failed

Many of these rules, and Toyota's subsequent actions, were either buried in corporate secrecy or covered over by corporate denial. "The answer is not to say it can't be the software, stick our heads in the sand," said Barr. If companies like Toyota examined themselves more rigorously, he added, and allowed "less code confidentiality," they wouldn't require as much regulatory scrutiny.

Barr cited past cases of "safety-critical systems" that failed but then were corrected when regulators stepped up their intensity and capabilities. After a series of radiation overexposures -- including two fatalities -- caused by a software glitch in a radiotherapy machine called the Therac-25, the Food and Drug Administration created an in-house team of software engineers to review every electronic medical device before its approval for use on patients.

In the case of the Therac-25, in the case of the Patriot missile battery whose software timing flaw allowed a Scud strike that killed 28 US troops during the Gulf War, and in Toyota's case, the companies responsible have invariably issued assurances about their exhaustive testing and cited "no other instances of similar damage."

Such assurances disregard the bugs that exist in every complicated system and the harm they can cause. "If you are overconfident of your software in a safety-critical system, that could be deadly," said Barr.

" Resistance measurements between all combinations of external APP sensor connector pins detected an intermittent resistive short between VPA1 and VPA2– Measurements made using multiple multimeters– Initially, ~3.5M , dropping to ~5k , and then remaining between 238 to 250 , until the pedal assembly was mechanically shocked– Mechanical shock to the pedal assembly returned the resistance to ~3.5M and further pedal actuations dropped the resistance again to ~5k and finally to the range between 238 to 250– This shorting resistance remained unchanged throughout the entire range of travel of the pedal, except when mechanical shocks were delivered"

And this from the Summary:

• A tin whisker induced short was responsible for the failure of a 2003 Toyota Camry Accelerator Pedal Position (APP) Sensor based on a Dual Potentiometer Design
– NHTSA report states warranty analyses identified at least two additional failures due to tin whiskers in similarly designed APP sensors

elctrnx_lyf's comments don't take into account the tendency of governments to shade things in favor of constituencies and power brokers whose interests are not in line with the end user's.

He says: "the real challenge lies in the hands of governments to make sure the automobiles are really safe and they are tested perfectly."

After spending nearly four decades building things that are 'really safe' and 'tested perfectly', with and without a part of the US Government looking over my shoulder, I beg to differ - I don't think a Government, be it stamped with US, UK, PRC, or any other set of initials, is capable of doing this job. It's up to the designers, developers, implementers, and testers to work this out. The base is a good set of design rules, followed by attention to those rules throughout the process, ending in reviews and testing, testing, testing. The reviews must pay attention to the top-level design rules - violation of which is an automatic fail. After that it's up to the testers to find all the ways a system can fail - a nearly impossible task in a complex system.

If you are testing with the expectation that the system is working correctly, you are working with a black bag over your head - systems are bound to fail, and it's the tester's role to find those failure modes. Problem is, I'm not sure we can find those modes once a system reaches a critical level of complexity.

This brings up the question of how to make systems simpler and, perhaps, more reliable. I don't know how to do this, but one approach would be to use known building blocks (I know how a brick works, I think) to construct a more complex system that can be tested from the component level to the system level, chasing down ALL anomalies along the way.

Perhaps this is a recipe for paralysis by analysis - we all know that things get built despite the questions - but when it comes to running a system that is supposed to protect our own or our loved ones' wetware, we can't afford to rely on 'trust me, I'm an engineer' anymore.

Two simple words for exhaustive software testing: code coverage. Evaluate drive-by-wire software to the same standards as the FAA does.
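For readers unfamiliar with the tooling, basic statement and branch coverage can be measured with an off-the-shelf compiler; the toy example below uses GCC's gcov, with the file name, function, and test values invented for illustration. (The FAA-accepted standard DO-178C actually demands stricter structural coverage, up to MC/DC, at its highest assurance level.)

    /* pedal_map.c - toy example for measuring test coverage with gcov.
     *
     * Build and run (GCC):
     *   gcc --coverage -O0 -o pedal_map pedal_map.c
     *   ./pedal_map
     *   gcov pedal_map.c   # reports which lines/branches the tests exercised
     */
    #include <assert.h>

    /* Clamp a raw pedal sensor reading into a 0-100% throttle request. */
    int throttle_percent(int raw)
    {
        if (raw < 0)   return 0;    /* fault: treat as closed throttle */
        if (raw > 100) return 100;
        return raw;
    }

    int main(void)
    {
        assert(throttle_percent(-5) == 0);
        assert(throttle_percent(50) == 50);
        /* Without this case, gcov shows the raw > 100 branch untested: */
        assert(throttle_percent(250) == 100);
        return 0;
    }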

One possible way forward would be to isolate engine management into a stand-alone box that could be networked to the transmission controller and other systems in the vehicle. Make it run on a safety-certified RTOS, with the whole unit certified to a given SIL.

Automakers build (or buy) incredibly reliable under-hood electronics which operate successfully in environments every bit as hostile as those specified for mil-spec parts. Give them regulatory cover to develop software to the same level of reliability.

You cannot anticipate all failure modes, but you can use a wealth of tools to find holes in the software and provide mechanisms to recover. You can run tools such as Purify, Coverity, etc. to spot basic software engineering flaws, you can use code practices that make the code quality higher, and you can run long simulations with focussed and randomised inputs to see if the expected happens.
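A toy version of such a randomised-input run, with the function under test and its invariant invented for the example, looks like this:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical pedal mapping under test. */
    static int map_pedal(int raw)
    {
        if (raw < 0)   return 0;
        if (raw > 100) return 100;
        return raw;
    }

    int main(void)
    {
        srand(12345);  /* fixed seed so any failure is reproducible */
        for (long i = 0; i < 10000000; i++) {
            int raw = (rand() % 4000) - 2000;  /* includes out-of-range values */
            int out = map_pedal(raw);
            /* Invariant: output must always be a sane percentage. */
            if (out < 0 || out > 100) {
                fprintf(stderr, "FAIL: raw=%d out=%d\n", raw, out);
                return 1;
            }
        }
        puts("10M randomised inputs passed");
        return 0;
    }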

One way that more expensive safety critical systems survive is to actually have 3 independently designed systems taking the same inputs and voting on the outcomes - it is unlikely that any 2 will have the same bug (unless the bug was in the original spec). If this is too expensive to implement in a car then do this at the simulation stage - have the real software and at least one other version and run them for a long time with lots of input data and spot when they disagree - that could indicate a bug.
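A minimal sketch of the voting logic, assuming three independently developed channels produce comparable integer outputs (names invented for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    /* 2-of-3 majority vote over three independently computed outputs.
     * Returns true and writes the agreed value when at least two channels
     * match; returns false when all three disagree, so the caller can
     * drop into a fail-safe state instead of trusting any one channel. */
    static bool vote3(int32_t a, int32_t b, int32_t c, int32_t *out)
    {
        if (a == b || a == c) { *out = a; return true; }
        if (b == c)           { *out = b; return true; }
        return false;  /* no majority: fail safe */
    }

The same comparison works offline at the simulation stage: run the production code and an independently written model over long streams of logged or random inputs, and log every step where they disagree as a candidate bug.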

For recovery in a real car there is no real excuse not to have watchdog timers - as I understand it, in the Toyota case the software on one processor crashed, causing acceleration and the disablement of the brakes. The cures here are:

1) Make sure the brakes are not on the same processor as the acceleration, and route them through a system so simple that there is no way pushing the pedal fails to slow the car down - i.e., fail-safe. By all means put in all sorts of fancy stuff to make braking better, but ensure that braking will always happen when the user presses the pedal (e.g., if the user presses the pedal hard for more than one second, apply the brakes no matter what the rest of the software is thinking).

2) Use a watchdog timer to spot when a processor has crashed, deadlocked, or livelocked, and then reset the processor (see the sketch below). This may be non-ideal but would at least mean that a fraction of a second after the software malfunctioned it would be back online. Of course the software would then need to be designed to expect the possibility that, when it awoke, the car would be travelling at speed rather than sitting at the kerbside.
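In outline, the watchdog pattern is simple; the memory-mapped register names, addresses, and timeout below are invented placeholders, since real watchdog hardware is chip-specific:

    #include <stdint.h>

    /* Hypothetical memory-mapped watchdog registers (chip-specific in reality). */
    #define WDT_LOAD  (*(volatile uint32_t *)0x40001000u)  /* countdown reload value */
    #define WDT_KICK  (*(volatile uint32_t *)0x40001004u)  /* write magic word to restart countdown */
    #define WDT_MAGIC 0x5AFE5AFEu

    static void control_step(void) { /* read sensors, compute outputs */ }

    int main(void)
    {
        WDT_LOAD = 50000u;  /* reset the CPU if not kicked within ~50 ms (illustrative) */
        for (;;) {
            control_step();
            /* Kick only after a complete, sane control cycle. If control_step()
             * hangs, deadlocks, or livelocks, the kick never happens, the
             * watchdog fires, and the processor restarts into code that must
             * assume the car may already be moving at speed. */
            WDT_KICK = WDT_MAGIC;
        }
    }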

"Finally, humans are notoriously bad at assessing risk. Even NASA, the world leaders in safety-critical systems blew it with Challenger and Columbia."

Actually, humans can be pretty good at assessing risk when they stick to the science. During the early years of the shuttle program, I had friends working on it at Rockwell International, one of the main contractors. They told me that the predicted rate of shuttle failure was about 1 in 100 flights, or 1%. Since this was not acceptable, the failure rate was continuously recalculated over the years until it was officially about 1 in 100,000 by the time the shuttle started flying. While the calculations changed, the shuttle remained basically the same. In total there were 135 shuttle flights with 2 lost, which is a 1.5% failure rate. Those early, unbiased risk estimates turned out to be pretty good.

@Bert22306 Your point is well taken. Pulling the keys out of the ignition to thwart runaway Toyota acceleration would lock the steering wheel and (from the GM experience) disable the airbags. The GM ignition switch issue should not lock the steering wheel (unless the key continued past ACC to LOCK). It would, however, disable the power steering and make steering a little more difficult.

Dr Quine, just to be scrupulously accurate about this, the GM ignition key problem does not lock the steering. What happens is that if your keychain is too heavy, in these small GM cars, and you hit a big bump or such, the key may move to the ACC position. Not to the locked steering column position.

GM is saying that for the time being, until they get their parts in dealerships, people should remove all items from the ignition key, including the fob thingie. That would prevent the key from exerting any counterclockwise torque on the ignition switch.

Given the extraordinary effort that has been expended to identify the root cause of the observed Toyota accidents (for which the failure mode is known with 20/20 hindsight), I would say it seems unlikely that proactive testing could anticipate such failures. We also know that hardware-induced accidents represent a very small fraction of the total automobile accidents each year - most can be attributed to driver error. That said, certainly we want cars to support drivers in their efforts to be safe. The nightmare scenario of a car thwarting the driver's efforts to drive safely is one we all want to avoid. Perhaps simple solutions are the most reliable: a means to ensure that brakes override accelerators. Ironically, the solution we heard as children (pull the keys out of the ignition) has an unintended side effect, as illustrated by the GM ignition recalls: it locks the steering and disables the airbag safety devices when the car subsequently crashes.