Toyota Case: Vehicle Testing Confirms Fatal Flaws

MADISON, Wis. — Among the hundreds of cases brought by individuals across the United States claiming their Toyota vehicles accelerated without warning, only Bookout v. Toyota Motor, tried in Oklahoma County, Okla., resulted in a verdict against Toyota. This was also one of the first unintended acceleration cases to go to trial since the Japanese carmaker began recalling millions of vehicles in 2009 over this very issue.

The Oklahoma case was also the first in which plaintiffs' attorneys put the fault squarely on a flaw in the vehicle's electronic throttle control system. They dismissed arguments about floor mats and sticky pedals and focused on the software that controls the electronic throttle. The attorneys supported their argument with extensive testimony from embedded systems experts.

Similar testimony and extensive software analysis reports had been filed previously in other courts looking into unintended acceleration. But none of that material became public, because Toyota paid settlements and obtained gag orders before those cases went to trial. The public and the engineering community had to wait until the Oklahoma trial, where all testimony became public.

A dozen embedded systems experts were allowed to review Toyota's electronic throttle source code in a secure room in Maryland -- described as the size of a small hotel room. The room, with a guard at the door, was disconnected from the Internet. No cellphones, paper, belts, or watches were allowed inside. The experts viewed Toyota's code on five computers in cubicles.

Having spent more than 18 months going in and out of the secure room to study Toyota's code, Michael Barr, CTO of the Barr Group, put together an 800-page report analyzing the 2005 Camry L4's software. On the witness stand, he walked a jury step by step through what the experts discovered in their source-code review. According to Barr's testimony, that review revealed:

A multifunction kitchen-sink Task X designed to execute everything from throttle control to cruise control and many of the fail-safes

That all Task X functions, including fail-safes, are designed to run on the main CPU in the Camry's electronic control module

That the brake override that is supposed to save the day when there is an unintended acceleration is also in Task X

The use of an operating system in which there is no protection against hardware or software faults

A number of other problems

Barr testified that the source-code review indicated "both that task could die by the memory corruption, and that also that one of side effects of that would be that this -- for example, that task died, that many of fail safes would be disabled." But is it possible to prove that the experts' discoveries in that cloak-and-dagger source-code room would manifest themselves in a moving vehicle? How do we know how a car might react to malfunctions or an outright failure in Task X?

The plaintiffs' attorneys noted that they had also conducted vehicle testing. Though Barr was not present when the vehicles were tested, he testified that his group's simulations in the source-code room were reproduced by a Mr. Louden, using 2005 and 2008 Camry vehicles. The purpose was to repeat the testing and demonstration originally done in the source-code room and determine what the fail-safes would do in an actual vehicle in response to task death.

Excerpts of the court transcript
EE Times is publishing a portion of the court transcript relevant to vehicle testing. The following Q&A was carried out when Benjamin E. Baker, Jr., representing the plaintiffs, called Barr to the stand.

Seems like Toyota engineers are not aware of fault-tolerant design basics. Well developed in the 1960s and '70s, redundancy and fault-tolerant reliability engineering is standard in areas where the cost of a fault is high, like avionics or nuclear plant control, but it is almost forgotten in gadget-oriented mainstream electronics. Some comments below illustrate it even more: with the general principles forgotten, companies and engineers create home-brew, "common sense" based recipes.

Running the brake override routine on the same main processor, as part of the "kitchen-sink" firmware, is either incredibly irresponsible or shows total ignorance of the basics of real-time software. Not even a rookie software engineer would do this in the US. And this is the firmware of the best-selling car in the US, probably one of the best-selling cars in the world. It will be many, many years before I would consider buying a Toyota, even though I had two of them in the past 30 years and was reasonably satisfied with both.

If you go back to my original post, we always use asserts on critical data on function entry and always use asserts on returned data, and those asserts stay in the delivered code.
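A minimal C sketch of that practice, assuming hypothetical `REQUIRE`/`ENSURE` macros that, unlike the standard `assert`, are never compiled out under `NDEBUG` and so stay in the delivered code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical always-on assertion macros: unlike <assert.h>, these
 * are NOT removed when NDEBUG is defined, so they remain in the
 * shipped firmware. On failure they invoke a single handler that
 * puts the system into a fail-safe state (abort() in this demo). */
static void assertion_failed(const char *file, int line)
{
    fprintf(stderr, "assertion failed at %s:%d\n", file, line);
    abort(); /* in real firmware: log the event and reset safely */
}

#define REQUIRE(cond) do { if (!(cond)) assertion_failed(__FILE__, __LINE__); } while (0)
#define ENSURE(cond)  REQUIRE(cond)

/* Illustrative example: map a pedal sensor percentage to a throttle
 * command. The entry assert validates the critical input; the exit
 * assert validates the returned data before anyone acts on it. */
int pedal_to_throttle(int pedal_percent)
{
    REQUIRE(pedal_percent >= 0 && pedal_percent <= 100); /* entry check */

    int throttle = (pedal_percent * 90) / 100; /* simple linear map */

    ENSURE(throttle >= 0 && throttle <= 90);   /* check returned data */
    return throttle;
}
```

The function name and the 90% cap are invented for illustration; the point is only the entry/exit check pattern that survives into production builds.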

Asserts are fine within a single task flow, but they do not protect adjacent tasks that can be corrupted by bad behavior between asserts. Hardware protection protects against cross infection, and ECC would have helped avoid the root cause (if the root cause was a bit flip).

It comes down to having layered defenses, both for stability, but also for intrusion and modification protection.

Naturally, software assertions use a different mechanism than MPUs, ECCs, WDTs, and other such hardware. But I still think it is very beneficial to view all these mechanisms as complementary aspects of the **same** basic method.

This basic method is to intentionally introduce redundancy checks (either software-based or hardware-based) to ensure that the system operates as intended.

The problem with viewing software assertions as "another thing altogether" from MPUs, ECCs, WDTs, etc. is that redundancy checks that are very easy to perform in software, but difficult in hardware, simply are not being done.

Too often this mindset leads to gaping security holes and suboptimal designs. I believe this is exactly what could have saved the day in the Toyota UA case. Please note that even if ECC had been used, it would not detect memory corruption due to the alleged stack overflow or an array index out of bounds. Simple software assertions, on the other hand, would have easily detected such things.
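A tiny illustration of that last point (the table name and contents are invented): ECC cannot notice a computed index going out of range, because the resulting access looks electrically valid, but a one-line software bounds check catches it before the bad access happens.

```c
/* ECC can flag a flipped bit in RAM, but it cannot tell that a
 * corrupted index is about to read or write the wrong location.
 * A software bounds check catches that class of error. The table
 * below is purely illustrative. */
#define GEAR_TABLE_LEN 6
static const int gear_ratio[GEAR_TABLE_LEN] = {330, 206, 141, 100, 77, 61};

/* Returns the ratio for a gear index, or -1 after raising a fault
 * flag. In real firmware the fault path would trigger a fail-safe,
 * not merely set a flag. */
int gear_ratio_lookup(int idx, int *fault)
{
    if (idx < 0 || idx >= GEAR_TABLE_LEN) {
        *fault = 1;             /* corrupted index detected in software */
        return -1;
    }
    *fault = 0;
    return gear_ratio[idx];
}
```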

So I repeat the main point of my original post: software assertions are no less important than MPUs, ECCs, WDTs, etc. Unfortunately, they are routinely under-utilized or disabled in production code. I just hope that we can use the Toyota case to change this perception.

Client-side MPUs actually prevent resource access (read, write, or both) at chip-select, address, or even register-level granularity. Access permission is granted based on VMID characteristics that are driven as part of the bus cycle. The VMID characteristics are steered at the bus master by various attributes of the access, including (possibly) the task ID running on the core.

If a carved-out RAM region, or a set of device registers, is reserved for a particular VMID that is associated with a particular task, then other tasks are prevented from accessing those resources even if the processor would otherwise be taking a legitimate action.
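A toy C model may make that gating concrete; the region addresses and VMID numbers below are invented, and a real client-side MPU performs this check in hardware on every bus cycle rather than in software:

```c
#include <stdint.h>

/* Toy model of a client-side MPU: each protected region is bound to
 * exactly one VMID, and an access is permitted only when the
 * requesting master's VMID matches. Addresses and VMIDs are made up. */
typedef struct {
    uint32_t base;
    uint32_t limit;   /* inclusive upper address of the region */
    uint8_t  vmid;    /* the only VMID allowed to touch this region */
} mpu_region_t;

static const mpu_region_t regions[] = {
    { 0x20000000u, 0x20000FFFu, 1 },  /* RAM carve-out for task/VMID 1 */
    { 0x40010000u, 0x400100FFu, 2 },  /* device registers for VMID 2   */
};

/* Returns 1 if the access is permitted, 0 if the MPU would block it. */
int mpu_access_ok(uint32_t addr, uint8_t vmid)
{
    for (unsigned i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        if (addr >= regions[i].base && addr <= regions[i].limit)
            return regions[i].vmid == vmid;  /* protected: VMID must match */
    }
    return 1;  /* address not in any protected region: allow */
}
```

The key property is visible in the check: even a "legitimate" CPU instruction is refused at the endpoint if the wrong task (wrong VMID) issued it.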

It protects against software defects, and it protects against directed attacks on the system.

It is particularly useful in multicore systems with shared resources.

This is very different from the behavior of the CPU tied MMU.

In one of our current SoCs, which contains eight 32-bit CPUs and eight 32-bit DSPs, there are roughly sixty client-port MPUs to provide protection domains for the individual device and memory spaces shared among all sixteen cores.

So even if your assert becomes corrupted because of a bit flip or some other data failure that occurs outside the domain of the assert, the end point will block the access.

Each of these capabilities forms part of a layered protection scheme. MPUs alone are not sufficient, MMUs alone are not sufficient, asserts alone are not sufficient, ECC alone is not sufficient. Together they provide layered protection: defense in depth.

@Wobbly: I still fail to see why an MPU-detected failure is "another thing altogether" from a failing software assertion. For example, an assertion might check for an array index out of bounds. Why is such a failure so fundamentally different from an attempt to dereference a NULL pointer, which might trip the MPU?

1) A correctable error, which is completely allowable. That is why you use ECC, though ECC events should be tracked and thresholded. In a car, for example, ECC events that cross a threshold could trip the ECU lamp for a service error. Note that the threshold should not be a total count, but a count per unit of runtime. You need to filter them over time.
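One way to sketch that rate-based filtering in C (the window length and threshold are illustrative values, not from any real ECU):

```c
#include <stdint.h>

/* Rate-based thresholding for correctable ECC events: count events
 * per window of runtime rather than as a lifetime total, and flag
 * for service only when the rate within a window is too high. */
#define WINDOW_MS    60000u   /* 1-minute observation window */
#define MAX_PER_WIN  3u       /* more than this per window => service */

typedef struct {
    uint32_t window_start_ms; /* start of the current window */
    uint32_t count;           /* events seen in this window */
    int      service_lamp;    /* latched service indication */
} ecc_monitor_t;

/* Call on every correctable ECC event, with the current runtime. */
void ecc_event(ecc_monitor_t *m, uint32_t now_ms)
{
    if (now_ms - m->window_start_ms >= WINDOW_MS) {
        m->window_start_ms = now_ms;  /* new window: restart the count */
        m->count = 0;
    }
    if (++m->count > MAX_PER_WIN)
        m->service_lamp = 1;          /* rate too high: flag for service */
}
```

Occasional single-bit corrections then pass silently, while a burst within one window latches the service indication.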

I am really surprised that nobody so far mentioned the use of simple software assertions.

Most people point out that ECC or an MPU was not used. But these layers of protection are really nothing more than hardware-assisted assertions. I mean, what do you do when your ECC detects a parity error or your MPU detects an unauthorized memory access? Well, you execute an exception handler, which puts your system in a fail-safe state (typically a reset).

This is exactly what simple software assertions do too, except that software assertions can easily catch subtle logic errors that no hardware can detect.
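That symmetry can be shown as one shared fail-safe path; the handler names and the throttle invariant below are invented for illustration, and the reset is stubbed out:

```c
#include <stdio.h>

/* One fail-safe path, reached from hardware traps (ECC, MPU) and
 * from software assertions alike. Reset is stubbed for the demo. */
static int reset_requested = 0;

void fail_safe(const char *source)
{
    fprintf(stderr, "fail-safe entered: %s\n", source);
    reset_requested = 1;  /* in firmware: drive outputs safe, then reset */
}

/* An MPU fault would arrive via a hardware exception vector... */
void mpu_fault_handler(void) { fail_safe("MPU access violation"); }

/* ...and a software assertion reaches the very same path. */
#define ASSERT(c) do { if (!(c)) fail_safe("assertion " #c); } while (0)

/* Example of a subtle logic error no hardware can see: the
 * commanded throttle must never exceed the pedal position. */
void check_throttle(int pedal, int throttle)
{
    ASSERT(throttle <= pedal);  /* pure-software invariant check */
}
```

Whether the trigger is a parity error, an access violation, or a violated software invariant, the system ends up in the same well-defined fail-safe state.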

So here comes my main point. Too often I see software assertions **disabled** in production code. Interestingly, this is done by the same people who advocate the use of ECCs or MPUs. Isn't this a bit inconsistent? How many readers of this article ship products with assertions enabled?