Entomology 101: The Second Law of Verification

One day during the MCU project, as tapeout neared, a bug was found. It turned out to be the last bug found before tapeout, and it required only a simple fix. After fixing it, I, along with a number of other people, wondered why we hadn’t found it earlier. The bug was very simple, of the nature “function FOO doesn’t work,” with perhaps one or two other simple qualifying conditions that should have been exercised during directed testing.

There was another bug, similar to this last one, but with more complex triggering conditions. This other bug was the second one found during the verification process.

If I asked you to explain the first law of verification, which says that the bug rate is higher at the beginning of verification and lower at the end, you would have a simple explanation. The easier bugs are found at the beginning and the harder ones at the end.

Figure: conventional correlation between bug rate and bug hardness

But these two bugs seem to violate this explanation. It quickly became clear that whether or not this really is a contradiction depends heavily on how one defines “hardness” with respect to bugs. Certainly, if one defines hardness as the amount of effort it actually took to find a bug, then hardness generally increases with time and there is no contradiction. But normally, when we think of a bug as being hard to find, we think of this hardness as an intrinsic property of the bug, not an accident of when it was discovered. Does the fact that I tested function FOO after function BAR mean that bugs in FOO are inherently harder than those in BAR? Of course not.

One concrete way of trying to define the intrinsic hardness of a bug is to consider the probability of triggering the bug, assuming that all input values are generated randomly. This particular measure is routinely used by researchers to demonstrate that formal verification can find bugs that are “impossible to find” using simulation. First, let’s deal with that last phrase. Simulation mimics exactly the functioning of a chip. If a bug cannot be found by simulation then, by definition, it’s not a bug! What is usually meant is “the probability of finding this bug in simulation is extremely low.”
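As a rough illustration of this measure, assume (an idealization, not a claim about any specific tool) that each randomly generated trial triggers the bug independently with the same probability p. The symbols p and N are introduced here only for illustration. The chance that a test of N random trials triggers the bug at least once is then:

```latex
P_{\text{hit}}(N) = 1 - (1 - p)^{N} \approx N p \qquad \text{for } N p \ll 1
```

A bug with a tiny p is “hard” by this measure, but note that p depends on how the random trials are generated, not only on the bug itself.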

To understand bug hardness better, let’s take the example of verifying a simple pipelined processor. One of the subtleties in designing pipelines is handling “forwarding” cases. Forwarding occurs when two consecutive stages in the pipeline are processing requests for the same address. In that situation, the later stage (holding the older request) must forward its result to the earlier stage (holding the newer request), since that result has not yet made its way back to memory.
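A minimal sketch of the forwarding hazard, under simplifying assumptions that are mine rather than any particular design’s: a two-stage pipeline in which each request adds a value to a memory location, and the later stage writes its result back only after the earlier stage has already read memory. The function name run_pipeline and the (address, amount) request format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical request format: (address, amount to add to memory[address]).
Request = Tuple[int, int]


def run_pipeline(requests: List[Request], forwarding: bool) -> Dict[int, int]:
    """Toy two-stage read-modify-write pipeline, one request issued per cycle.

    The earlier stage reads memory for the newest request. The later stage
    holds the previous request's result and writes it back at the end of
    the cycle, i.e. after the earlier stage has already read. Back-to-back
    requests to the same address therefore read stale data unless the later
    stage forwards its result.
    """
    memory: Dict[int, int] = {}
    in_flight: Optional[Tuple[int, int]] = None  # (address, result) in the later stage

    for addr, amount in requests:
        # Earlier stage: fetch the operand for the new request.
        if forwarding and in_flight is not None and in_flight[0] == addr:
            operand = in_flight[1]           # forwarded from the later stage
        else:
            operand = memory.get(addr, 0)    # stale if the later stage targets addr
        # End of cycle: the later stage commits, and the new request advances.
        if in_flight is not None:
            memory[in_flight[0]] = in_flight[1]
        in_flight = (addr, operand + amount)

    # Drain the last request out of the pipeline.
    if in_flight is not None:
        memory[in_flight[0]] = in_flight[1]
    return memory


same = [(0x10, 1), (0x10, 1)]                # back-to-back: a forwarding case
print(run_pipeline(same, forwarding=True))   # {16: 2}
print(run_pipeline(same, forwarding=False))  # {16: 1}  (one update is lost)
```

In this model only consecutive requests to the same address are affected; with one unrelated request in between, the earlier result has already reached memory by the time it is read.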

Verification engineer A is not familiar with pipelined architecture and so is not aware of the forwarding issue. He decides that injecting many requests with randomly generated addresses will result in thorough verification. Assuming a 32-bit address, a random request matches the address of the request just ahead of it with probability 1 in 2^32, so the probability of triggering a forwarding case on a per-request basis is approximately one in four billion. The test he writes injects a million requests, which he considers sufficient to cover all possibilities. The probability, then, of exercising a forwarding case during this test is roughly one in four thousand, making it extremely unlikely that he will uncover any bugs in the forwarding logic.
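The arithmetic behind engineer A’s odds can be checked with a few lines. This sketch assumes uniformly random 32-bit addresses; under that assumption the events “request i+1 has the same address as request i” are independent, so the formula from the hardness measure applies exactly. The variable names are illustrative.

```python
import math

ADDRESS_BITS = 32
NUM_REQUESTS = 1_000_000

# Chance that a uniformly random address equals the previous request's address.
p = 1.0 / 2**ADDRESS_BITS

# A test of NUM_REQUESTS requests contains NUM_REQUESTS - 1 adjacent pairs.
pairs = NUM_REQUESTS - 1

# P(at least one forwarding case) = 1 - (1 - p)**pairs, computed in a
# numerically stable way because p is tiny.
p_hit = -math.expm1(pairs * math.log1p(-p))

print(f"per-request: 1 in {1 / p:,.0f}")
print(f"whole test:  1 in {1 / p_hit:,.0f}")
```

The whole-test figure comes out at about one in 4,300, which is the “roughly one in four thousand” quoted above.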

Verification engineer B understands pipelines and forwarding. He decides to write two tests, one specifically to test the forwarding logic and another to test non-forwarding cases. The forwarding test injects multiple requests with the same address back-to-back. The second test is basically the same as engineer A’s test, which injects requests with randomly generated addresses. Engineer B’s forwarding test will detect whether or not the designer knew to include forwarding logic. If the logic is missing, the forwarding test will fail no matter what address is chosen for the requests. If the designer has added the forwarding logic, it may still fail for specific address ranges that the verification engineer is not aware of, so engineer B’s tests may still miss some forwarding bugs. However, if the designer is not aware of the forwarding issue, engineer B’s tests will detect this with high probability.
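Engineer B’s two kinds of stimulus can be sketched as follows. The generator names, the burst length of 4, and the coverage count forwarding_events are illustrative assumptions. The count measures how many forwarding situations a test creates, which is a precondition for detecting a forwarding bug; it is not itself a pass/fail check.

```python
import random
from typing import List


def random_addresses(n: int, rng: random.Random, bits: int = 32) -> List[int]:
    """Engineer A's stimulus (and B's non-forwarding test): uniform random addresses."""
    return [rng.getrandbits(bits) for _ in range(n)]


def forwarding_addresses(n_bursts: int, burst_len: int, rng: random.Random,
                         bits: int = 32) -> List[int]:
    """Engineer B's directed stimulus: bursts of back-to-back requests to one address.

    Each burst uses a fresh random address, so the directed test still samples
    many address ranges while guaranteeing forwarding cases.
    """
    addrs: List[int] = []
    for _ in range(n_bursts):
        addr = rng.getrandbits(bits)
        addrs.extend([addr] * burst_len)
    return addrs


def forwarding_events(addrs: List[int]) -> int:
    """Coverage metric: number of adjacent request pairs that hit the same address."""
    return sum(1 for a, b in zip(addrs, addrs[1:]) if a == b)


rng = random.Random(2024)
random_hits = forwarding_events(random_addresses(100_000, rng))
directed_hits = forwarding_events(forwarding_addresses(1_000, 4, rng))
print("random test forwarding events:  ", random_hits)
print("directed test forwarding events:", directed_hits)
```

Each burst of 4 contributes 3 forwarding events, so the directed test produces at least 3,000 of them from only 4,000 requests, while 100,000 random requests are expected to produce essentially none.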

In this example, the hardness of a forwarding bug is different for engineers A and B; for engineer A, forwarding bugs are extremely difficult to find, while for engineer B, simple forwarding bugs are easy to find. So, if the forwarding logic were missing, would this be an intrinsically hard or easy bug? We can’t claim either way. The only thing we can say is that the bug’s “hardness” depends on who is doing the verification. This is such an important concept that I call it the second law of verification:

Second law of verification: bug hardness is subjective.

The insight that bug hardness is subjective explains the phenomenon of the last, simple bug. If someone else had done the verification, or if we had used different methods to do our verification, or even if the assignment of engineers to tasks had been different, it is likely that this last bug would have been found earlier. This is not to say that the whole verification effort would have been completed earlier. The effect of using a different verification plan (assuming the same resources) is simply a change in the order in which bugs are found. This is a direct result of the first law of verification. Thus, if our last, simple bug had been found earlier, some other bug would have ended up being the last bug found.

There are many implications of the notion that bug hardness is subjective, which I will elaborate on in future posts. But, before leaving this post, I want to show that this subjectivity is not just theoretical.

In Ken McMillan’s thesis, he describes in detail a bug that was found by his formal verification tool, SMV. The system he was verifying was the Encore Gigamax, a CC-NUMA (Cache-Coherent Non-Uniform Memory Access) multiprocessor. He calculated the probability of finding the bug using random simulation as roughly 10^-15. As it happens, the MCU design I was working on at HAL was also a CC-NUMA multiprocessor, so it was easy for me to understand the scenario that caused the bug. It was an example of what we called a “crossing” case. Crossing cases are to CC-NUMA multiprocessors what forwarding is to pipelined processors: cases in which multiple processors make requests for the same address simultaneously. The Gigamax bug occurred when an address was owned by a remote processor and the local processor and a third, remote processor simultaneously issued read commands to this address. Apparently, either the Gigamax architects or their verification engineers didn’t know about crossing cases. In our case, however, this particular crossing case was well understood and was considered one of the simpler cases. Our verification test plan explicitly included this scenario and, in fact, it was one of the first directed tests we wrote. If this case had not been handled correctly in our design, we would have considered it a very basic bug, and our probability of finding it was near certainty!

One Comment

a. Some bugs cannot be detected in simulation. Suppose I ask a designer to build a counter that counts up to, say, 1023 and signals an overflow when it would need to go to 1024, which it cannot represent. It also has a reset input that returns the counter to 0.
I integrate this counter into my design as a delay counter, which I reset every 128 cycles.
If my designer implemented the overflow incorrectly, I will never see that bug as long as the system uses the counter within its design requirements.
However, if the delay controller fails to reset the counter properly, there might be some rare case where the counter does overflow and the overflow does not work properly.
Something similar happened in a design I had to verify, and one could argue that something like it happened with the first Ariane 5, which had to be blown up.
b. I’d prefer an objective metric for ‘bug hardness’ and another one that takes into account the effectiveness of the verification method used, which could be called ‘effective/actual’ bug hardness.