We have seen in previous posts that design and verification problems fall in the class of computationally hard problems. That is, in the worst case, it takes O(2^N) time to optimize/verify a design of size N, where N could represent the number of binary inputs. If we run into this limit, our ability to design large systems is severely limited.

Given these limits, then, how is it possible that we can routinely create designs consisting of millions of lines of code or gates? More importantly, how can we continue to design increasingly larger, more complex designs?

One answer is that we can’t. We can’t find all the bugs in complex designs, but we can hope that there are no critical bugs. We can’t perfectly optimize a design, but we can do a good enough job. Even then, exponential complexity should be a brick wall to increasing complexity, but is not. Somehow, we are able to avoid these limits.

One way to look at this is to consider the way human brains are built and work in creating complex designs. The jumping off point for understanding this is Shannon’s counting argument.

Claude Shannon was one of the most important computer scientists of the twentieth century. He is most famous for his work on information theory. Shannon’s counting argument is one of his less well known results. Shannon was trying to answer the question, what is the minimum number of gates (say two-input NAND gates) needed to be able to implement any function of N binary inputs? The basic answer is that there must exist functions that require an exponential number of gates to implement. The proof goes by counting the number of possible functions of N inputs, counting the number of functions implementable with T gates, and showing that T must grow exponentially with N for the second count to reach the first. The argument is as follows:

The number of functions of N inputs is 2^(2^N). For example, a truth table for N=4 would have 16 entries. The number of functions, therefore, is 2^16=65,536.

The number of functions of N inputs that can be implemented with T 2-input gates is (N+T)^(2T). Each gate input can connect to the output of any other gate or an input for a total of (N+T) possible connections and there are 2T gate inputs.

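Setting (N+T)^(2T) >= 2^(2^N) and solving for T shows that T >= 2^(N-d), where d is small compared to N (it grows only logarithmically with N). This implies that T must grow exponentially with N, roughly as 2^N / N.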
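The counting argument can be tried out numerically. The following is a minimal sketch in Python that takes the two counts above at face value and searches for the smallest T at which the circuit count reaches the function count. The function names are my own, and the result is only a lower bound on the gates that are necessary under this crude count; it says nothing about how many gates are sufficient.

```python
import math

def log2_num_functions(n: int) -> int:
    """log2 of the number of Boolean functions of n inputs, 2^(2^n)."""
    return 2 ** n

def log2_num_circuits(n: int, t: int) -> float:
    """log2 of the (N+T)^(2T) bound on circuits built from t two-input gates."""
    return 2 * t * math.log2(n + t)

def min_gates_lower_bound(n: int) -> int:
    """Smallest t for which the circuit count can reach the function count."""
    t = 1
    while log2_num_circuits(n, t) < log2_num_functions(n):
        t += 1
    return t

if __name__ == "__main__":
    for n in (2, 3, 4, 5, 6):
        print(n, min_gates_lower_bound(n))
```

Working in log2 keeps the numbers small enough to handle; the counts themselves quickly become too large to print.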

What is the significance of this? Today’s chips have as many as 1000 pins. What is the minimum number of gates required on the chip that would allow us to implement any function of a thousand inputs? Is, say, one million gates sufficient? Off the top of your head, you probably would say no. What about a billion? Our preconceived notion is that this should be sufficient. After all, that is right at the limit of today’s most advanced chips and it is inconceivable that we couldn’t implement what we want with that many gates.

Now let’s look at what the counting argument has to say about this. The total number of possible functions is 2^(2^1000), which is a vast number, roughly 10^(10^300). A billion 2-input NAND gates would allow us to implement roughly 10^(10^10) different functions with a thousand inputs. This is also a vast number, but is infinitesimal compared to the number of possible functions. The fact is, we are limited to a very small fraction of the functions that could be implemented!
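These magnitudes are easy to check by working with logarithms instead of the numbers themselves. A small sketch, using the same (N+T)^(2T) count as above (the helper names are mine):

```python
import math

def log10_functions(n_inputs: int) -> float:
    """log10 of 2^(2^N), computed as 2^N * log10(2)."""
    return (2 ** n_inputs) * math.log10(2)

def log10_circuits(n_inputs: int, gates: int) -> float:
    """log10 of (N+T)^(2T), computed as 2T * log10(N+T)."""
    return 2 * gates * math.log10(n_inputs + gates)

all_functions = log10_functions(1000)           # exponent of 10 for all functions
billion_gates = log10_circuits(1000, 10 ** 9)   # exponent of 10 for a billion gates

# Taking log10 once more gives the "exponent of the exponent".
print(math.log10(all_functions))   # a little over 300
print(math.log10(billion_gates))   # a little over 10
```

Passing 10 ** 12 as the gate count moves the second figure up by only about three, which is why a trillion gates does not change the picture.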

And this picture doesn’t fundamentally change even if we assume we have a trillion gates at our disposal. So, how is it we believe that we can implement whatever functionality we need? To answer this, let’s work backward from the number of gates available and determine the maximum number of inputs for which we can implement any arbitrary function. Let’s choose the most complex system we have at our disposal, the human brain.

The human brain contains tens of billions of neurons (take 10^10 as a round figure), each with roughly a thousand connections. Assuming all neurons can be connected to any other neuron (which is not the case), we come up with an upper bound on the number of functions that can be implemented: (10^10)^(1000 * 10^10) = 10^(10^14), which is roughly 2^(2^48). In the best case, therefore, the human brain could implement any function of just under 50 binary inputs.

Now we can do a thought experiment. What is the largest size truth table that you can mentally make sense of at one time? Two bits is easy. Three bits not so hard. Four bits is difficult to visualize as a truth table, but using visual aids such as Karnaugh maps makes it possible.

Another test would be to look at the placement of arbitrarily placed objects on a plane. Given a glance, how accurately could you replicate the placement of all objects? If there were only one object, I could envision determining placement with an accuracy of 5-6 bits in both X and Y dimensions. With more objects, the accuracy drops. Beyond about seven objects, it is difficult to even remember exactly how many objects there are. Maybe 10-12 bits accuracy overall is possible.

The bottom line is the human brain has a very low capacity for consciously grasping an arbitrary function. Let’s say it’s no more than 10 bits in the best case. From the counting argument, we can determine that it requires slightly more than 500 gates to implement any arbitrary function of 10 inputs. With a million gates on a chip, we could put roughly 2000 such functions on the chip.

Now we treat each of these 500-gate blocks as a unit and then consider how to hook these 2000 blocks together to perform some function. This is certainly possible using abstraction. In this case it would probably require two or more levels of abstraction to be able to deal with 2000 blocks.

Abstraction is the essence by which we are able to design very large systems. The need for abstraction is driven by the limitations on our brain in dealing with large functions imposed by Shannon’s counting argument. And it is the structure imposed by multiple layers of abstraction that makes intractable design and verification problems become tractable.

The computational complexity class, NP-hard, is at the core of a number of problems we encounter on a daily basis, from loading the dishwasher (how do I get all these pots to fit?) to packing a car for a vacation, to putting together a child’s train tracks.

If we look at these things, they have several things in common. First, they each involve a potentially large number of parts (pots, luggage, pieces of track) that need to be put together in some way. Second, we want to meet some objective, such as fitting all the dishes in the dishwasher. Third, there are a large number of constraints that must be met. In the case of loading the dishwasher, no two dishes can be put in the same place. There are N^2 constraints just to specify this, among many others. A fourth characteristic is that we may get close to an optimal solution, but find it difficult and not obvious how to get to a more optimal one (just how are we going to fit that last pot in the dishwasher). Furthermore, getting from a near optimal solution to an optimal one may involve a complete rearrangement of all the pieces.

One way to solve problems like packing a dishwasher is to view it as a truth table. Each dish can be put in one of, say, 100 slots, in, say, one of ten different orientations. This results in 1000 combinations, requiring 10 bits. If there are 40 dishes, 400 bits are required to describe one configuration of dishes in the dishwasher, so the truth table has 2^400 entries. The resulting truth table is vast. Each entry in the table indicates how much space is left in the dishwasher if dishes are put in according to the configuration of that entry. A negative number indicates an infeasible solution. There will be many invalid configurations which have two or more dishes occupying the same location. We give all of these entries a large negative number.

The resulting table describes a landscape that is mostly flat with hills sparsely scattered throughout. We can also imagine that this landscape is an ocean in which negative values are under water and positive values represent islands in the ocean. The goal is to find the highest island in the ocean. We start in some random location in the ocean and start searching. We may find an island quickly, but it may not be the highest one. Given the vastness of the ocean, it is understandable why it can take a very long time to find a solution.
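To make the landscape concrete, here is a toy version small enough to enumerate completely. It is only a sketch under invented assumptions: a one-dimensional rack of 10 cells, three dishes of widths 3, 2 and 2, and a score equal to the largest free gap left over (room for one more pot). None of these numbers come from a real dishwasher.

```python
from itertools import product

RACK_LENGTH = 10          # hypothetical rack: 10 cells in a row
DISH_WIDTHS = [3, 2, 2]   # hypothetical dishes
INFEASIBLE = -1000        # "under water": overlapping or out-of-bounds dishes

def score(config):
    """One truth-table entry: largest free gap, or INFEASIBLE."""
    occupied = [False] * RACK_LENGTH
    for start, width in zip(config, DISH_WIDTHS):
        if start + width > RACK_LENGTH:
            return INFEASIBLE
        for cell in range(start, start + width):
            if occupied[cell]:
                return INFEASIBLE
            occupied[cell] = True
    best = run = 0
    for cell in occupied:
        run = 0 if cell else run + 1
        best = max(best, run)
    return best

# Build the whole table: one entry per combination of start positions.
table = {c: score(c) for c in product(range(RACK_LENGTH), repeat=len(DISH_WIDTHS))}
feasible = [c for c, s in table.items() if s >= 0]
best_config = max(table, key=table.get)

print(len(table), len(feasible), best_config, table[best_config])
```

Even here most entries are under water. Exhaustive enumeration works only because the table has a thousand entries; every added dish multiplies its size by ten.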

But, wait a minute. What about polynomial algorithms like sorting? A truth table can be constructed for these also. For example, to sort 256 elements, we can create an 8-bit variable for each element to describe the position of that element in the sorted list. The value of each entry would indicate the number of sorted elements for that configuration. A configuration takes 256 × 8 = 2048 bits to describe, so the complete table is again vast, with huge numbers of infeasible entries in which two or more elements occupy the same slot in the list, and only one satisfying solution. Yet, we know finding a solution is easy. Why is this?

The ocean corresponding to the sorting problem is highly regular. If we are put down at an arbitrary point in the ocean, we can immediately determine where to go just by examining the current truth table entry (point in the ocean). Knowing the structure, we may be able to determine from this that we need to go, say, northeast for 1000 miles. We may have to do this some polynomial number of times before getting to the solution, but we are guaranteed to get there. Structure in a problem allows us to eliminate large parts of the search space efficiently.
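The structure of the sorting problem can be illustrated with the simplest possible method: look only at neighbouring elements, and swap any pair that is out of order. The sketch below (plain adjacent-swap sorting, chosen for illustration rather than efficiency) counts the moves. Each swap removes exactly one out-of-order pair, so the number of moves can never exceed n(n-1)/2.

```python
def sort_by_local_moves(items):
    """Sort using only local information; return (sorted list, swaps made)."""
    items = list(items)
    steps = 0
    moved = True
    while moved:
        moved = False
        for i in range(len(items) - 1):
            # The current position alone tells us which way to move.
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                steps += 1
                moved = True
    return items, steps

data = [5, 4, 3, 2, 1]
result, steps = sort_by_local_moves(data)
print(result, steps)   # [1, 2, 3, 4, 5] 10
```

There is no fog here: every local observation points toward the solution, and no move ever has to be undone.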

In contrast, for an NP-hard problem, there is no guarantee of structure. Furthermore, as we are sailing around this ocean, we are doing so in a thick fog such that we can only see what is immediately around us. We could sail right by an island and not even know it. Given this, it is easy to see that it could take an exponential amount of time to find a solution.

But then, how do we account for the fact that, often, NP-hard problems are tractable? The answer to this question is that there usually is some amount of structure in most problems. We can use heuristics to look for certain patterns. If we find these patterns, then this gives guidance similar to the sorting example above. The problem is that different designs have different patterns and there is no one heuristic that works in all cases. Tools that deal with NP-hard problems usually use many heuristics. The trouble is that the more heuristics there are, the slower the search. At each step, each of the heuristics needs to be invoked until a pattern match is found. In the worst case, no pattern match will be found, meaning the search still takes exponential time, and it is slower still because of the overhead of invoking the heuristics at each step.
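The shape of such a tool can be sketched on a small NP-hard problem, subset sum (find a subset of numbers adding up to a target). Everything here is illustrative: the two pattern-matching heuristics are deliberately trivial and are my own invention, and a real tool would invoke its heuristics at every step of the search rather than once up front.

```python
from itertools import combinations

def h_single(nums, target):
    """Pattern: the target itself appears in the list."""
    return [target] if target in nums else None

def h_total(nums, target):
    """Pattern: the whole list sums to the target."""
    return list(nums) if sum(nums) == target else None

HEURISTICS = [h_single, h_total]

def subset_sum(nums, target):
    """Try each heuristic in turn; fall back to exhaustive search."""
    for heuristic in HEURISTICS:
        found = heuristic(nums, target)
        if found is not None:
            return found, heuristic.__name__
    # No pattern matched: exponential search over all subsets.
    for size in range(len(nums) + 1):
        for combo in combinations(nums, size):
            if sum(combo) == target:
                return list(combo), "exhaustive"
    return None, "exhaustive"

print(subset_sum([3, 5, 9], 5))    # ([5], 'h_single')
print(subset_sum([3, 5, 9], 12))   # ([3, 9], 'exhaustive')
```

When a heuristic matches, the answer comes back immediately. When none does, the cost of trying them is paid on top of the full search.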

I hope this gives some intuition into NP-hard problems. In future posts I will talk about even harder classes of problem.

Abstraction is the single most important tool in designing complex systems. There is simply no way to design a million lines of code, whether it be hardware or software, without using multiple levels of abstraction. But, what exactly is abstraction? Most designers know intuitively that, for example, a high-level programming language, such as C, is a higher level of abstraction than assembly language. Equivalently, in hardware, RTL is a higher level of abstraction than gate-level. However, few designers understand the theoretical basis for abstraction. If we believe that the solution to designing ever more complex systems is higher levels of abstraction, then it is important to understand the basic theory of what makes one description of a design more or less abstract than another.

There are four types of abstraction that are used in building hardware/software systems:

structural

behavioral

data

temporal

Structural Abstraction

Structure refers to the concrete objects that make up a system and their composition. For example, the concrete objects that make up a chip are gates. If we write at the RTL level of abstraction:

a = b + c;

this is describing an adder, but the details of all the gates and their connections are suppressed because they are not relevant at this level of description. In software, the concrete objects being hidden are the CPU registers, program counter, stack pointer, etc. For example, in a high-level language, a function call looks like:

foo(a,b,c);

The equivalent machine-level code will have instructions to push and pop operands and jump to the specified subroutine. The high-level language hides these irrelevant details.

In general, structural abstraction means specifying functions in terms of inputs and outputs only. Structural abstraction is the most fundamental type of abstraction used in design. It is what enables a designer to enter large designs.
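To see how much detail a single `+` hides, here is a sketch of the same adder written out structurally, using nothing but two-input NAND gates, and then compared against the abstract description. The 4-bit width and all function names are assumptions made for the example.

```python
def nand(a, b):
    """The only primitive: a two-input NAND on single bits."""
    return 1 - (a & b)

def xor(a, b):
    """XOR built from four NAND gates."""
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))

def full_adder(a, b, cin):
    """One bit of the adder: returns (sum, carry-out)."""
    s1 = xor(a, b)
    total = xor(s1, cin)
    carry = nand(nand(a, b), nand(s1, cin))  # (a AND b) OR (s1 AND cin)
    return total, carry

def gate_level_add(b, c, width=4):
    """Ripple-carry adder: the structure hidden behind 'a = b + c'."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((b >> i) & 1, (c >> i) & 1, carry)
        result |= bit << i
    return result

# The structural and abstract descriptions agree on every input pair.
mismatches = [(b, c) for b in range(16) for c in range(16)
              if gate_level_add(b, c) != (b + c) % 16]
print(len(mismatches))   # 0
```

The abstract view needs only the input/output behaviour; the nine NAND gates per bit and their wiring are irrelevant to anyone using the adder.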

Behavioral Abstraction

Abstracting behavior means not specifying what should happen for certain inputs and/or states. Behavioral abstraction can really only be applied to functions that have been structurally abstracted. Structural abstraction means that a function is specified by a table mapping inputs to outputs. Behavioral abstraction means that the table is not completely filled in.

Behavioral abstraction is not used in design, but is extremely useful, in fact, necessary, in verification. Verification engineers instinctively use behavioral abstraction without even realizing it. A verification environment consists of two parts: a generator that generates input stimulus, and a checker, which checks that the output is correct. It is very common for checkers not to be able to check the output for all possible input values. For example, it is common to find code such as:

The checker only specifies the correct behavior if a response is received. It says nothing about the correct behavior if no response is received.
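A checker of the kind described above might look like the following sketch. The protocol is hypothetical (a response is expected to echo the request's ID), and it is written in Python rather than a testbench language, purely to show the shape of a partially filled-in table.

```python
def check_response(request_id, response_id):
    """Partial checker: only has an opinion when a response arrived.

    response_id is None when no response was received.
    """
    if response_id is None:
        return "unchecked"   # behavior left unspecified
    if response_id == request_id:
        return "pass"
    return "fail"

print(check_response(7, 7))      # pass
print(check_response(7, 8))      # fail
print(check_response(7, None))   # unchecked
```

The third case is the abstraction: a design that never responds at all would not be flagged by this checker.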

A directed test is an extreme example of behavioral abstraction. Suppose I write the following directed test for an adder:

a = 2;
b = 2;
dut_adder(out,a,b);
if (out != 4)
    print("ERROR");

The checker is the last two lines, but it only specifies the output for inputs, a=2, b=2, and says nothing about any other input values.

Data Abstraction

Data abstraction is a mapping from a lower-level type to a higher-level type. The most obvious data abstraction, which is common to both hardware and software, is the mapping of an N-bit vector onto the set of integers. Other data abstractions exist. In hardware, a binary digit is an abstraction of the analog values that exist on a signal. In software, a struct is an abstraction of its individual members.

An interesting fact about data abstraction is that the single most important abstraction, from bit vector to integer, is not actually a valid abstraction. When we treat values as integers, we expect that they obey the rules of arithmetic. However, fixed-width bit vectors do not, specifically when operations overflow. To avoid this, a bit width is chosen such that no overflow is possible, or special overflow handling is done.
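The breakdown is easy to demonstrate. In this sketch an 8-bit vector is modelled by masking Python integers to 8 bits; the width and the function names are choices made for the example.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1   # 0xFF

def add_bits(a, b):
    """Addition as an 8-bit vector performs it: the carry out is dropped."""
    return (a + b) & MASK

def add_checked(a, b):
    """Addition with explicit overflow handling."""
    total = a + b
    if total > MASK:
        raise OverflowError(f"{a} + {b} does not fit in {WIDTH} bits")
    return total

print(add_bits(100, 27))    # 127, agrees with integer arithmetic
print(add_bits(200, 100))   # 44, not 300: the integer abstraction fails
```

`add_checked` corresponds to the "special overflow handling" option; choosing a wider `WIDTH` corresponds to the other.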

Temporal Abstraction

This last abstraction type really only applies to hardware. Temporal abstraction means ignoring how long it takes to perform a function. A simple example of this is the zero-delay gate model often used in gate-level simulations. RTL also assumes all combinational operations take zero time.

It is also possible to abstract cycles. For example, a pipelined processor requires several cycles to complete an operation. In verification, it is common to create an unpipelined model of the processor that completes all operations in one cycle. At the end of a sequence of operations, the architecturally visible state of the two models should be the same. This is useful because an unpipelined model is usually much simpler to write than a pipelined one.
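As a rough illustration, consider a deliberately simplified machine whose only instruction adds an immediate value to a register. The pipelined model below is just a three-stage delay line that applies each instruction when it reaches the last stage, so there are no hazards to handle; a real pipeline is far more involved. Both models and their names are invented for this sketch.

```python
def run_unpipelined(program):
    """Reference model: every instruction completes in one cycle."""
    regs = {}
    cycles = 0
    for reg, imm in program:
        regs[reg] = regs.get(reg, 0) + imm
        cycles += 1
    return regs, cycles

def run_pipelined(program, depth=3):
    """Toy pipeline: an instruction takes effect when it leaves the last stage."""
    regs = {}
    stages = [None] * depth
    pending = list(program)
    cycles = 0
    while pending or any(s is not None for s in stages):
        retiring = stages[-1]
        if retiring is not None:
            reg, imm = retiring
            regs[reg] = regs.get(reg, 0) + imm
        # Shift the pipeline and fetch the next instruction, if any.
        stages = [pending.pop(0) if pending else None] + stages[:-1]
        cycles += 1
    return regs, cycles

program = [("r1", 5), ("r2", 7), ("r1", 1)]
ref_state, ref_cycles = run_unpipelined(program)
dut_state, dut_cycles = run_pipelined(program)
print(ref_state == dut_state, ref_cycles, dut_cycles)
```

The cycle counts differ, but the comparison is made only on the register state once both models have finished, which is exactly what abstracting away cycles means.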

The four abstractions described above comprise a basis for the majority of abstractions used in design and verification. That is, any abstraction we are likely to encounter is some combination of the above abstractions. However, we are still left with the question of what is a valid abstraction. I will defer answering this until the next post.

(note: this discussion is based on the paper, “Abstraction Mechanisms for Hardware Verification” by Tom Melham. There is slightly more high-level description of these abstractions in this paper along with a lot of boring technical detail that you probably want to skip.)

I am a big fan of Fred Brooks’ “The Mythical Man-Month: Essays on Software Engineering”. Brooks was leader of one of the first large software projects. Along the way, he found that a lot of the conventional wisdom about software engineering was wrong, most famously coming up with the idea that adding manpower to a late project makes it later.

I have also found that a lot of the conventional wisdom about the causes of verification difficulties is wrong. So, in designing this blog, I decided to model it after the Mythical Man-Month. The essence of this style is:

short, easy-to-read essays on a single topic.

timelessness – focus on overarching issues, not on how to solve specific problems with specific programming languages.

back it up with real data whenever possible.

In some senses, this blog will attempt to fill in some obvious holes in the Mythical Man-Month. Brooks states that project effort can be roughly apportioned as:

1/3 planning (specification)

1/6 coding

1/2 verification

but then proceeds to talk mostly about planning and coding and very little about verification. I think this blog will cover each of these areas in rough proportion to these numbers, so most of my posts will be on verification, but a fair number will cover specification and some on particular design issues.

Brooks’ work is more than 30 years old, so it is worth re-examining some of his conclusions to see if they still hold up as design complexity has increased with time. One of the areas of contention is the percentage of time spent in verification. Brooks’ claim that verification takes 50% of the effort applied to large software projects. Today, there are claims that hardware verification is taking 70% of the effort. EDA vendors often point to this “growth” as proof that verification is becoming the bottleneck.

But, is this the real story? Software verification is mostly done by the designers. In this kind of environment, verification consumes roughly 50% of the total effort. 20 years ago, hardware verification was also roughly 50% of the effort because it was mostly the designers doing the verification. The shift to pre-silicon verification that came about due to the advent of HDLs and synthesis enabled the separation of verification and design. But, separate verification is not as efficient as having the designer do the verification. So, now verification is 70% of the effort instead of 50%. So, rather than growing steadily from 50% to 70% or more, it was more of a one-time jump due to the shift in methodology. But, whether it is 50% or 70%, verification is the largest single piece of the overall design effort.

I wrote an article addressing this subject in more detail, titled “Leveraging Design Insight for Intelligent Verification Methodologies”. You can download it from the Nusym website.