Al Williams

Dr. Dobb's Bloggers

The CPU Crawl

May 08, 2013

I want to talk a bit more about Verilog and how it is different from simply writing something using software.

The first example creates subcomponents named and1 and summer. The second set of statements uses continuous assignments (which always use =, not <=). These are roughly equivalent, and the important thing to note is that the assign statements run forever and always. So when a or b changes, the sum and carry outputs will update no matter what else is going on.
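The original listings don't survive in this copy. Based on the description (subcomponents named and1 and summer, then continuous assignments), the two half-adder styles presumably looked something like this sketch; port ordering and module names are assumptions:

```verilog
// Structural form: explicit gate primitives as subcomponents.
module half_adder_struct(input a, input b, output sum, output carry);
  xor summer(sum, a, b);   // sum is a XOR b
  and and1(carry, a, b);   // carry is a AND b
endmodule

// Equivalent continuous assignments (always =, never <=).
module half_adder_assign(input a, input b, output sum, output carry);
  assign sum   = a ^ b;
  assign carry = a & b;
endmodule
```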

So far, none of these examples captures the real power of a high-level language translator. How about this:
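The listing is missing here, but the next paragraph (no tmp wires, "I want to add two bits") suggests it was a single behavioral addition along these lines; the module wrapper is illustrative:

```verilog
module half_adder_add(input a, input b, output sum, output carry);
  // Just say what you want -- add two bits -- and let the
  // translator pick the gates. The concatenation {carry, sum}
  // catches both result bits of the 2-bit sum.
  assign {carry, sum} = a + b;
endmodule
```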

Then you don't need the tmp wires, either. The important point here is that instead of telling the translator exactly what to do, you told it what you wanted to accomplish (I want to add two bits) and let it figure out the best way to do that.

Just as a C compiler will optimize your code, the translator will do things to optimize your design and the more latitude you give it, the better (in general). A smart translator may find ways to collapse multiple things into one or use special resources (e.g., a hardware multiplier) to make your design work better if you allow it to do so.

All of the above are examples of combinatorial logic. The output purely depends on the input and there is no notion of history (in other words, the output of the adder doesn't depend on what the current output is). Circuits that do depend on historical context are known as sequential or synchronous circuits. These play at least two important roles in most FPGA designs. First, some things just naturally need an idea of the current state of things. For example, CARDIAC (or vtach) needs to know what the current value of the accumulator is if it is going to execute an add instruction.

There is another issue that sequential logic solves. It is easy to pretend that our schematics and our Verilog do everything all with zero latency. That's not how it really works, though. Even though hardware latency is much lower than typical software, it still takes a finite amount of time for signals to fly down wires and through logic gates.

Consider the half adder. If both logic gates in the schematic above work instantly, there's no problem. However, what if the AND gate took 2 picoseconds (ps) to respond to a change on its inputs but the XOR gate (for some reason) took 2,000 picoseconds? The circuit would possibly give the wrong answer for 1,998 picoseconds, since the carry would reflect the current inputs but the sum would not change at the same time.

This is why nearly all FPGA designs have clocks and coordinate processing on the clock's edges. If I assume that inputs only change on the clock edge and I won't actually look at the outputs until the next clock, then any discrepancies that occur faster than the clock won't matter. FPGA tools can determine the amount of delay between circuits and compute the fastest clock speed you could use. Slowing the clock down usually isn't a problem.
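A minimal sketch of this idea (signal names are illustrative, not from vtach): outputs only update on the rising clock edge, so any glitching that settles between edges is invisible.

```verilog
module registered_adder(input clk, input [7:0] a, input [7:0] b,
                        output reg [8:0] q);
  // Non-blocking assignment (<=) is the idiom for clocked logic:
  // q takes the new sum only at the clock edge, no matter how long
  // the combinatorial adder takes to settle in between.
  always @(posedge clk)
    q <= a + b;
endmodule
```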

My version of CARDIAC uses this principle to divide processing time into different phases. In addition to a clock, it divides the clock into multiple phases:

Phase 1: Fetch instruction from memory

Phase 2: Decode instruction and load operands from memory

Phase 3: Perform actual work

Phase 4: Store results

This way, each part of the system has stable inputs by the time it needs them. Phase 2 also updates the bug (the program counter). This forms a very simple state machine. It goes from state 1, to state 2, to state 3, to state 4, and then starts over. However, I used a common FPGA trick known as "one hot encoding" to capture the states.

One way to handle the phase number would be to use two bits to represent 0 to 3. This would be a very common way to do this in software. For hardware, that would equate to two storage elements (called flip flops). However, that means to tell if I am in phase 3, I need to "decode" both bits (basically do a multibit comparison on two bits). For something this simple, that's not a big deal, but with dozens of states it might add up. However, you can choose to represent each state with a bit. So instead of 1, 2, 3, 4 you can select states 1, 2, 4, and 8. Now by simply looking at one bit you can tell if you are in a particular state. Sure, it is two extra flip flops, but the amount of circuitry you save in testing those bits more than makes up for the difference. It is also much simpler to implement the state tracking when using one hot encoding.
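A one-hot version of the four-phase sequencer can be sketched like this (a sketch, not vtach's actual code): each phase owns one flip flop, testing for a phase is a single-bit check, and advancing is just a rotate.

```verilog
module phaser(input clk, input reset, output reg [3:0] phase);
  always @(posedge clk)
    if (reset)
      phase <= 4'b0001;                 // start in phase 1
    else
      phase <= {phase[2:0], phase[3]};  // 1 -> 2 -> 4 -> 8 -> 1 ...
endmodule
```

Compare the state-advance line with a binary counter plus phase-decode logic; with one hot, `phase[2]` by itself says "we are in phase 3."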

To fully understand how this works, you need to look at the vtach memory architecture. The memory gets its address from a multiplexer. The addsource signal picks if the multiplexer supplies the bug (program counter) address or the operand address from the current instruction.
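The multiplexer described above could be written as a one-line continuous assignment; the names below follow the article (addsource, bug) but the operand signal name and address width are assumptions:

```verilog
module addrmux(input addsource, input [7:0] bug, input [7:0] ir_operand,
               output [7:0] memaddr);
  // addsource = 0: address memory with the bug (program counter).
  // addsource = 1: address memory with the operand from the
  //                current instruction.
  assign memaddr = addsource ? ir_operand : bug;
endmodule
```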

In the first phase (phase 1), the control circuit sets addsource to 0 (selects the bug address) and sets the memory output enable (memoe) to 1 so that the output of the memory will drive the data bus. This is a good example of where the output of the memory probably isn't ready immediately, but it doesn't matter because nothing will read the data bus until the next clock cycle.

The second phase (phase 2) reads the data bus (the contents of the memory) into the instruction register (ir), resets the address multiplexer, and also grabs the incremented bug address (the address to the next instruction) and puts it in the bug. The bugplus1 is a signal (actually, a set of signals) coming from a BCD increment component. This is defined earlier in vtach.v:
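The quoted line is missing from this copy. Per the surrounding text, it is presumably a single combinational instantiation of the BCD increment component in vtach.v, along these lines (instance and port names are guesses):

```verilog
// Combinational: bugplus1 continuously carries bug + 1 (in BCD).
bcdincr bumpbug(.in(bug), .out(bugplus1));
```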

Because of this line, bugplus1 always has the incremented value of bug on it. The component's definition is in bcdincr.v. You'd think this would be simple, but because vtach is a decimal machine, it is a bit more complicated than it would be for a binary processor.

Of course, the JMP and TAC instructions may modify the bug later in the third phase (state value 4; remember one hot encoding). That's no problem since the circuit only transfers the incremented value during phase 2.

I want to talk more about how vtach handles states, but that requires a bit more about flip flops, so it will wait until next time. Meanwhile, you can still download the first version of vtach and see how the states interact, the use of assign statements, and find parts that are combinatorial circuits.

