Back to Basics – Laying the Foundation

This is the first in a series of articles that are meant to provide an introduction and overview of computer architecture. The knowledge presented should be sufficient to allow readers to fully understand and critically analyze other articles so they can draw their own conclusions.

All Good Things Come in Threes…

The first aspect of a processor to consider is the Instruction Set Architecture (ISA). The ISA defines everything about the types of instructions that can be executed natively by the processor. Instructions are the atomic units of execution for most modern processors, so everything a computer does is eventually rendered into instructions by a compiler. The compiler acts both as a translator, compiling code into assembly language, and as an optimizer, rearranging code to run faster.

There are, generally speaking, three different types of instructions: scalar, vector and control. Scalar operations are simple operations that take scalars, or regular numbers, as inputs. One way to represent a scalar operation is to write it as (R0, op, R1, R2), where R indicates that the operand is in a register (instead of memory), the number after the R indicates the register’s address, op is a valid operation such as addition or division, and R0 is where the result is stored. So the previous example, (R0, op, R1, R2), translates to: R0 = R1 op R2

Scalar instructions typically have two inputs and one output. However, for efficiency, instructions will sometimes overwrite an operand, thus reducing the number of data locations needed. This can be illustrated by instructions that look like (R0, op, R0, R1), where the result (say, R0 + R1) is actually written over R0. This is called two operand format, and is supported by IBM’s System/360 and System/370 and by x86. Alternatively, a three operand format could be implemented, where the result is written to a different location than the source operands. ISAs such as Power and Alpha support this mode. Note that the (R0, op, R1, R2) notation can capture both.
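
As a rough illustration of the four-part notation, the following Python sketch models the registers as a dictionary and applies one scalar instruction at a time. The names `OPS`, `execute`, `three` and `two` are invented for this example and do not correspond to any real ISA; integer division is likewise just an assumption.

```python
import operator

# Hypothetical operation table; a real ISA defines many more operations.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # assumption: integer division
}

def execute(regs, dest, op, src1, src2):
    """Apply one scalar instruction (dest, op, src1, src2) to the registers."""
    regs[dest] = OPS[op](regs[src1], regs[src2])
    return regs

# Three operand format: the result goes to a register that is not a source.
three = execute({"R0": 0, "R1": 6, "R2": 3}, "R0", "+", "R1", "R2")

# Two operand format: the destination is also the first source,
# so the old value of R0 is overwritten.
two = execute({"R0": 6, "R1": 3}, "R0", "+", "R0", "R1")
```

Both calls go through the same `execute` function, which mirrors the point that one notation covers both formats. The two operand case needs only two registers, at the cost of losing the original value of R0.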

Vector operations come in several types, but the main distinguishing factor is that they operate on vectors rather than scalars. A vector can be thought of as an ordered set of N scalars, i.e. V0 = (R1, R2, …, RN), so a vector operation looks like (V0, op, V1, V2), where op is the operation applied. Let’s suppose that V1 and V2 are 32 entry long vectors, which is of medium size as far as true vector machines go. To add them, the Nth entry in V1 is added to the Nth entry in V2. So (V0, +, V1, V2) really looks like this:

V0 = (R1_1 + R2_1, R1_2 + R2_2, …, R1_N + R2_N)

Here R1_N indicates the Nth entry in the first vector, V1, and R2_N is defined similarly as the Nth entry in the second vector, V2. One of the advantages of this is that a single vector instruction accomplishes a great deal more than a single scalar instruction. Moreover, when vectors access memory, they tend to do so in a regular way, first asking for the initial entry of a vector, then the second entry and so on. This makes it easy to predict which values are needed next, so the code tends to run faster. There are other advantages to this, but they will be covered later.
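
The entry-by-entry behavior can be sketched in a few lines of Python. This only shows what a vector add computes, not how a vector machine performs it; the helper `vector_op` and the sample contents of `V1` and `V2` are made up for illustration.

```python
def vector_op(op, v1, v2):
    """Apply a scalar operation entry by entry to two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return [op(a, b) for a, b in zip(v1, v2)]

N = 32                       # vector length used in the text
V1 = list(range(N))          # 0, 1, ..., 31
V2 = [10] * N                # thirty-two copies of 10

# One "vector instruction" stands in for 32 scalar additions.
V0 = vector_op(lambda a, b: a + b, V1, V2)
```

The list comprehension walks both vectors from the first entry to the last, which is the same regular access pattern the text describes for memory.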

While an architecture can be implemented with scalar instructions or with vector instructions or both, all architectures must employ control instructions. Control instructions are those that deal with the flow of instructions, rather than direct manipulation of data. The classic example of a control instruction would be a branch. There are generally three types of control instructions:

Conditional Branches

Unconditional Branches

Procedure Calls

The distinction between conditional and unconditional branches is that an unconditional branch is always taken, so it always modifies the program counter (PC), which points to the next instruction. In some sense, an unconditional branch is just a conditional branch whose condition is always true.

A conditional branch would be of the form:

IF (CONDITION) GOTO (MEMORY ADDRESS)

Whereas an unconditional branch looks like this:

GOTO (MEMORY ADDRESS)

The unconditional branch looks easier to deal with, and sometimes it is, particularly when the compiler knows the memory address beforehand, but often it does not.

Unfortunately, procedures are a bit more complicated. From a programming point of view, a procedure is essentially an unbroken sequence of instructions that can be executed sequentially. A procedure call occurs when a CPU is running one procedure and needs, for whatever reason, to call another. One example would be summing N numbers and needing to call the “count” procedure to figure out exactly what the Nth number is. The processor then has to switch to this other procedure in the instruction stream but, unlike a branch, it must also store where it came from. A procedure return is when execution returns to the original instructions, which in this example is the summation. However, not all procedure calls generate returns. Modifying the example, suppose we know that if the Nth number being summed is prime, then the sum works out to 96. In that case, we could bypass returning to the summing procedure and simply return the result 96 to the appropriate location in memory.
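
To make the "store where it came from" step concrete, here is a minimal Python sketch of a toy machine. It assumes an explicit stack of return addresses, which is only one possible mechanism, and the instruction names `CALL`, `RET`, `PRINT` and `HALT` are hypothetical.

```python
def run_calls(program, entry=0):
    """Run a toy program and return the list of values emitted by PRINT."""
    pc = entry
    return_stack = []            # remembers where each CALL came from
    output = []
    while True:
        instr = program[pc]
        kind = instr[0]
        if kind == "CALL":
            return_stack.append(pc + 1)  # the instruction after the call
            pc = instr[1]                # jump to the called procedure
        elif kind == "RET":
            pc = return_stack.pop()      # resume the calling procedure
        elif kind == "PRINT":
            output.append(instr[1])
            pc += 1
        elif kind == "HALT":
            return output
        else:
            raise ValueError(f"unknown instruction: {kind}")

program = [
    ("PRINT", "summing"),          # 0: the summing procedure starts
    ("CALL", 4),                   # 1: call the "count" procedure
    ("PRINT", "back in summing"),  # 2: execution resumes here after RET
    ("HALT",),                     # 3
    ("PRINT", "in count"),         # 4: the "count" procedure
    ("RET",),                      # 5: return to the saved address
]

result = run_calls(program)
```

The only difference between `CALL` and a plain jump in this sketch is the saved address. A call that never returns, as in the modified example above, would simply never execute the matching `RET`.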

One important thing to understand about control instructions is how they modify the code sequence. The address of the next instruction is kept in a location called the program counter (PC), and after every instruction is loaded, the PC is incremented by the size of that instruction in bytes. Taken branches change the PC by something other than the normal increment, whereas an untaken branch lets the PC increment as usual.
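
The effect of branches on the PC can be traced with another small Python sketch. It assumes a fixed instruction width of 4 bytes and three made-up instructions (`ADDI`, `BNZ` for "branch if not zero", and `JMP`); none of these are taken from a particular ISA.

```python
INSTR_BYTES = 4  # assumption: every instruction is 4 bytes wide

def run_branches(program, regs):
    """Run a toy program keyed by byte address; return the PC values visited."""
    pc = 0
    trace = []
    while pc in program:
        trace.append(pc)
        instr = program[pc]
        kind = instr[0]
        next_pc = pc + INSTR_BYTES        # the normal increment
        if kind == "ADDI":                # add an immediate to a register
            _, reg, imm = instr
            regs[reg] += imm
        elif kind == "BNZ":               # conditional branch
            _, reg, target = instr
            if regs[reg] != 0:
                next_pc = target          # taken: PC is overwritten
        elif kind == "JMP":               # unconditional branch
            next_pc = instr[1]
        else:
            raise ValueError(f"unknown instruction: {kind}")
        pc = next_pc
    return trace

program = {
    0:  ("ADDI", "R0", -1),   # decrement the loop counter
    4:  ("BNZ", "R0", 0),     # loop back while R0 is not zero
    8:  ("JMP", 16),          # always skip the instruction at 12
    12: ("ADDI", "R0", 100),  # never executed
    16: ("ADDI", "R0", 5),
}

regs = {"R0": 2}
trace = run_branches(program, regs)
```

With R0 starting at 2, the conditional branch at address 4 is taken once and then falls through, so the PC sequence is 0, 4, 0, 4, 8, 16. The untaken branch advances the PC by the same 4 bytes as any other instruction.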