
Giving concise and informative answers to difficult technical questions is
an art in itself, and there is such a wealth of people here with that talent.
Basic technical questions always attract a lot of readers because not
everybody has the time or patience to read chapter after chapter from books.
In electronics, I have learned, one must have a minimal knowledge about everything to understand any one thing in more depth. Also, having
some general idea about a subject always helps when researching it further
in books and online references.

You will find some threads over on the DEC forum where Marty bought a book on building your own PDP-8 from TTL devices. The book also had a section on microprogramming a PDP-8.

Marty actually built a PDP-8 out of TTL chips and I helped him to debug it. As part of the debugging, I constructed a PDP-8 simulator in a free tool called LOGISIM.

I also designed my own PDP-8 microcode and constructed a simulator in LOGISIM for that as well.

We tested both implementations using the DEC diagnostics (so we were sure that the instruction set did actually do what it stated on the tin) and I was also able to play the simulations at chess (using CHEKMOII).

Current processors use internals something like a bit-slice design to actually run the processor. This is because they are what is called pipelined. Any single instruction executes over several cycles in an overlapped manner. Different parts of an instruction execute at different times, and at any one time there are several instructions in progress. This is why current computers have branch predictors: taking a speculative path that turns out to be wrong may cost a dozen or more cycles to refill the pipe stages.
Imagine the pipe stages as a manufacturing assembly line. Each step does one part of the work.
At each stage, there are still decoders like Chuck mentioned, but they are no longer looking at the original instruction, which has been split into several functional instructions. Things like checking caches for information and checking data dependencies for future branches are all happening at the same time.
Dwight
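
The assembly-line picture can be put into rough numbers. Below is a minimal Python sketch, assuming an idealised pipeline in which every stage takes one cycle and nothing stalls except a mispredicted branch. The function names, the five-stage depth and the 12-cycle refill penalty are illustrative assumptions, not figures for any particular processor.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages, mispredicts=0, refill_penalty=0):
    """Ideal pipeline: fill once, then complete one instruction per cycle.

    Every mispredicted branch adds refill_penalty wasted cycles (assumed flat).
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + mispredicts * refill_penalty


def stage_occupancy(cycle, n_instructions, n_stages):
    """Which instruction (0-based) sits in each stage at a given cycle.

    Stage s holds instruction (cycle - s); None marks an empty stage.
    """
    occupancy = []
    for stage in range(n_stages):
        instr = cycle - stage
        occupancy.append(instr if 0 <= instr < n_instructions else None)
    return occupancy


if __name__ == "__main__":
    print(unpipelined_cycles(100, 5))        # 500
    print(pipelined_cycles(100, 5))          # 104
    print(pipelined_cycles(100, 5, 10, 12))  # 224
    print(stage_occupancy(2, 4, 3))          # [2, 1, 0]
```

The last line shows the overlap: at cycle 2 of a three-stage pipe, instructions 2, 1 and 0 are all in flight, each in a different stage.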

1- Is it possible that program line #2 gets executed (partially) before or
at the same time that program line #1 is being executed?

Pipelining can have several implementations. Straight pipelining executes instructions in order (the eZ80 does this). Superscalar pipelining can issue more than one instruction for execution at a time, as long as the instructions' effects don't interact. In Z80 code, for instance, if a theoretical superscalar Z80 were ever built, you could have the code:

Code:

LD HL,2100H
LD DE,2200H
LD BC,00FFH
LDIR

With a core that can issue three instructions at a time, the LD HL, LD DE, and LD BC instructions could easily be done at the same time; the LDIR, on the other hand, will stall the pipeline.

Some cores can issue instructions out-of-order (when that's possible). So the core, depending on the instruction mix prior to the above code segment, could decide to execute the LD BC prior to the LD HL. The LDIR, again, will stall the pipeline since it depends on the other three instructions executing first.
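
The grouping described above can be sketched in a few lines of Python. This is only an illustration for the theoretical superscalar Z80: the read/write register sets are written by hand, flags and memory are ignored, and `conflicts` and `issue_groups` are made-up helper names.

```python
# (mnemonic, registers read, registers written), hand-written for this example.
# LDIR uses HL, DE and BC as source pointer, destination pointer and counter.
PROGRAM = [
    ("LD HL,2100H", set(), {"HL"}),
    ("LD DE,2200H", set(), {"DE"}),
    ("LD BC,00FFH", set(), {"BC"}),
    ("LDIR", {"HL", "DE", "BC"}, {"HL", "DE", "BC"}),
]


def conflicts(earlier, later):
    """True if `later` touches a register in a way that depends on `earlier`."""
    _, reads1, writes1 = earlier
    _, reads2, writes2 = later
    return bool(writes1 & reads2 or reads1 & writes2 or writes1 & writes2)


def issue_groups(program, width):
    """Pack instructions, in program order, into groups issued together.

    A group is closed when it is full or when the next instruction
    conflicts with one already in the group.
    """
    groups, current = [], []
    for instr in program:
        if len(current) >= width or any(conflicts(e, instr) for e in current):
            groups.append(current)
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return [[name for name, _, _ in group] for group in groups]


if __name__ == "__main__":
    for group in issue_groups(PROGRAM, width=3):
        print(group)
```

With a width of 3 the three loads land in one group and LDIR in the next. Because no pair of the loads conflicts, an out-of-order core would also be free to run them in any order.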

2- Which instructions are more amenable to the pipeline method?

It's easier to answer which instructions are not amenable. Branches/conditional jumps are good examples. A great simple example of straight pipelining without any superscalar or out-of-order execution funniness is the eZ80, which successfully pipelines essentially the entire Z80 instruction set.

3- Is the pipeline method applied more to CISC or to RISC?

It is easier to pipeline RISC. CISC can be pipelined, but it is more complex, and the pipeline design becomes more difficult.

1. Yes--much depends on dependency of operands and destination; instruction scheduling can get to be quite a complicated affair.
2. Instructions that take more than a single cycle to execute are good candidates.
3. No preference, but more registers make scheduling easier.

Any instruction can be divided into several phases; for example: (1) Read operands, (2) operate, (3) store result for the simplest case.
Clearly an instruction can't store a new result if the old value was still needed by an earlier instruction. Things can get interesting when there are multiple functional units that perform the same function. For example, two multipliers or two load-store units. It's all a ballet.
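
The rule about not storing a new result over a value an earlier instruction still needs is usually called a write-after-read (WAR) hazard, alongside read-after-write (RAW) and write-after-write (WAW). Those names are standard textbook terminology rather than something from this thread. A minimal Python sketch of the check, with made-up register names:

```python
def hazards(earlier, later):
    """Return the hazard names between two instructions in program order.

    Each instruction is a (reads, writes) pair of register-name sets.
    """
    e_reads, e_writes = earlier
    l_reads, l_writes = later
    found = set()
    if e_writes & l_reads:
        found.add("RAW")   # later needs a result earlier has not stored yet
    if e_reads & l_writes:
        found.add("WAR")   # later would overwrite a value earlier still needs
    if e_writes & l_writes:
        found.add("WAW")   # both store to the same place; order must be kept
    return found


if __name__ == "__main__":
    mul = ({"r1", "r2"}, {"r3"})   # r3 = r1 * r2
    add = ({"r3", "r4"}, {"r5"})   # r5 = r3 + r4
    ld = (set(), {"r1"})           # r1 = new value
    print(hazards(mul, add))  # {'RAW'}
    print(hazards(mul, ld))   # {'WAR'}
    print(hazards(add, ld))   # set()
```

An empty result means the two instructions can overlap freely, for instance on two separate functional units.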

For early examples of this sort of thing, study the IBM 7030 "Stretch" (Gene Amdahl) and CDC 6600 (Seymour Cray).

Consider the following pseudo code to move a block of data from one place to another:

Note how each instruction seems to depend on the result of the previous one. But, in fact, you can move the "Decrement count" after the "Load word" and gain a cycle or so because the Load is a multi-cycle instruction. For that matter, the address increment can be issued at any time, so long as the result isn't stored until after the load or store reads its operands. Further, the condition for the jump instruction can be evaluated long before the bottom of the loop, so you can begin reading instructions for the next iteration (or not) before the current iteration has completed.
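
The gain from reordering can be estimated with a toy timing model. The Python sketch below assumes a hypothetical loop body of decrement count, load word, store word, increment source address, increment destination address, jump if count is nonzero. It also assumes an in-order core that issues one instruction per cycle, a 3-cycle load and 1-cycle everything else. None of those numbers or names come from a real machine.

```python
# Assumed result latencies in cycles; any op not listed takes 1.
LATENCY = {"load": 3}


def cycles(program):
    """Cycles for one pass through `program` on an in-order, single-issue core.

    Each entry is (op, reads, writes). An instruction issues at least one
    cycle after the previous one, and only once its source values are ready.
    Operands are read at issue, so a later write to a source register is safe.
    """
    ready = {}       # register -> cycle at which its newest value is available
    next_issue = 0   # earliest cycle the next instruction may issue
    end = 0
    for op, reads, writes in program:
        start = max([next_issue] + [ready.get(r, 0) for r in reads])
        finish = start + LATENCY.get(op, 1)
        for reg in writes:
            ready[reg] = finish
        next_issue = start + 1
        end = max(end, finish)
    return end


DEC = ("dec", {"count"}, {"count"})      # decrement count
LOAD = ("load", {"src"}, {"w"})          # load word from source address
STORE = ("store", {"w", "dst"}, set())   # store word to destination address
INC_S = ("inc", {"src"}, {"src"})        # increment source address
INC_D = ("inc", {"dst"}, {"dst"})        # increment destination address
JNZ = ("jnz", {"count"}, set())          # jump if count is not zero

AS_WRITTEN = [DEC, LOAD, STORE, INC_S, INC_D, JNZ]
DEC_MOVED = [LOAD, DEC, STORE, INC_S, INC_D, JNZ]
INC_HOISTED = [LOAD, DEC, INC_S, STORE, INC_D, JNZ]

if __name__ == "__main__":
    print(cycles(AS_WRITTEN))    # 8
    print(cycles(DEC_MOVED))     # 7
    print(cycles(INC_HOISTED))   # 6
```

Under these assumptions, moving the decrement after the load saves one cycle, and also slipping the source-address increment into the load's shadow saves another, because the store then waits on nothing but the load itself.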

The idea is seeing how many things can be done, at least partially, ahead of time.