Microprocessor Design/Wire Wrap

Historically, most of the early CPUs were built by attaching integrated circuits (ICs) to circuit boards and wiring them up.

Nowadays, it's much faster to design and implement a new CPU in an FPGA -- the result will probably run faster and use less power than anything spread out over multiple ICs.

However, some people still design and build CPUs the old-fashioned way.
Such a CPU is sometimes called a "home brew CPU" or a "home built CPU".

Some people feel that physically constructing a CPU in this way helps students "touch the magic"[1] and learn and understand the underlying electronics and hardware, because it lets them probe the inner workings of the CPU.

A homebrew CPU is a central processing unit constructed using a number of simple integrated circuits, usually from the 7400 Series. When planning such a CPU, the designer must not only consider the hardware of the device but also the instructions the CPU will have, how they will operate, the bit patterns for each one, and their mnemonics. Before the existence of computer-based circuit simulation, many commercial processors from manufacturers such as Motorola were first constructed and tested using discrete logic.
Those commercial processors include the Motorola 6800,[2] the Motorola 6809,[3] and the Hewlett-Packard PA-RISC TS1.[4]

Although there is no limit on data bus width when constructing such a CPU, the number of components required grows quickly as the bus gets wider, since every register, bus buffer, and ALU slice has to be widened to match. Common physical data bus widths are 1, 4, 8, and 16 bits.
Incomplete design documents exist for a 40-bit CPU.[5]
A microcoded CPU may present a significantly different instruction set to the application programmer than the one the underlying hardware appears to support directly.
For example, the 68000 presented a 32-bit instruction set to the application programmer -- a 32-bit "add" was a single instruction -- even though internally it was implemented with 16-bit ALUs.
Similarly, the Zilog Z80, one of the most commonly used CPU families of all time,[6] presented an 8-bit instruction set to the application programmer, even though internally it was implemented with a single 4-bit ALU.[7]
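The principle can be sketched in a few lines of Python. This is a minimal illustration, assuming a plain binary adder as the "narrow ALU" and a carry flag saved between passes; the function name `add_wide` and its parameters are invented for this example and do not model the actual Z80 or 68000 internals. Each loop iteration stands for one pass through the narrow ALU.

```python
def add_wide(a: int, b: int, word_bits: int, slice_bits: int, carry_in: int = 0):
    """Add two word_bits-wide numbers using an adder only slice_bits wide.

    Returns (sum, carry_out, passes), where passes is the number of trips
    through the narrow ALU (roughly, the number of micro-steps needed).
    """
    if word_bits % slice_bits != 0:
        raise ValueError("word width must be a whole number of slices")
    mask = (1 << slice_bits) - 1
    result, carry, passes = 0, carry_in, 0
    # Work from the least significant slice upwards, like a microcode loop.
    for shift in range(0, word_bits, slice_bits):
        total = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (total & mask) << shift   # write this slice of the result
        carry = total >> slice_bits         # carry flag saved for the next pass
        passes += 1
    return result, carry, passes


# 8-bit add through a 4-bit ALU (two passes), 16-bit add bit-serially (16 passes).
print(add_wide(0xFF, 0x01, word_bits=8, slice_bits=4))        # (0, 1, 2)
print(add_wide(0x1234, 0x0FFF, word_bits=16, slice_bits=1))   # (8755, 0, 16), i.e. 0x2233
```

Setting `slice_bits=1` gives the bit-serial case; the price of a narrow ALU is simply the extra passes.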

Serial computers are another example: even though they do calculations one bit per clock cycle, they present an instruction set that deals with much wider words -- often 12 bits (PDP-8/S; PDP-14), 24 bits (D-17B), or even wider, such as 39 bits (Elliott 803).

Notable Homebrew CPUs

The Magic-1 is a CPU with an 8-bit data bus and 16-bit address bus running at about 4.09 MHz.[4]

The Mark I FORTH also has an 8-bit data bus and 16-bit address bus, but runs at 1 MHz.[8]

The V1648CPU is a CPU with a 16-bit data bus and 48-bit address bus that is currently being designed.[5]

APOLLO181 is a homemade didactic 4-bit processor made of TTL logic and bipolar memories, based upon the Bugbook® I and II chips, in particular on the 74181 (by Gianluca.G, Italy, May 2012).[6]

Practically all CPU designs include several 3-state buses -- an "address bus", a "data bus", and various internal buses.

A 3-state bus is functionally the same as a multiplexer.
However, there is no physical part you can point to and say "that is the multiplexer" in a 3-state bus; it's a pattern of activity shared among many parts.
The only reason to use a 3-state bus is that it can require fewer chips, or fewer and shorter wires, than an equivalent multiplexer arrangement.
When you want to select between very few pieces of data that are close together, and most of that data is stored on a chip that only has 2-state outputs, it may require fewer chips and less wiring to use actual multiplexer chips.
When you want to select between many pieces of data (one of many registers, or one of many memory chips, etc.), or many of the chips holding that data already have 3-state outputs, it usually requires fewer chips to use a 3-state bus (even counting the "extra" 3-state buffer between the bus and each thing that doesn't already have 3-state outputs).
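A small behavioural model makes "functionally the same as a multiplexer" concrete. It is a sketch under simplifying assumptions: each driver is just a value plus an output enable, a decoder generates the enables (the job a 3-to-8 decoder such as a 74138 often does), and a high-impedance output is modelled as simply being ignored. The function names are invented for the example.

```python
from typing import List, Optional, Sequence


def decode(select: int, n: int) -> List[bool]:
    """Address decoder: turn a select code into one output-enable per driver."""
    return [i == select for i in range(n)]


def tristate_bus(values: Sequence[int], enables: Sequence[bool]) -> Optional[int]:
    """Resolve a 3-state bus. Disabled drivers are high-impedance (ignored).

    Returns None for a floating bus (nobody driving); raises on bus
    contention (two drivers fighting), which on real hardware is a bug.
    """
    drivers = [v for v, en in zip(values, enables) if en]
    if len(drivers) > 1:
        raise RuntimeError("bus contention: more than one output enabled")
    return drivers[0] if drivers else None


def mux(values: Sequence[int], select: int) -> int:
    """An ordinary multiplexer: the select code picks exactly one input."""
    return values[select]


registers = [0x1111, 0x2222, 0x3333, 0x4444]
for sel in range(len(registers)):
    # Decoder + 3-state drivers behave exactly like a 4-input multiplexer.
    assert tristate_bus(registers, decode(sel, len(registers))) == mux(registers, sel)
```

The model also shows the two failure modes a real multiplexer cannot have: a floating bus when no output is enabled, and bus contention when two are.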

A typical register file connected to a 3-state 16-bit bus on a TTL CPU includes:

Like many historically important commercial computers, many home-brew CPUs use some version of the 74181, the first complete ALU on a single chip.[9]
(Versions of the 74181 include the 74F181, the 40181[citation needed], the 74AS181, the 74LS181, the 74HCT181, etc.)
The 74181 is a 4-bit-wide ALU that can perform all the traditional add / subtract / decrement operations with or without carry, as well as AND / NAND, OR / NOR, XOR, and shift left (by adding a value to itself).

A typical home-brew CPU uses 4 of these 74181 chips to build an ALU that can handle 16 bits at once, like the Data General SuperNova.[10]
The simplest home-brew CPUs have only one ALU, which at different times is used to increment the program counter, do arithmetic on data, do logic operations on data, and calculate addresses from base+offset.

Some people who build TTL CPUs attempt to "save chips" by building that one ALU narrower than the largest word size (which is often 16 bits in TTL computers).
For example, the earliest Data General Nova computers used a single 74181 and processed all data 4 bits at a time.[10]
Unfortunately, this adds complexity elsewhere, and may actually increase the total number of chips needed.[11][12][13]

The simplest 16-bit TTL ALU wires the carry-out of each 74181 chip to the carry-in of the next, creating a ripple-carry adder.

Historically, some version of the 74182 look-ahead carry generator was used to speed up "add" and "subtract" so that they run at about the same speed as the other ALU operations.
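The difference between ripple carry and look-ahead carry can be sketched in Python. The sketch uses active-high signals and computes each 4-bit slice's group generate (G) and propagate (P) arithmetically; the real 74181 and 74182 derive them with gates and use their own pin polarities, and the 74182 outputs three internal carries plus group G and P for a further level rather than the c4 computed here. Treat the names and conventions as illustrative only.

```python
def add16_ripple(a: int, b: int, c0: int = 0):
    """Four 4-bit slices, carry-out of each wired to carry-in of the next."""
    result, carry = 0, c0
    for i in range(4):
        total = ((a >> 4 * i) & 0xF) + ((b >> 4 * i) & 0xF) + carry
        result |= (total & 0xF) << 4 * i
        carry = total >> 4          # must settle before the next slice is valid
    return result, carry


def slice_gp(x: int, y: int):
    """Group generate/propagate for one 4-bit slice (active-high here)."""
    s = x + y
    return int(s > 0xF), int(s == 0xF)   # G: makes a carry; P: passes a carry in


def add16_lookahead(a: int, b: int, c0: int = 0):
    """Same four slices, but every carry-in comes from look-ahead logic."""
    xs = [(a >> 4 * i) & 0xF for i in range(4)]
    ys = [(b >> 4 * i) & 0xF for i in range(4)]
    g, p = zip(*(slice_gp(x, y) for x, y in zip(xs, ys)))
    # Two-level AND-OR equations: no carry waits on another computed carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    result = 0
    for i, cin in enumerate((c0, c1, c2, c3)):
        result |= ((xs[i] + ys[i] + cin) & 0xF) << 4 * i
    return result, c4


print(add16_ripple(0xFFFF, 0x0001))      # (0, 1)
print(add16_lookahead(0xFFFF, 0x0001))   # (0, 1)
```

0xFFFF + 1 is the worst case for the ripple version: the carry has to pass through all four chips in turn, whereas the look-ahead version computes every slice's carry-in at once.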

Historically, some people who built TTL CPUs put two or more independent ALU blocks in a single CPU -- a general-purpose ALU for data calculations, a PC incrementer, an index register incrementer/decrementer, a base+offset address adder, etc.

Quite a few people building "TTL CPUs" use GAL chips (which can be erased and reprogrammed).[14]
A single GAL20V8 chip can replace a 74181 chip.[15]
Often another GAL chip can replace 2 or 3 other TTL chips.

Other people building "TTL CPUs" find it more magical to build a programmable machine entirely out of discrete, non-programmable chips.
Are there any reasonable alternatives to the '181 for building an ALU out of discrete chips?
The Magic-1 uses 74F381 and 74F382 ALU chips;[16] is any variant of the '381 and '382 chips easier to find than a '181?
(The 74HC283, 74HCT283, and MC14008 chips only add; they don't do AND, NAND, etc.)

Many commercial machines, such as the Data General Nova 4, used four AM2901 ALUs "in parallel" to build each 16-bit ALU.
Alas, these are apparently even harder to find than the 74181.

One could build the entire CPU -- including the ALU -- out of sufficient quantities of the 74153 multiplexer.[17]
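Why a multiplexer is enough: a 4-to-1 multiplexer (one half of a 74153) can produce any function of two inputs if those inputs drive the select lines and the four data inputs are fed the function's truth table. If the data inputs come from a 4-bit function-code register instead of fixed wiring, one multiplexer per bit becomes a logic unit that can perform all 16 two-input functions. The sketch below illustrates that principle with invented names; it is not necessarily how the design in [17] is wired.

```python
def mux4(data, s1: int, s0: int) -> int:
    """One half of a 74153-style 4-to-1 multiplexer: output = data[2*s1 + s0]."""
    return data[(s1 << 1) | s0]


def logic_bit(func: int, a: int, b: int) -> int:
    """Any 2-input function: bit k of func is the output when 2*a + b == k."""
    truth_table = [(func >> k) & 1 for k in range(4)]
    return mux4(truth_table, a, b)


def logic_word(func: int, a: int, b: int, width: int = 8) -> int:
    """One multiplexer per bit, all sharing the same 4-bit function code."""
    return sum(logic_bit(func, (a >> i) & 1, (b >> i) & 1) << i for i in range(width))


# Function codes: outputs for ab = 11, 10, 01, 00, read as a 4-bit binary number.
AND, OR, XOR, NAND = 0b1000, 0b1110, 0b0110, 0b0111

print(logic_word(AND, 0b1100, 0b1010, width=4))   # 8  (0b1000)
print(logic_word(XOR, 0b1100, 0b1010, width=4))   # 6  (0b0110)
```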

Another designer has posted an 8-bit ALU design, built from 14 complex TTL chips (two 74283 4-bit adders, some 4:1 muxes, and some 2:1 muxes), that has more functionality than two 74181 chips -- the 74181 can't shift right.[19]

The designers of the LM3000 CPU have shown that much of the 74181 is actually unnecessary.
The 8-bit "ALU" in the LM3000, built from two 74LS283 4-bit adders and a few other chips, can't actually do any logical operations, only "add" and "subtract".
Apparently those "logical" operations aren't really necessary.[20]

The MC14500B Industrial Control Unit has even less functionality than the LM3000 CPU.
It is arguable that the MC14500B has close to the minimum functionality to even be considered a "CPU".[21][22]
The MC14500B is perhaps the most famous "1-bit" CPU.[23][24][25][26][27]
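To give a feel for how little a 1-bit CPU needs, here is a behavioural sketch of the MC14500B's logical instructions (LD, LDC, AND, ANDC, OR, ORC, XNOR, STO, STOC, IEN, OEN, SKZ). It makes simplifying assumptions: inputs and outputs share one 1-bit address space held in a dict, IEN and OEN start out enabled, IEN and OEN are loaded straight from the data line, and the flag-style instructions (NOPO, NOPF, JMP, RTN) are left out because on the real chip they mostly signal external logic. The MC14500B has no program counter of its own, so the loop below stands in for the external counter and program memory.

```python
def run_icu(program, io):
    """Run a list of (mnemonic, address) pairs against a dict of 1-bit I/O points."""
    io = dict(io)
    rr, ien, oen = 0, 1, 1          # result register; input/output enable latches
    pc = 0                          # stands in for the external program counter
    while pc < len(program):
        op, addr = program[pc]
        pc += 1
        raw = io.get(addr, 0)
        d = raw & ien                # with IEN cleared, every input reads as 0
        if op == "LD":
            rr = d
        elif op == "LDC":
            rr = 1 - d
        elif op == "AND":
            rr &= d
        elif op == "ANDC":
            rr &= 1 - d
        elif op == "OR":
            rr |= d
        elif op == "ORC":
            rr |= 1 - d
        elif op == "XNOR":
            rr = int(rr == d)
        elif op in ("STO", "STOC"):
            if oen:                  # with OEN cleared, stores are suppressed
                io[addr] = rr if op == "STO" else 1 - rr
        elif op == "IEN":
            ien = raw
        elif op == "OEN":
            oen = raw
        elif op == "SKZ":
            if rr == 0:
                pc += 1              # skip the next instruction
        else:
            raise ValueError(f"unsupported instruction {op!r}")
    return io


# One "rung" of control logic: output 10 = (input 1 AND input 2) OR NOT input 3.
rung = [("LD", 1), ("AND", 2), ("ORC", 3), ("STO", 10)]
print(run_icu(rung, {1: 1, 2: 1, 3: 1})[10])   # 1
```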

All of the earliest computers and most of the early massively parallel processing machines used a serial ALU, making them "1-bit CPUs".[28]

Solderless breadboards are perhaps the fastest way to build experimental prototypes that involve lots of changes.

For about a decade, every student taking the 6.004 class at MIT was part of a team -- each team had one semester to design and build a simple 8-bit CPU out of 7400 series integrated circuits.[29]
These CPUs were built out of TTL chips plugged into several solderless breadboards connected with lots of 22 AWG (0.33 mm²) solid copper wires.[30]

Traditionally, minicomputers built from TTL chips were constructed with lots of wire-wrap sockets (with long square pins) plugged into perfboard and lots of wire-wrap wire, assembled with a "wire-wrap pencil" or "wire-wrap gun".

There are many ways to categorize CPUs.
Each "way to categorize" represents a design question, and the categories within it represent possible answers to that question, which must be decided before the CPU implementation can be completed.

One way to categorize CPUs that has a large impact on implementation is: "For how many memory cycles will I hold one instruction before fetching the next instruction?"

more: some instructions have 2 or more memory cycles between load-instruction memory cycles (memory-memory architecture)

Another way to categorize CPUs is "Will my control lines be controlled by flexible microprogramming, by a fixed control store, or by a hard-wired control decoder that directly decodes the instruction?"

The load-store and memory-memory architectures require an "instruction register" (IR).
At the end of every instruction (and after coming out of reset), the next instruction is fetched from memory[PC] and stored into the instruction register, and from then on the information in the instruction register (directly or indirectly) controls everything that goes on in the CPU until the next instruction is stored in the instruction register.
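The fetch step described above can be sketched as a tiny simulator. The machine is hypothetical: an 8-bit accumulator machine with a single shared (Princeton) memory of 32 bytes and an instruction format invented for this example (a 3-bit opcode in the top bits, a 5-bit address in the low bits). The point is the shape of the loop: memory[PC] goes into the IR, and from then on only the IR's fields decide what happens.

```python
# Hypothetical instruction set: opcode in bits 7..5, address in bits 4..0.
LOAD, STORE, ADD, JMP, JZ, HALT = range(6)


def encode(opcode: int, addr: int = 0) -> int:
    return (opcode << 5) | (addr & 0x1F)


def run(memory, max_steps: int = 1000):
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = memory[pc]                     # fetch: IR <- memory[PC]
        pc = (pc + 1) & 0x1F
        opcode, addr = ir >> 5, ir & 0x1F   # decode: IR fields now drive everything
        if opcode == LOAD:
            acc = memory[addr]
        elif opcode == STORE:
            memory[addr] = acc
        elif opcode == ADD:
            acc = (acc + memory[addr]) & 0xFF
        elif opcode == JMP:
            pc = addr
        elif opcode == JZ:
            if acc == 0:
                pc = addr
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"undefined opcode {opcode}")
    raise RuntimeError("step limit reached without HALT")


mem = [0] * 32
mem[0:4] = [encode(LOAD, 20), encode(ADD, 21), encode(STORE, 22), encode(HALT)]
mem[20], mem[21] = 7, 5
print(run(mem)[22])   # 12
```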

flexible microprogramming that supports the possibility of memory-memory architecture.

Another way to categorize CPUs is "How many sub-states are in a complete clock cycle?"

Many textbooks imply that a CPU has only one clock signal: a bunch of D flip-flops each hold 1 bit of the current state of the CPU and drive that state out their "Q" outputs.
Those flip-flops always hold their internal state constant, except at the instant of the rising edge of the one and only clock, when each flip-flop briefly "glances" at its "D" input and latches the new bit, and shortly afterwards (if the new bit differs from the old bit) changes its "Q" output to the new bit.

Single clock signals are nice in theory.
Alas, in practice we can never get the clock signal to every flip-flop precisely simultaneously -- there is always some clock skew (differences in propagation delay).
One way to avoid these timing issues is with a series of different clock signals.[32]
Another way is to use enough power[33] and carefully design a clock distribution network (perhaps in the form of an H tree), with timing analysis, to reduce the clock skew to negligible amounts.
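A small simulation shows why the "every flip-flop glances at the same instant" assumption matters. It models a 4-stage shift register of D flip-flops in two cases: an ideal single clock, and an extreme case of skew in which each stage receives the clock edge late enough to see its neighbour's already-updated output. The function and parameter names are invented for the example.

```python
def clock_edge(q, d_in, skewed=False):
    """Apply one rising clock edge to a shift register; q[0] is the first stage.

    Ideal clock: every stage samples its D input (the previous stage's Q)
    at the same instant, so data advances exactly one stage per edge.
    skewed=True: each stage is clocked after the previous stage's Q has
    already changed (a hold-time violation caused by clock skew).
    """
    if not skewed:
        return [d_in] + q[:-1]          # all D inputs captured simultaneously
    q = list(q)
    for i in range(len(q)):
        q[i] = d_in if i == 0 else q[i - 1]   # sees the neighbour's *new* Q
    return q


print(clock_edge([0, 0, 0, 0], 1))               # [1, 0, 0, 0]
print(clock_edge([0, 0, 0, 0], 1, skewed=True))  # [1, 1, 1, 1]
```

With skew, a single bit races through every stage on one edge, which is the kind of failure that careful clock distribution or multiple clock phases are meant to prevent.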

Relay computers are forced to use at least 2 different clock signals, because of the "contact bounce" problem.

Many chips have a single "clock input" pin, giving the illusion that they use a single clock signal -- but internally a "clock generator" circuit converts that single external clock to the multiple clock signals used by the chip.

Many historically and commercially important CPUs have many sub-states in a complete clock cycle, with two or more "non-overlapping clock signals".
Most MOS ICs of the 1970s used dual clock signals (a two-phase clock).[34]
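A sketch of a two-phase non-overlapping clock, and of why it lets simple transparent latches act like an edge-triggered register. Assumptions for illustration: each clock cycle is divided into four time slots (for example, counted from a faster master oscillator), phi1 is high in slot 0 and phi2 in slot 2, and the empty slots in between are guard time. Latch A is transparent while phi1 is high and feeds latch B, which is transparent while phi2 is high; because the phases never overlap, data can never race through both latches at once.

```python
# (phi1, phi2) for each of the four slots in one clock cycle; slots 1 and 3
# are dead time in which neither phase is high.
SLOT_PATTERN = [(1, 0), (0, 0), (0, 1), (0, 0)]


def two_phase_clock(cycles: int):
    return [SLOT_PATTERN[t % 4] for t in range(4 * cycles)]


def latch_pair(clock, d_values):
    """Latch A (open during phi1) feeds latch B (open during phi2); returns B over time."""
    a = b = 0
    trace = []
    for (phi1, phi2), d in zip(clock, d_values):
        if phi1:
            a = d        # A follows D only while phi1 is high
        if phi2:
            b = a        # B follows A only while phi2 is high
        trace.append(b)
    return trace


clock = two_phase_clock(2)
assert not any(phi1 and phi2 for phi1, phi2 in clock)   # phases never overlap
print(latch_pair(clock, [1, 1, 1, 1, 0, 0, 0, 0]))       # [0, 0, 1, 1, 1, 1, 0, 0]
```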

Building a CPU from individual chips and wires takes a person a long time.
Builders therefore take various shortcuts to reduce the number of things that need to be connected and the amount of wiring they need to do.

Using a 3-state bus rather than a 2-state bus often requires fewer and shorter connections.

Rather than general-purpose registers that can be used (at different times) to drive the data bus (during STORE) or the address bus (during indexed LOAD), sometimes it requires less hardware to have separate address registers and data registers and other special-purpose registers.

If the software guy insists on general-purpose registers that can be used (at different times) to drive the data bus (during STORE) or the address bus (during indexed LOAD), it may require less hardware to emulate them: have all programmer-visible registers drive only one internal microarchitectural bus, and (at different times) load the microarchitectural registers MAR and MDR from that internal bus, and later drive the external address bus from MAR and the external data bus from MDR. This sacrifices a little speed and requires more microcode, but makes the CPU easier to build.

Rather than 32-bit or 64-bit address and data registers, it usually requires less hardware to have 8-bit data registers (occasionally combining 2 of them to get a 16-bit address register).

If the software guy insists on 16-bit or 32-bit or 64-bit data registers and ALU operations, it may require less hardware to emulate them: use multiple narrow micro-architectural registers to store each programmer-visible register, and feed 1 or 4 or 8 or 16 bits at a time through a narrow bus to the ALU to get the partial result each cycle, or to sub-sections of the wide MAR or MDR. This sacrifices a little speed (and adds complexity elsewhere) to make the bus easier to build. (See: 68000, as mentioned above)

Rather than many registers, it usually requires less hardware to have fewer registers.

If the software guy insists on many registers, it may require less hardware to emulate some of them (like some proposed MMIX implementations) or perhaps all of them (like some PDP computers): use reserved locations in RAM to store most or all programmer-visible registers, and load them as needed. This sacrifices speed to make the CPU easier to build. Alas, it seems impossible to eliminate all registers -- even if you put all programmer-visible registers in RAM, it seems that you still need a few micro-architectural registers: IR (instruction register), MAR (memory address register), MDR (memory data register), and ... what else?
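As a rough illustration of the cost, here is a sketch of a CPU whose programmer-visible registers all live in RAM. The layout (R0 to R7 at RAM addresses 0 to 7), the 8-bit width, and the class and method names are assumptions for the example. Besides MAR and MDR, this particular sketch needs one temporary register to hold the first operand while the second is fetched; the memory-cycle counter shows where the speed goes.

```python
class RamRegisterCpu:
    """Programmer-visible registers R0..R7 are stored in RAM[0..7] (hypothetical layout)."""

    def __init__(self, ram_size: int = 256):
        self.ram = [0] * ram_size
        self.mar = 0            # memory address register
        self.mdr = 0            # memory data register
        self.tmp = 0            # holds one ALU operand between memory cycles
        self.memory_cycles = 0

    def _read(self, addr: int) -> None:
        self.mar = addr
        self.mdr = self.ram[self.mar]
        self.memory_cycles += 1

    def _write(self, addr: int, value: int) -> None:
        self.mar, self.mdr = addr, value
        self.ram[self.mar] = self.mdr
        self.memory_cycles += 1

    def add(self, rd: int, rs: int) -> None:
        """Rd <- Rd + Rs: three data-memory cycles that real registers would not need."""
        self._read(rd)
        self.tmp = self.mdr
        self._read(rs)
        self._write(rd, (self.tmp + self.mdr) & 0xFF)


cpu = RamRegisterCpu()
cpu.ram[1], cpu.ram[2] = 10, 20
cpu.add(1, 2)
print(cpu.ram[1], cpu.memory_cycles)   # 30 3
```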

Harvard architecture usually requires less hardware than Princeton architecture. This is one of the few ways to make the CPU simpler to build *and* go faster.

The simplest kinds of CPU control logic use the Harvard architecture, rather than Princeton architecture.
However, Harvard architecture requires 2 separate storage units -- the program memory and the data memory.
Some Harvard architecture machines, such as "Mark's TTL microprocessor", don't even have an instruction register -- in those machines, the address in the program counter is always applied to the program memory, and the data coming out of the program memory directly controls everything that goes on in the CPU until the program counter changes.
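A sketch of the "no instruction register" idea: the program counter addresses program memory, and each word that comes out is used directly as a bundle of control signals. The control-word fields below are a hypothetical layout chosen for the example, not the layout of Mark's TTL microprocessor. Because program memory and data memory are separate, fetching the next control word never competes with a data-RAM access.

```python
from typing import NamedTuple, Optional


class ControlWord(NamedTuple):
    """One program-memory word; each field stands for wires going straight to the hardware."""
    load_acc: bool = False        # clock the accumulator from the ALU output
    alu_add: bool = False         # ALU: acc + RAM[addr] if set, else pass RAM[addr] through
    addr: int = 0                 # data-RAM address lines
    store: bool = False           # write the accumulator into RAM[addr]
    jump: Optional[int] = None    # if set, load this value into the program counter


def run_harvard(program, ram, max_steps: int = 100):
    pc, acc = 0, 0
    for _ in range(max_steps):
        if pc >= len(program):        # sketch only: stop when the program runs out
            break
        cw = program[pc]              # no IR: the program-memory output *is* the control
        if cw.load_acc:
            acc = (acc + ram[cw.addr]) & 0xFF if cw.alu_add else ram[cw.addr]
        if cw.store:
            ram[cw.addr] = acc
        pc = cw.jump if cw.jump is not None else pc + 1
    return ram


program = [
    ControlWord(load_acc=True, addr=0),                # acc <- RAM[0]
    ControlWord(load_acc=True, alu_add=True, addr=1),  # acc <- acc + RAM[1]
    ControlWord(store=True, addr=2),                   # RAM[2] <- acc
]
print(run_harvard(program, [3, 4, 0]))   # [3, 4, 7]
```

There is no decode step at all; adding a new kind of operation means adding a field (more wires and a wider program memory) rather than adding decode logic.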
Alas, Harvard architecture makes storing new programs into the program memory a bit tricky.

"Touch the magic. By this I meant to gain a deeper understanding of how computers work" -- Bill Buzbee [1]

"To evaluate the 6800 architecture while the chip was being designed, Jeff's team built an equivalent circuit using 451 small scale TTL ICs on five 10 by 10 inch (25 by 25 cm) circuit boards. Later they reduced this to 114 ICs on one board by using ROMs and MSI logic devices." -- Wikipedia: Motorola 6800, Development team

"The 74181 is a bit slice arithmetic logic unit (ALU)... The first complete ALU on a single chip ... Many computer CPUs and subsystems were based on the '181, including ... the ... PDP-11 - Most popular minicomputer of all time" -- Wikipedia: 74181

"The basic algorithm executed by the instruction execution unit is most easily expressed if a memory address fits exactly in a word." -- "The Ultimate RISC" by Douglas W. Jones

"it just really sucks if the largest datum you can manipulate is smaller than your address size. This means that the accumulator needs to be the same size as the PC -- 16-bits." -- "Computer Architecture"

Relay computers by Kilian Leonhardt (in German): a "large computer" with around 1500 relays and a program EEPROM, and a "small computer" with 171 relays.

DUO 14 PREMIUM by Jack Eisenmann (around 50 relays, including 4 addressable "crumbs" of RAM where each crumb is 2 bits, plus 48 bits of program ROM in 6x8-switch DIP switches. The only semiconductor components: 555 timer, decade counter, and transistors in the clock generator. Each command has 6 bits, and the 8 commands in the program ROM are selected by a 3-bit program counter).

Wikipedia: Z3 (computer), designed by Konrad Zuse, the world's first working programmable, fully automatic computing machine, built with 2,000 relays.

Rory Mangles. Tim 7: A 4-bit relay CPU with the program stored on punch tape

Rory Mangles. Tim 8: "one of the smallest Turing complete relay computers in the world by relay count" an 8-bit relay CPU with the program stored on punch tape, data stored in discrete capacitors (!) (no RAM chips) with one relay pole per byte; uses 152 relays, most of them single-pole.

James Newman. The Mega-processor. "put LEDs on everything so we can actually SEE the data moving and the logic happening." No integrated circuits; only LEDs, resistors, and about 14,000 (?) discrete transistors (2N7000 in a through-hole TO-92 package?) in the functional part of the CPU.

The Q1 Computer by Joe Wingbermuehle. Built almost entirely out of 3105 individual through-hole PN2222A transistors. "Clock phases are used so that transparent latches can be used for registers to reduce transistor count at the price of speed." 8-bit data bus, 16-bit address bus.

Svarichevsky Mikhail is apparently building a processor entirely out of discrete transistors. Using very careful analog tuning (12 resistors of various values), Svarichevsky Mikhail has developed a 4 transistor full adder: "BARSFA - 4-TRANSISTOR FULL ADDER". (Are the 4 Schottky diodes -- across the base and collector of each transistor -- really necessary, or just to improve performance?) (He also shows a canonical implementation of a CMOS full adder, requiring 28 transistors).

Mark's TTL microprocessor (uses only 8 chips ... "Without using the two PALs I used, it would be 16 chips.") (is there a better URL for this?)

DUO Compact by Jack Eisenmann: The DUO Compact CPU was built out of 22 integrated circuit chips, including 2 EEPROMs for microcode and 1 EEPROM for boot ROM. It has some nice features -- a unified address space (16-bit address bus, 8-bit data bus); programs can run out of the boot ROM or the data RAM; memory-mapped I/O; etc. It also has some odd features -- the instruction pointer is reloaded to a literal "next" value in every instruction, so it's not really a "program counter", because the CPU lacks the hardware to "count" or "increment" a value directly.

Galactic 4-bit CPU by Jon Qualey. Two 2716 EPROMs are used to store the micro-instruction code and two 2114 static RAMs are used for program memory. 25 ICs in all, 74LS TTL.

LM3000 CPU designed and built by five students at Bennington College, Vermont, using fifty-three integrated circuits.

The D16/M by John Doran is a 16-bit digital computer implemented with SSI and MSI HCMOS integrated logic and constructed using wire-wrap techniques. Its timing and control unit is microprogrammed (fully horizontal, with a 72-bit control word).

(FIXME: who?) has built an MC14500 clone out of (TTL) discrete logic.[1] (FIXME: who else?) has built an MC14500 clone on an FPGA.[2]

TANACOM-1 by Rituo Tanaka is a 16-bit TTL minicomputer built with a total of 146 ICs, including 4 SN74181s and a 74182 in the ALU.

BMOW 1 (Big Mess o' Wires) by Steve Chamberlin is an 8-bit CPU built from discrete 7400-series logic and a few 22V10 and 20V8 GALs. All the digital electronics sit on a single large Augat wire-wrap board that interconnects the 50 or so chips with roughly 1250 wires. All data buses are 8 bits wide; the address bus is 24 bits wide. Three parallel microcode ROMs generate the 24-bit microcode word. VGA video output is 512×480 with two colors, or 128×240 with 256 colors. The microcode emulates a 6502 (more or less). Two 4-bit 74LS181s form the core 8-bit ALU.

The MyCPU - Project[10][11]: "everybody is invited to participate and contribute to the project." The CPU is built from 65 integrated circuits on 5 boards, with 1 MByte of bank-switched RAM. Originally developed by Dennis Kuschel. Apparently several MyCPU systems have been built? One MyCPU system runs an HTTP web server; another runs a (text-only) web browser.

Alas, several of the parts used by the original SAP-1 are apparently no longer being manufactured. Pong Guy describes a few replacements that allow you to build a SAP-1 using parts that are still being manufactured.[13][14]

Pong Guy discovered that replacing the SAP-1 hard-wired "control matrix" with a microcode ROM reduces the total number of ICs from 48 to 35.[15]

Pong Guy. "ASAP-3 - Almost Simple As Possible Computer 3". Pong Guy designed an 8-bit computer built from 55 discrete TTL logic chips (including the RAM and program ROM chips). ASAP-3 schematics. Pong Guy also designed a nicely laid-out PCB to hold those 55 chips, the oscillator, a 10-digit LED display, a 2-line LCD display, a 22-button keyboard, and some toggle switches. ASAP-3 emulates much of the 8085 instruction set and runs at over 500 kHz. Inspired by the "SAP-1 Simple As Possible microprocessor".

John Peterson. "Wire-wrap Days". Describes an "intense" hardware lab class building a working computer with serial I/O, a lights-and-switches front panel, and an 8-bit CPU built entirely from 74Cxx chips, some 2716 UV-erasable ROMs, and some SRAM. (The architecture is very close to that of a PDP-8, using the URBUS bus, similar to the PDP-11 Unibus.)

The Tandem/16 (NonStop I) was initially implemented with standard low-density TTL chips. It has high-availability features that are still relevant today.[3]

(What ALU did this Tandem/16 use? the 74181?)

The TS-1, the first commercial implementation of the HP PA-RISC architecture, was built from discrete 74F TTL chips. The 32-bit CPU was spread over 6 boards with about 150 chips on each 8.4" x 11.3" board. Those six TS-1 boards implement the processor, a 4096-entry TLB, 64 KB (L1) instruction cache, and 64 KB (L1) data cache.[4]