Wednesday, May 31, 2017

My guiding principles, both separately and in conjunction, present some interesting challenges. Here are the optimization puzzles I'm currently working on.

Minimizing Clock Per Instruction

Modifying a value in RAM involves moving the data from RAM to the Data Register, in/decrementing the Data Register, and moving the data from the Data Register back to RAM. Since Brainfuck programs usually have several +/- in a row, the data only needs to be transferred at the beginning and end of a run.

To state more clearly, you only need to write to RAM when the Data Register has been incremented/decremented and the memory location is about to change. Similarly, you only need to read from RAM if the memory location has changed since the previous read. The story gets a bit more complicated with the I/O instructions, but that's for another day.

As an alternative to the conditional read/write, I've been toying with the idea of putting the read/writes and inc/decs out of phase. So the RAM would be read/written on rising clock, and inc/dec would happen on falling clock. Coincidentally (or more likely by foresight of the 7400 series designers), the control signals to the RAM and counters already follow this logic. I haven't followed this line of thinking very far though so I'm not sure where it will lead.

Zeroing RAM

When a Brainfuck program begins, all memory locations have to start at 0. One option would be to keep a "Clear Pointer" of the highest accessed address. Whenever the Data Pointer goes up, if it is higher than the Clear Pointer, write a 0 to that location. With some clever micro-instruction logic, this would only take 2 extra cycles the first time you access a RAM location.

A simpler (and somewhat less fun) option would be to have the reset circuitry walk through RAM with a fast clock and set it all to 0 before running the program. I feel like this better fits the "Minimal number of execution steps per instruction" principle, and more directly reflects the execution of an interpreted/complied Brainfuck program on a regular computer.

Fixing a Major Brainfuck

If you've been paying really close attention so far and, and if you have an intimate working knowledge of Brainfuck, you may have noticed some hand waiving in my early posts about constant time looping. My logisim implementation had a 'uge mistake in its implementation of the Brainfuck language, and I didn't notice until long after I'd started on the breadboard project.

The loop construct, [ ... ], is represented in C as while (*data_pointer) { ... }. This means that as the loop is entered, the current RAM value is read. If it's 0, the contents of the loop are skipped. Otherwise, the contents of the loop run. At the end of the loop, if the contents of RAM at the current location is NOT zero, then it jumps back to the beginning of the loop.

Unfortunately, the way I built it in logisim would be represented by do { ... } while (*data_pointer);. This means the contents of the loop always run at least once. An open question is whether this change breaks the language's Turing Completeness. Another open question is what do we call this new language? But those questions are for another time.

Does this problem mean I have to sacrifice [ to the O(n) gods? Not necessarily... If I'm planning to do the "zero RAM on reset" strategy, I could throw in a parallel "index all the loop locations". I would have a bit of circuitry walk through program memory looking for all the [ and ]. In another RAM, at each address where there is a [ in the program, write the address of the corresponding ]. This reset circuitry could use the existing Stack RAM to balance the brackets.

So that's where I am as of this moment. The replacement breadboards should be here today. Yay!

Edit

On second thought, the index could be used for both [ and ], and the stack would only be necessary during analysis. That would make the control circuitry more consistent.

Tuesday, May 30, 2017

To breathe life into my CPU, it needs a heartbeat. I put together a simple 555 astable circuit with a 10k potentiometer to adjust the clock speed. I'm still waiting on some switches to put in a manual clock mode.

Since the counter control signals need to be synchronized with the clock, I added a 74LS00 (quad 2-input NAND) to combine the signals. This provides the high signal by default to disable count/load, and only activates (goes low) when both the clock and control signals go high. This "uninverts" the logic of the control signals and confuses me a couple times in the following video:

You'll notice a couple extra jumpers connecting the ground lines on the data register. I'm starting to have some really frustrating issues with these crappy breadboards. Replacements should be here tomorrow.

The LED boards are made from some breadboard-style PCB that came with one of the Elegoo kits. I trimmed the plastic lip with wire nippers so they'd fit snugly. I alternated their orientations so cathode is next to cathode, anode next to anode. Then I put resistors on the underside of the board between the cathodes and the power rail of the PCB. I chopped off the cathode leads, and added a ground connection to the new common cathode. Having 2 anodes side by side with 2 blank spaces between fits perfectly in the outputs of the 193s, and only needs a small bit of bending to fit into the data bus.

I've got a 16-bit one for the data pointer but still need to dremel it down to size.

After making a pile of insulation confetti, here's a thing that actually does some stuff:

I went ahead and grabbed one of the other 193s to finish up the Data Pointer at the top-left. The output of the counters are connected in purple to the address lines of the Data RAM.

The buttons on the left are connected to Load (orange), Clear (white), Down (red/blue) and Up (green/yellow). This allows me to easily cycle through memory addresses. Load, Down, & Up are active low, so they've got pull-up resistors to +5V. Clear is pulled low.

The I/O pins of the RAM are connected to the data bus. The three jumper wires coming out of the ram are Load (orange), Output Enable (blue), and Chip Enable (yellow). These are all active low, so currently it's sending whatever value is being pointed to in RAM onto the bus. The LEDs show this as 11001000 (0xC8).

The Data Register has grown quite a bit more complicated. The chips with wires going to the bus are 74LS244 octal tri-state buffers. These separately connect/disconnect the inputs (blue) and outputs (yellow) of the counters to/from the data bus. If I were using CMOS chips, I would leave the inputs connected and only switch the outputs. But with TTL, inputs draw non-trivial current and I don't want to load the bus any more than necessary; breadboards already have a hard enough time delivering power.

The 193s have the same jumper wire colors for clear, load, down, and up as the data pointer but are connected directly to the power rails rather than through buttons. At this point I'm more concerned with moving data between RAM and the register. I can do that in this shot by moving the blue jumper to ground on the data register (input from bus), then momentarily bringing the orange jumper to ground (load).

Once the value is loaded into the register it can be modified with the up/down jumpers (albeit with tons of bounce). Then I can load it back into RAM by turning off the output from RAM (set its blue jumper high), enabling output from the data register (set its yellow pin low), and momentarily set the RAM's load pin low.

The most basic piece of a breadboard computer is, of course, the breadboards. I found some Elegoo brand 3-packs on Amazon with primarily positive reviews for less than $10 and snagged a couple, along with some other Elegoo kits which each came with one breadboard.

I was not impressed at all with their quality. The clips were misaligned so that inserting leads was tedious-to-impossible. Their own brand power modules, which have a DC jack and plug into the power rails on a breadboard, would not fit without some extreme jiggling & pin bending.

I left a one-star review and they sent me replacements hoping I'd reconsider. At first they seemed much better than the original set. But while they didn't have the same frequency of issues, they still had problems. For one thing, the DIP switches I got will not stay in the holes at all. Tactile switches are similarly hard to keep in. By comparison, a breadboard that came with an electronics kit years ago holds onto these with enough force that they were more difficult to remove than insert.

I ordered some better ones which should be here this week, but these ones work just well enough that I couldn't resist starting. Here is an initial layout:

All but two of the breadboards had their "top" power rail sliced off (the slicing is just through the double-sided tape; they're otherwise joined by plastic tabs). The boards and power rails were joined together in two chunks each. I cut some pieces of hookup wire to connect the power & data buses together.

The clock is a 555 timer. The second 555 is for de-bouncing the manual clock button. I'm planning to use a different design that only needs a single 555, but I need a SPDT switch, not just an OFF-(ON).

Data Pointer and Data Register/ALU are made of Binary up/down counters (74LS193). This naturally represents the fundamental Brainfuck operations of +, -, >, and <. The Data Pointer pictured here has only 3 of the 4 chips needed to address 32 K of RAM. A handful more of these chips should be arriving this week.

A + instruction will cause the Data Register to increment. A - will cause it to decrement. A > will cause the Data Pointer to increment. A < will cause it to decrement. Values will be transferred to/from RAM whenever necessary.

Stack Pointer is also made of 193s to handle Brainfuck's [ and ] instructions. The location to jump to will be stored when entering the loop. An alternative design would be to keep a depth counter and decrement PC until depth goes down, but looping should be constant time for remotely reasonable performance.

The Program Pointer also uses 193s; It never needs to count down, but the other binary counters in the kit are ripple carry and can't be loaded directly with data. It needs to be able to load a program counter location from the Stack Pointer.

Data and Stack RAM are made of 32 KB ram chips (CY7C199-35PC). Stack needs 2 of them to handle the address width of the Data RAM. The two chips will have address lines in common, so it acts as a single 32 K x 16-bit memory.

The microcode stepper is a J/K flip flop to toggle between micro-instruction steps.

The Program & Microcode ROMs are 32 KB EEPROMs (X28C256P-15). This is massive overkill for the Microcode. I shouldn't need more than 8 bits per instruction and Brainfuck only has 8 instructions. I may end up doing the microcode with discrete logic gates just for the challenge.

A couple months ago I stumbled across Ben Eater's series building a breadboard computer out of TTL chips. I've had the idea in the back of my mind to do something like this for a while now, but Ben's series has kicked a lot of people over the edge and it's turning into a bit of a trend.

I've always had a soft spot for Brainfuck (and more generally, esolangs and obfuscated code). Way back in ought 6, I wrote an obfuscated Brainfuck interpreter to use as my e-mail signature (the yin-yang light bulb was my personal logo before Gnomes took over my identity):

One-to-one correspondence between Brainfuck commands and CPU instructions - There is merely an encoding difference

Constant time execution of each instruction - It should never take more than 3 cycles to execute any instruction

Minimal number of execution steps - If an instruction can be executed in one cycle, it should be

The RAM needs are behind most of my departures from Ben's design. He's using a pair 74189s (or equivalent) which are 16 x 4 bit each, for a total of 16 bytes of RAM. His computer uses a Von Neumann architecture, so his RAM is used for both code and data, and all data transfers happen over a single bus. This makes his computer much closer in design to the CPUs inside desktop computers, smartphones, etc.

Brainfuck programs are extremely inefficient in code size, and fairly inefficient in memory usage. Only the most trivial programs would fit in 16 bytes (nothing close to printing the Fibonacci sequence). Since I'm targeting minimal clock cycles, a Harvard architecture makes more sense, where data and instructions are transferred on separate buses. Since Brainfuck programs are so inefficient in code size, it would be extremely tedious to have to enter the program every time the system powers on. I decided to use EEPROMs for the program memory.

I spent some time playing with logisim-evolution, a digital circuit designer/simulator to get a rough idea of the implementation. By using RAM with asynchronous outputs (output changes immediately when the address changes) and separate inputs, I was able to make a design that executed one instruction per cycle.

Then I started looking at hardware. Jameco sells a kit with 10 or 20 each of 35 different 74LS chips, which I believe is the same kit Ben used. Except for the RAM and ROM, this kit has pretty much everything I needed. The most important parts are the binary up/down counters (74LS193) since half of all Brainfuck operations involve either incrementing or decrementing a number (either the location in memory or the data in memory). They also had a suitable EEPROM. So that side of things is fine, but I still needed RAM.

Even if I used every flip flop & latch in the kit, I would still only have probably a hundred or so bytes of RAM, so I went looking for SRAM chips with 15+ address lines. Unfortunately, the only asynchronous SRAM chips available with separate input & output lines are fully dual port, meaning both sets of I/O lines can act as input or output. This makes them much more complicated in design and the cost for a 32KB chip is north of $40. On top of the cost, they're all surface mount devices (BGA, QFP, etc), so I'd have to solder them onto a break-out board. I've never done surface mount soldering, and there's no way I'm going to practice on a $40 chip.

So I decided to scale back my ambitions and go for a 2-cycle design using single port RAM. I was able to get ten of CY7C199-35PC for half the price of a single dual-port.

In the next post, I'll show the initial breadboard layout and give an overview of the design of the computer.