We have finally come to the big one, the new I/O card. This is the one I designed completely from scratch, without any influence from the half-assed old I/O card, so I’m going to go into detail in this article.

So far, we have the backplane, CPU, memory, and video card. I can program FAP to display some messages on the screen but there is no way to do anything else because no input methods are available, and as a result FAP can’t communicate with the outside world. It needs some way of input/output and that’s why I’m designing the I/O card.

A quick recap of the previous article about FAP I/O: the Z80 uses the in/out instructions for port read/write. 256 ports are available, and the port address is the lower 8 bits of the address bus when in/out is executed. A port is read when both the IORQ and RD signals go low, and written when both IORQ and WR go low. I’m also using interrupts instead of polling. The Z80 has 3 interrupt modes: mode 0 is the 8080-compatible mode that no one uses, mode 1 just jumps to 0x38, and mode 2 makes your head explode the first time you learn about it but is the most powerful, and is what my I/O board is going to use.

The last time I worked on the I/O board I used an Arduino just for reading the PS/2 keyboard, an STM32F103 as the main I/O interrupt controller, and a bunch of 74 series chips for the interface between the STM32 and the Z80 bus. That slapped-together board had 2 I/O ports and 2 interrupt vectors, but even with just that it accumulated nine 74 series chips. Just look at this mess:

Notice the piggybacking everywhere and the general messiness. If I were to keep designing the new I/O board this way it would end up horrendous to route, with no flexibility at all, high power consumption, possible noise problems, and a single bug would mean making an entire new board.

So instead I’m going to do it properly and go the modern route: using a CPLD. CPLD stands for Complex Programmable Logic Device. Like an FPGA it has logic that can be configured via HDL, but unlike an FPGA it has built-in non-volatile configuration memory, so it works right away when powered up. CPLDs are also cheaper and less complex than FPGAs, often with only hundreds of gate-equivalents instead of the millions in an FPGA, and as a result they are often used as glue logic instead of as an active device. Personally I feel that FPGAs are mainly for high-speed, high-bandwidth active applications, and a CPLD is basically a replacement for 74 series chips.

And replace 74 series chips it will. The CPLD I picked is the Altera EPM570. It’s a slightly older part, but it has plenty of pins and is cheap. There is an even cheaper version, the EPM240, which is exactly the same but with fewer logic cells. Because the EPM570 has 144 pins, the plan is to just connect everything on the bus to the CPLD and figure it out in HDL. As for the I/O controller, I’m using an STM32F051C8T6 this time.

I went all out on the STM32 side, adding as many peripherals as possible. In the end I have an SD card, I2C EEPROM, PS/2 keyboard, RTC, ESP8266, and a general purpose UART header all hooked up to the STM32. It might be a bit overkill, but I guess it’s better to have them just in case than not to have them at all.

The STM32 will talk to the CPLD via a mini-bus with 4 bits of address, 8 bits of data, and a handful of control signals, while the CPLD contains all the glue logic to interact with the Z80 bus. Below is the diagram of the I/O board structure.

To program the CPLD I bought a cheap Chinese knockoff programmer which seems to work well enough. Also needed is Altera Quartus Prime Lite, which is free. I’m going to use schematic capture for this instead of connecting everything in Verilog, because I feel that being able to see and edit a schematic is more intuitive in this case. But first I need to write a couple of components that I’m going to use inside the CPLD, chief among which is the 74HC573 8-bit transparent latch, a simple matter of 30 lines of code:
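For readers who have not met a transparent latch before, its behaviour can be sketched in a few lines of Python. This is only a behavioural model, not HDL and not the component used in the CPLD; the class and method names are invented for illustration, and it assumes the usual 74HC573 conventions of an active-high latch enable (LE) and an active-low output enable (OE').

```python
class TransparentLatch:
    """Behavioural model of a 74HC573-style 8-bit transparent latch."""

    def __init__(self):
        self.q = 0  # byte currently held inside the latch

    def update(self, d, le):
        """While LE is high the latch follows D; when LE is low it holds."""
        if le:
            self.q = d & 0xFF
        return self.q

    def output(self, oe_n):
        """OE' low drives the held byte; OE' high is high-Z (modelled as None)."""
        return self.q if oe_n == 0 else None


latch = TransparentLatch()
latch.update(0x5A, le=1)   # transparent: the latch follows the input
latch.update(0xFF, le=0)   # LE low: the new input is ignored, 0x5A is held
print(hex(latch.output(oe_n=0)))
print(latch.output(oe_n=1))
```

The two properties that matter for the rest of the board are visible here: the held value survives after the input changes, and the outputs can be disconnected from the bus without losing it.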

Another one we’re going to use is a 4-to-16 line decoder:
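As a rough Python model of what such a decoder does (again a sketch, not the HDL component): it assumes active-low outputs and a single active-low enable, which is how 74-series decoders are commonly arranged but is an assumption about this design. The function name is made up.

```python
def decode_4to16(addr, enable_n):
    """Return 16 active-low select lines; exactly one goes low when enabled."""
    if enable_n:                       # disabled: every output stays high
        return [1] * 16
    selected = addr & 0xF              # only the low 4 bits are decoded
    return [0 if i == selected else 1 for i in range(16)]


print(decode_4to16(0x3, enable_n=0))   # line 3 is the only one pulled low
print(decode_4to16(0x3, enable_n=1))   # nothing selected while disabled
```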

And with that, we can start laying out our glue logic inside the CPLD. Instead of laying down physical 74 chips, we can just do it in software and watch the magic happen. This I/O board needs to handle interrupts, port writes, and port reads. So let’s start with the first one:

FAP I/O board Z80 interrupt logic

While it might look complicated at first, this is actually not that bad. The centrepiece is just an 8-bit latch. When the STM32 controller wants to start a Z80 interrupt, it first puts the interrupt vector on the STM32-CPLD data bus, then activates the INTVECT_LOAD signal, which loads the vector into the latch. Then the STM32 pulls down the INT line on the Z80 to start the interrupt. The Z80 will acknowledge the interrupt once the current instruction finishes, with a cycle where both IORQ and M1 go low. They are OR’ed together to give INTACK, which activates the output enable of the latch, putting the stored interrupt vector onto the CPU bus. The Z80 combines it with the I register to form the address of a vector table entry, and jumps to the handler address stored there. INTACK also generates an interrupt on the STM32, upon which it deactivates the INT line. Here is the STM32’s code snippet:
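To make the gate logic concrete, here is a small Python model of the acknowledge path. It is a simulation sketch with invented function names, not the STM32 firmware; signals are treated as active-low (0 means asserted), which is why a plain OR gate works as the "both are low" detector.

```python
def intack_n(iorq_n, m1_n):
    """Active-low INTACK: low only when IORQ' and M1' are both low."""
    return iorq_n | m1_n


def vector_on_bus(latched_vector, iorq_n, m1_n):
    """The vector latch drives the data bus only while INTACK' is low."""
    if intack_n(iorq_n, m1_n) == 0:
        return latched_vector & 0xFF
    return None  # latch outputs are high-Z, something else owns the bus


vector = 0x10  # hypothetical vector previously loaded via INTVECT_LOAD
for iorq_n, m1_n in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(iorq_n, m1_n, vector_on_bus(vector, iorq_n, m1_n))
```

Only the last combination, with both lines low, puts the vector on the bus. An ordinary opcode fetch (M1 low, IORQ high) or an ordinary port access (IORQ low, M1 high) leaves the latch disconnected.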

With that out of the way, next up is port write. It uses 2 transparent latches and is actually very similar to the interrupt logic above, only the data is loaded from the Z80 side. When the Z80 wants to write to a port, both IORQ and WR go low, and they are OR’ed to give the IOWR signal. When IOWR is low, there is a valid port address and data on the CPU bus, so I made IOWR simply load the address and data into the corresponding latches. IOWR also fires an interrupt on the STM32, which will then activate the latch1 signal, read the data and address off the latches, and process them. Below is the diagram and code snippet.

FAP I/O board port write logic
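The capture behaviour can be sketched in Python as follows. This models the idea only, with made-up names; the `pending` flag stands in for the interrupt raised on the STM32, and `service` stands in for the STM32 reading the two latches.

```python
class PortWriteCapture:
    """Two latches that grab port address and data while IOWR' is low."""

    def __init__(self):
        self.addr = 0
        self.data = 0
        self.pending = False  # stand-in for the interrupt to the STM32

    def bus_cycle(self, iorq_n, wr_n, addr_bus, data_bus):
        iowr_n = iorq_n | wr_n           # low only when both are low
        if iowr_n == 0:                  # latches are transparent now
            self.addr = addr_bus & 0xFF  # port number is the low address byte
            self.data = data_bus & 0xFF
            self.pending = True

    def service(self):
        """What the controller does later, in its own time."""
        self.pending = False
        return self.addr, self.data


cap = PortWriteCapture()
cap.bus_cycle(0, 0, addr_bus=0x0001, data_bus=0x41)  # Z80 writes 0x41 to port 1
cap.bus_cycle(1, 1, addr_bus=0x1234, data_bus=0x99)  # bus has moved on
print(cap.pending, cap.service())
```

The second bus cycle does not disturb the captured values, which is the whole point: the controller can be late without losing the write.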

Next up is port read. This is the most complex of the three because of the timing constraints. Let’s look back at port write first: when the CPU does a port write, the port address and data are loaded into two transparent latches, while the STM32 is notified at the same time. So even if the STM32 is bogged down for some reason, the port address and data will still be available in the latches and nothing is lost.

There is no such luxury in the case of port read. When the Z80 wants to read a port, both IORQ and RD go low, OR’ed together to give IORD. Port data must be valid the moment IORD goes active, otherwise the CPU gets garbage data. That means I can’t make the STM32 respond to IORD as an interrupt, since it would already be too late. So as you have probably guessed, transparent latches to the rescue again.

FAP I/O board port read logic

This is much more complicated than the first two so bear with me: There are 16 transparent latches, corresponding to 16 available Z80 ports. When Z80 wants to read one of these ports, a valid port address will be present on the CPU bus, then the IORD signal goes active. When that happens a 4-to-16 decoder is activated and selects the corresponding latch out of the 16 based on the address on the CPU bus, this select signal along with IORD itself enables the output of that latch, putting its content onto the data bus instantly.

Of course we need to be able to write data into those latches first for the CPU to read. When the STM32 wants to load data into a latch, it puts the address and data onto the STM32-CPLD bus, then activates the LATCH16 signal. This enables another 4-to-16 decoder inside the CPLD that selects the corresponding latch. The select signal together with the LATCH16 signal loads the data into said latch, which will then be available for the CPU to read. The STM32 code is actually pretty simple:
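Both halves of the port read path, the STM32 preloading a latch and the Z80 reading it back with no software in the loop, can be modelled in Python like this. It is an illustrative sketch rather than the firmware: the names are invented, and it assumes the 16 ports are selected by the low 4 bits of the port address.

```python
class PortReadBank:
    """Sixteen preloaded latches that answer Z80 port reads immediately."""

    def __init__(self):
        self.latches = [0] * 16

    def stm32_load(self, port, value):
        """STM32 side: address and data on the mini-bus, then pulse LATCH16."""
        self.latches[port & 0xF] = value & 0xFF

    def z80_read(self, iorq_n, rd_n, addr_bus):
        """Z80 side: the selected latch drives the bus while IORD' is low."""
        iord_n = iorq_n | rd_n
        if iord_n:
            return None                      # no latch output is enabled
        return self.latches[addr_bus & 0xF]  # decoder picks one of 16


bank = PortReadBank()
bank.stm32_load(port=2, value=0x41)          # controller stores a byte ahead of time
print(bank.z80_read(0, 0, addr_bus=0x02))    # the read is answered with no delay
print(bank.z80_read(1, 0, addr_bus=0x02))    # not an I/O read, bus left alone
```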

With all those out of the way, here is the finished board:

Again, I made some mistakes in the rush to get the board made, the footprint of PS/2 connector is flipped, the JTAG connector pinout is all wrong, and somehow I missed the TDO line. Those are all fixed in the repo. And with just one jumper wire, the board functions perfectly.

With the backplane finished, the next step is to design a number of modules that plug into it. They will be the same as in the old FAP, consisting of a CPU board, memory board, video card, and I/O board. In this post I’ll get the first three out of the way, since they have not changed much.

And because they are still largely the same, albeit laid out nicely on a PCB, I’m not going into details about how they work in this article, check out the older entries for those details.

One thing that did change was the bus voltage. Originally the bus ran at 5V since both the Z80 and memory operate at that voltage. However, the FPGA video card only tolerates 3.3V, and so does the CPLD that I’m planning to use for the I/O board. So in the spirit of doing things properly, the new FAP’s bus will run at 3.3V. How are we going to achieve that? Take a look at the Z80’s data sheet:

Looks like the inputs are at TTL levels, meaning they only need 2.2V to register as high, so 3.3V inputs should work. However, for some reason the clock input still needs at least 4.4V, so that will need level translation. To shift a signal level up, power the 74 chip at 5V and give it the input at 3.3V, and a buffered 5V signal will come out the other end. Similarly, to shift down, power the chip at 3.3V and give it 5V inputs, and a 3.3V signal will appear on the output.
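The reasoning above boils down to comparing the driving voltage against each input's minimum "high" threshold. A tiny Python check, using only the two figures quoted above (the function name is invented, and real designs would also want some noise margin on top):

```python
V_IH_LOGIC = 2.2   # volts, minimum high level on ordinary inputs (quoted above)
V_IH_CLOCK = 4.4   # volts, minimum high level on the clock input (quoted above)


def needs_level_shift(drive_high_v, v_ih_min):
    """True if the driver's high level is below what the input requires."""
    return drive_high_v < v_ih_min


print(needs_level_shift(3.3, V_IH_LOGIC))  # 3.3V bus into ordinary inputs
print(needs_level_shift(3.3, V_IH_CLOCK))  # 3.3V bus into the clock input
```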

As before I used five 74HC245 bidirectional bus transceivers as both level translators and buffers. Every single signal is broken out on the bus, buffered, and all at 3.3V. The jumper wire you see in the picture below corrects a small error in the first revision of the PCB; the latest board file in the project GitHub already has it fixed.

CPU control inputs such as CLK, INT, NMI, WAIT, BUSREQ, and RESET need to be pulled up, as they need to be at a defined state at all times. I also added LEDs for power and BUSACK. Decoupling capacitors are used liberally, since I was in so much trouble with noise last time.

Next up is the memory board. I’m actually planning to design a memory board with modern flash memory for both ROM and RAM, and a CPLD as the memory controller, but for now I’m sticking to the old plan: 32KB of ROM from 0x0000 to 0x7fff, and 16KB of RAM from 0xC000 to 0xffff. The missing 16KB at 0x8000 is used for VRAM.
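The memory map described above is simple enough to write down as a decode function. This Python sketch only restates the address ranges from the text; the function name and region labels are made up.

```python
def decode_region(addr):
    """Map a 16-bit Z80 address to the device that responds to it."""
    addr &= 0xFFFF
    if addr < 0x8000:
        return "ROM"    # 0x0000-0x7FFF, 32KB
    if addr < 0xC000:
        return "VRAM"   # 0x8000-0xBFFF, the 16KB window for the video card
    return "RAM"        # 0xC000-0xFFFF, 16KB


for a in (0x0000, 0x7FFF, 0x8000, 0xBFFF, 0xC000, 0xFFFF):
    print(hex(a), decode_region(a))
```

In hardware terms this means the top two address bits are enough to pick a device: A15 low selects ROM, A15 high with A14 low selects VRAM, and both high selects RAM.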

In the old FAP I piggybacked two memory chips so it needed less wiring; this time I again did it properly and put them side by side. Those two chips run at 5V but accept 3.3V inputs as well, so only one output buffer-shifter is needed. I also changed the write enable signal of the EEPROM by ORing it with BUSACK, so that an EEPROM write is only possible when the CPU is off the bus. That means misbehaving programs will not be able to write into ROM during normal execution. Another couple of LEDs round out the board.

A funny thing happened when I tested the newly assembled memory board: writing to addresses beyond 0x2000 would change the data starting at 0x0000. I thought there was a short in the address lines, but in the end it turned out my eBay-bought AT28C256 32KB EEPROM chip is a fake, and it’s actually an AT28C64 8KB part. They actually spent the effort sanding off the original marking and laser engraving a new one. I ordered some more from Mouser, and I guess I’ll have to make do with 8KB for now, which is still plenty.
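The symptom makes sense once you count address lines: an 8KB part only decodes 13 of them, so anything above 0x1FFF wraps around. A minimal Python sketch of that aliasing, assuming the extra address pins are simply ignored by the smaller chip (the function name is made up):

```python
def physical_cell(addr, address_lines=13):
    """Cell actually hit inside a chip that decodes only `address_lines` bits."""
    return addr & ((1 << address_lines) - 1)


# 8KB part (13 address lines): 0x2000 and up wrap back to the start
print(hex(physical_cell(0x2000)), hex(physical_cell(0x2005)))

# genuine 32KB part (15 address lines): the same addresses stay distinct
print(hex(physical_cell(0x2000, 15)), hex(physical_cell(0x2005, 15)))
```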

The video card is pretty straightforward as well. I’m still using the Mojo V3 FPGA board piggybacking on the circuit board, with two SRAM chips for double-buffered video memory. A 6-bit R-2R DAC produces 64 colors, and there is a nice VGA connector instead of just wires.

Being the board with the highest speed signals, 25MHz pixel clock, the improvement in video quality is night and day compared to the old hand-wired board. The whole display is much sharper than before, and ghosting and color desync is gone.

There are still some issues though. In the rush to get the board made I didn’t realise the VGA connector actually blocked the USB port on the Mojo, so I had to either take the Mojo off to program it or remove the VGA connector. I did the latter. This has been fixed in the project repo. The other thing was that I ran out of pins on the Mojo, so I had to omit some signals that I would otherwise have liked to include for more functionality. I saved 2 pins on the VRAM address lines, halving VRAM from 32KB to 16KB, and I also had to give up the maskable interrupt interface and use NMI instead. When I have time I’m going to design FAP’s video card with an on-board FPGA instead of using the Mojo, but for now it’s still pretty capable.

With the boring stuff finally out of the way, the I/O board is next, and it’s going to be a long one.

Can’t believe it has been more than 9 months since my last update. Work picked up, and although I was still working on FAP on and off, I just didn’t bother to update the blog until I had made some progress.

Last time I was here I had just finished the I/O board. The mode 2 interrupt and port read were working, the keyboard was working, and so was the serial input. So I started coding some simple print routines in assembly, stuff like putc and puts. It quickly turned out that my programs would only work intermittently, sometimes going into NMI for no reason, sometimes resetting by themselves, sometimes jumping to the wrong address. And the faster the clock speed, the more often those happened. It was getting extremely frustrating, since when it went haywire I wouldn’t know if it was a bug in my program or just FAP being unhappy.

As you have probably guessed, it was the noise problem from having no ground planes, hundreds of wires bunched together cross-talking like crazy, and maybe even one or two cold solder joints. I thought I could get away with it but I should have known better. Just look at this spaghetti shitshow running at 8MHz, it’s amazing it worked at all.

Clearly, in order for FAP to work, I had to ditch hand-assembly and go full PCB. This way, it will look much more professional, and I don’t have to cut, strip and solder miles of red wires. That’s exactly what I did. And now, after 9 months of hiatus, FAP rises again, stronger than ever.

First things first, I need to design a brand new backplane. The original one is basically a stripboard with 4 dozen wires soldered from an STM32 dev board. Since I’m doing it again, I’m going to do the whole thing properly. The uC used on the old FAP was an STM32F103VCT6, and there are 3 problems with it: it’s a 100-pin part, which is too much for this application; it’s a relatively old member of the STM32 family, missing a lot of nice features; and there is a 32KB code size limit in uVision5. So back to the parts bin it goes, and instead I’ll be using the STM32F072R8T6. It’s an F0 so there is no code size limit in uVision, it has a lot of new features that the F1 doesn’t (32-bit timer, built-in USB pull-up, swappable RX/TX, just to name a few), it’s a 64-pin part so it’s easier to solder, and in the end it’s cheaper too.

Now comes the problem with the size of the backplane. The board size limit of the free version of EAGLE is only 8cm x 10cm, so that won’t do. Luckily EAGLE offers an educational version, also free if you have a .edu email, that has a 10cm x 16cm limit. That’s what I ordered, and as a result 10cm x 16cm will be the dimensions of the FAP backplane.

The next part to reconsider is the bus connector. In the old FAP it was a double row female pin header with 2 rows of 40 pins, so in theory it should have had 80 signals. But because it was on a stripboard the two rows were connected, so only 40 signals were actually available. And because I had to cram most of the CPU signals on there, a few control signals had to be omitted and only one pin was used for GND, resulting in non-existent noise immunity. This time I decided to still use a double row female pin header, now 38 pins wide because of the board size. But because it’s on a PCB I have all 76 signals, so every single Z80 signal is on the bus this time. Here is the pinout of FAP’s new bus connector.

As you can see most of the first row is GND to reduce noise, and the clock signal is surrounded by 3V3 for that reason too. The signals are arranged in the order of control outputs, control inputs, data, and address.

With those problems out of the way, here is the finished design of FAP’s new backplane:

5 bus connectors spaced 2cm apart, the microcontroller on the right side of the board, a USB connector that provides both power and communication, and a 3.3V regulator and 6 buttons round out the design. I’m also putting the LCD above the uC instead of letting it dangle off the side of the board.

Here is the assembled board, much neater than the old one, and hopefully a lot less noisy too.

The backplane firmware needed an update too. On the old backplane communication was via a serial port; this time we’re using USB, which is much faster. But overall not much has changed. You can find the up-to-date resources in the GitHub repo of this project.

Now that FAP’s video card is finished, it’s time to move on. Although we’ve come so far, FAP is still missing something crucial in order to be called a proper computer, it still needs at least one input device. That will be the keyboard. More specifically, a PS/2 keyboard.

Using PS/2 keyboard for retro computers is nothing new thanks to its simple protocol and wide availability. It uses a simple synchronous serial interface, one clock line and one data line, and the documentation is all over the internet so I’m not going to bother to explain here again. Actually I didn’t bother to read much about it at all because of a reason you’ll see soon enough.

Before we start working on the keyboard though, we first need to figure out some way for the keyboard to talk to the CPU. Generally there are two means of communicating with peripherals: memory-mapped I/O or port I/O. In memory-mapped I/O, peripherals’ registers are mapped into the memory address space, and the CPU writes or reads those memory addresses to control the peripheral. I used this method for FAP’s video card. For the keyboard I decided to use port I/O, since it’s usually used with external peripherals, and I feel it’s a good exercise to try a little bit of everything with the Z80, which is why I’m doing this in the first place. The Z80 supports 256 ports. When a program executes an in or out instruction, the CPU places the 8-bit port number on the lower 8 bits of the address bus, and activates IORQ and one of the RD or WR lines to read or write a byte from that port.

With the I/O method sorted, the next issue is how to let the CPU know about keyboard events. One way is polling, in which the CPU constantly asks the keyboard if it has something. This approach is simple but extremely wasteful, since the CPU will spend most of its time reading from the keyboard instead of doing actual work. A much better way is using interrupts. As the name suggests, when there is actually data from the keyboard, the CPU is interrupted from its work; it can then store the keystroke in a buffer and do something with it later. This way the CPU does not waste any time polling the keyboard and only acts when there is an actual key press, and that is what I’m going to use today.

The Z80 has a somewhat rudimentary (seen from today) but still rather clever interrupt system. There are two interrupt pins on the package, INT and NMI. The former is for regular interrupts that can be disabled in software; the latter is the non-maskable interrupt and always fires when the line goes active. For the regular interrupt, 3 interrupt modes are provided. Mode 0 is an 8080-compatible mode where, when interrupted, you have to put an instruction on the data bus for the CPU to execute. It’s basically witchcraft so I’m not going to use it. Mode 1 is much simpler: it just jumps to address 0x38 when interrupted. This makes designing simple systems very easy, since you can just put your interrupt handler there, or a jump instruction to somewhere else if you need more space. However, if there is more than one interrupting device in mode 1, the CPU will have to figure out which one initiated the interrupt, which gets complicated real fast.

That’s where interrupt mode 2 comes into play. I’m going to list how mode 2 works off the top of my head since I worked with it for so long. It’s going to be a mouthful and I hope I get it right: when the interrupt fires, the interrupting device puts a byte onto the data bus. The CPU then combines the 8 bits in the interrupt vector base register (the high byte) and the 8 bits on the data bus (the low byte) to form a 16-bit address, reads a 16-bit word at that address, which is the interrupt service routine address, and then jumps to that. For example, if I load the interrupt register with 0x12 and put 0x10 on the bus after interrupting the CPU, the CPU will form the address 0x1210. If I put the data 0x3000 there beforehand, the CPU will read a word from 0x1210, get 0x3000, and jump to 0x3000, which is where the handler is. It’s extremely confusing at first, but once you get the hang of it you’ll realize this vectored interrupt system is much more powerful than the other modes: you can change the location of an ISR on the fly, and it supports a huge number of interrupting devices. It might be just a keyboard for now, but I’m also going to add a WiFi module in the future, which takes 2 ports, and a timer wouldn’t hurt either. Anyway, the thing is that FAP is going to have a couple more peripherals later on, and interrupt mode 2 is the best way to handle them.
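The lookup is easier to follow as code. Below is a small Python sketch of the mode 2 address calculation using the same numbers as the example; the function name and the flat `bytearray` memory are illustrative only. The one detail worth calling out is that the Z80 stores 16-bit words low byte first.

```python
def mode2_isr_address(i_reg, vector_byte, memory):
    """Return (table entry address, ISR address) for a mode 2 interrupt."""
    table_addr = ((i_reg & 0xFF) << 8) | (vector_byte & 0xFF)
    lo = memory[table_addr]                    # low byte comes first
    hi = memory[(table_addr + 1) & 0xFFFF]     # high byte follows
    return table_addr, (hi << 8) | lo


memory = bytearray(0x10000)
memory[0x1210] = 0x00   # low byte of 0x3000
memory[0x1211] = 0x30   # high byte of 0x3000

table_addr, isr = mode2_isr_address(i_reg=0x12, vector_byte=0x10, memory=memory)
print(hex(table_addr), hex(isr))  # 0x1210 0x3000
```

Moving a handler at run time is then just a matter of rewriting the two bytes in the table, and each device gets its own entry by supplying a different vector byte.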

Now let’s recap the entire interrupt process to see what my keyboard controller needs to do: when the user presses a key, the controller pulls the INT line low, waits until M1’ and IORQ’ both go low (we’ll call this INTACK’, short for interrupt acknowledge), puts the interrupt vector on the data bus, waits until INTACK’ goes high, then stops driving the data bus. Sounds like a lot of work, but I can just use 2 chips for that: a 74HC32 OR gate to generate INTACK’, which is then tied to OE’ of a 74HC245 buffer. It’s bidirectional, but in this case I’ll just set it to a single direction. This way, the interrupt vector on the other side of the buffer is gated onto the data bus the moment INTACK’ goes active. The CPU will then look up the handler address and jump to the ISR.

Things get slightly more complicated at the ISR though. The CPU needs to read the keyboard port to see what key the user just pressed, and that needs some decoding logic. The keyboard controller also needs to respond to the CPU’s port read just in time, otherwise the CPU will get garbage data. To achieve this I’m going to use an additional 2 chips, a 74HC688 8-bit equality detector and a 74HC573 transparent latch. The keyboard controller will latch the keyboard data, and when the CPU performs a read, the ’688 compares the port address and puts the latched data onto the data bus.

So to sum it all up again, here is how my keyboard controller will work: the controller receives a byte for the key press, puts the data on the latch, enables the latch, pulls down INT’, waits for INTACK’, puts the interrupt vector on the data bus, and pulls up INT’ and turns off the latch when INTACK’ goes away. The CPU will then try to read from port 0. An IORD’ signal is generated when both IORQ’ and RD’ are low, and IORD’ is tied to the OE’ of the latch, which enables it when it’s active, releasing the keyboard data onto the data bus for the CPU to read.

For reading the PS/2 signal itself I used an Arduino Pro Micro that I had lying around. While the AVR chip in the Arduino might seem slow and out of date compared to today’s 32-bit microcontrollers, what’s unbeatable is its incredible community. There is probably already an Arduino library written for every single thing you can think of, and for PS/2 it’s really a doddle. 2 minutes of Google searching got me the library, and the rest was just hooking up 2 wires. The Arduino reads the keypress and sends out a byte of ASCII code to the main controller, an STM32F103 dev board, which manages the interrupt activity. Below is the schematic:

As you can see there are already 6 chips right off the bat, and it’s just the start. Looks like the I/O board is going to be the most complicated board in my FAP computer.

Here is the finished board:

I also wrote a short test program for the new I/O board:

Note how my program starts from 0x100 now. It’s customary to start Z80 programs at that location because there are restart (RST) vectors at $0000, $0008, $0010, $0018, $0020, $0028, $0030, and $0038. I guess you can put code there if you’re not using the RST instruction, but it’s just good practice to leave them alone. My code sets up the stack, loads the interrupt register with 0x0, selects interrupt mode 2, and enters an idle loop. When a keyboard interrupt comes in, the CPU will read the handler address stored at 0x10 and jump to 0x3000, where the ISR will read a byte from I/O port 0 and print it on screen.

Here is the corresponding STM32 interrupt controller code snippet:

The interrupt controller turns off the latch and interrupt on startup, then starts listening to the Arduino that decodes PS/2 commands. Once it receives a keypress byte, it puts that byte on the lower 8 bits of PORTA, enables the keyboard data latch, then activates the Z80’s INT. It then waits for INTACK’ to finish, after which it pulls the INT line high again and disables the keyboard data latch.

Does it work? Watch the video:

As it turned out it works surprisingly well. Of course that’s not how it looked the first time round. At first FAP would print a few characters, then go back and print from the beginning. It was clock speed dependent as well, as it didn’t happen quite as often at slower clock speeds. A bit of debugging later I found that the CPU would randomly jump to 0x66, and since there’s nothing there, it would slide through a bunch of NOPs all the way to 0x100 and start executing the main program from the beginning. As it happens, address 0x66 is where NMI jumps to when that line goes active. It’s the goddamn noise again, and it also looks like the pull-up from the main STM32 controller is too weak. A few more filtering caps and a dedicated pull-up resistor on NMI later, it’s working like a dream.

Next step? WiFi. More specifically, I want to use my FAP as an IRC client for Twitch chat. Now that we have the video card and keyboard, all that’s left is some way to stream IRC data to FAP, which I’ll tackle in the next post. It’s all getting very close now.

It has been a week since my last post, when we left the action last time the FAP’s video card was working with double buffering and color attributes, and a test run resulted in this:

Remember the program was filling up attribute memory with 0x1b, a purple color, and character memory with the letter A. However, as you can see above, after running the program some character cells did not turn purple, and some others turned purple but their letters did not change to A. This was because the CPU was trying to write to VRAM while the VRAM copy was under way, and the operation was ignored by the GPU. To prevent this from happening we need some way to let the CPU know that the GPU is busy, so it can wait a little until the copying is done. I decided to make a few memory-mapped virtual registers for my GPU. This way the Z80 can ask the GPU if it’s busy first: if it is, the CPU waits, otherwise the CPU can write to VRAM right away. Here is the updated virtual register code:

Basically I just added a single line of code. Now when the CPU reads address 0x92c0 it gets the value of copy_in_progress, which is 1 when a VRAM copy is underway. This way, the CPU first polls this register every time it wants to write to VRAM, and waits if its value is 1. Problem solved! Well, not quite:

After running the VRAM-filling program again, there are still a bunch of problem character cells, although far fewer than before. Actually it’s not hard to figure out why: sometimes the CPU asks right before the VRAM copy starts, so the GPU replies that it’s idle, but when the CPU tries to write to VRAM a few clock cycles later the copying is already underway, and the write gets ignored as a result. We need a way to let the CPU know a little while before the copy actually starts. I did a dirty little hack by creating another signal that goes active a few scanlines before VBLANK actually starts. This way the CPU sees that the GPU is busy a little earlier, so if it happens to start a write just before the copy, it has time to finish the operation. After that, FAP is finally rendering a beautiful screen full of purple A’s. Time to write a proper “hello world” program:
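The check-then-write race in that paragraph can be reproduced with a toy timeline in Python. All the cycle counts here are made-up numbers chosen to show the effect, and the function names are invented; the `guard` parameter plays the role of the early-warning signal.

```python
COPY_START = 100   # hypothetical cycle at which the VRAM copy begins
COPY_END = 200     # hypothetical cycle at which it finishes
WRITE_DELAY = 4    # hypothetical cycles between the poll and the write


def busy(t, guard=0):
    """Busy flag as the CPU sees it; `guard` raises it that many cycles early."""
    return COPY_START - guard <= t < COPY_END


def write_outcome(poll_time, guard=0):
    """Classify one poll-then-write attempt."""
    if busy(poll_time, guard):
        return "waits"                       # CPU backs off and retries later
    write_time = poll_time + WRITE_DELAY
    if COPY_START <= write_time < COPY_END:
        return "lost"                        # write lands mid-copy, ignored
    return "ok"


print(write_outcome(50))             # far from the copy: fine
print(write_outcome(98))             # polled just before the copy: write is lost
print(write_outcome(98, guard=8))    # early warning: CPU waits instead
```

The guard only has to be longer than the gap between the poll and the write, which is why a few scanlines is more than enough.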

I created a ‘check’ subroutine which gets called every time before the CPU tries to write to VRAM. The program first fills the attributes with yellow, then clears the screen, then prints “hello world” on the first row of the screen. Here is what it looks like:

Nice, isn’t it? It’s moments like this that keep me going. A few weeks ago I had no idea how to build a computer, and FAP was just a bunch of 20-year-old chips, and now it’s running my program and saying hello to the world like any other proper computer!

Celebration aside though, notice how this hello world program is remarkably slow. I ran it at a slower clock to see the progress, but even at a full 2MHz it still takes something like half a second to complete, which is forever for a microcomputer. The reason is that the Z80 has to read the gpu_busy register before every single write, which wastes a huge amount of time. A better solution would be to let the CPU disable GPU copying altogether, do the writes, then enable copying again. This way the CPU does not have to wait at all, and only has to access a GPU register twice instead of 4800 times. The new copy_enable register is at 0x92c1. Here is the updated code:
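To put rough numbers on that saving, here is a back-of-the-envelope count in Python. It is only an illustration and is separate from the FPGA code: it counts bus accesses for a full-screen fill, assumes exactly one status read per write in the polled case, and ignores any time spent waiting while the GPU is busy. The function names are made up.

```python
COLS, ROWS = 80, 30
VRAM_BYTES = COLS * ROWS * 2   # characters plus attributes


def bus_accesses_polled(n_bytes):
    """One gpu_busy read before every VRAM write."""
    return n_bytes * 2


def bus_accesses_gated(n_bytes):
    """Back-to-back writes, plus one register write to disable and one to enable."""
    return n_bytes + 2


print(VRAM_BYTES)
print(bus_accesses_polled(VRAM_BYTES))
print(bus_accesses_gated(VRAM_BYTES))
```

On top of the raw access count, the polled version also pays for the call, compare, and branch around every poll, so the real difference is larger than the roughly 2x this count suggests.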

I added another if condition so now the CPU can read from VRAM too. Most importantly, though, when the CPU writes 0 to 0x92c1, the VRAM copy is disabled. The CPU can then write to VRAM at full speed without interruption. When it’s done, the CPU can enable the VRAM copy again and the result will be displayed on the screen on the next frame. I wrote another program to test it.

I put together a “print” function: you put the character you want to print in c, the attribute in b, and the index on screen in de, and call it. I first disable the VRAM copy, then clear the screen with char 0 and write “Hello World!” to it, then enable the VRAM copy again. Here is the result:

As you can see this is much faster than the first attempt. The text appears on screen instantly. Looks like disabling copy during bulk VRAM write is the way to go.

However, if you look at the video carefully, the supposedly white text appears a bit yellow or red. I spent days trying to fix this issue, tweaking the FPGA code and swapping out VRAM chips. In the end I don’t think it’s the VRAM or the copy routine, since the character and attribute are copied together, and all the text seems fine. I think it’s the noise again, with more than 100 spaghetti wires running around. It might be better than the breadboard, but apparently it’s still not good enough. I’ll probably have to design a proper PCB for the video card to see if it gets better. I think I’ll move on for now; I’m just kind of tired of working on the video card right now.

Now the video card is working, next step is setting up the keyboard input for FAP. Find out what happens in my next post!

In my last post I designed and built an early stage of FAP’s video card on a breadboard. I decided to use VGA for video output, running an 80×30 text mode at 640×480. The VRAM was working, and so was the character ROM, and now the board renders the garbage data in VRAM on startup as characters on screen. All is good.

garbage data never looked so happy

However, there were still some glaring issues left untouched in the last post, for example the lack of colors. That is the job of the attribute byte. In my design (and in a lot of other early PCs) each text mode character is accompanied by an attribute, which describes what color the character is, whether it should blink, and so on. I ignored it last time to get the character generator working faster; now that it does, it’s time to go back to that.

First a little bit about the VRAM layout, the 80×30 text mode needs 2400 bytes for character, and another 2400 for attribute. Those 4800 bytes will be mapped to between 0x8000 and 0x92c0 in Z80’s memory, first 2400 for char and second half for attribute.
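The layout translates directly into a pair of address formulas. This Python sketch assumes cells are stored row by row (row-major), which the text does not state explicitly, and the function names are invented for illustration.

```python
VRAM_BASE = 0x8000
COLS, ROWS = 80, 30
CELLS = COLS * ROWS            # 2400 character cells


def char_addr(row, col):
    """Z80 address of the character byte for a screen cell."""
    return VRAM_BASE + row * COLS + col


def attr_addr(row, col):
    """Z80 address of the matching attribute byte, one block of 2400 later."""
    return char_addr(row, col) + CELLS


print(hex(char_addr(0, 0)), hex(attr_addr(0, 0)))
print(hex(char_addr(29, 79)), hex(attr_addr(29, 79)))
print(hex(VRAM_BASE + 2 * CELLS))   # first address past the text buffer
```

The last line comes out to 0x92c0, which matches the upper bound given above.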

Here is the updated Verilog file with attribute fetching:

This block is now clocked by the 50MHz internal clock, and fetch_attribute alternates between 0 and 1 on each clock cycle. On the cycle where fetch_attribute is 0 the FPGA gives the character address to the VRAM, and when fetch_attribute is 1 it gives the attribute address. The character is then looked up in the character ROM, while the lower 6 bits of the attribute are sent to the DAC directly, giving colors. Here is what it looks like:

However, the fuzziness is back again. I can think of several reasons. The VRAM is now being accessed twice for every pixel, once for the character and once for the attribute, so it might be too slow again: the pixel clock is 25MHz, which means each clock cycle is 40ns, and two memory accesses during this period means each access only gets 20ns. My VRAM is rated at 15ns, so it’s very close. The faster speed also generates more noise, especially on the breadboard. Lastly, the monitor’s auto optimization seems a bit off on the signal from my video card. I bought some 12ns IS61C256 chips. That’s only 3ns faster, but in FAP, every nanosecond counts.

Every nanosecond counts
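The timing budget is easy to put into numbers. This is only a back-of-the-envelope Python sketch with made-up names; it ignores address setup, wire delay and FPGA I/O delay, which all eat into the slack in practice.

```python
PIXEL_CLOCK_HZ = 25_000_000
ACCESSES_PER_PIXEL = 2      # one character fetch plus one attribute fetch

pixel_period_ns = 1e9 / PIXEL_CLOCK_HZ                    # 40ns per pixel
access_window_ns = pixel_period_ns / ACCESSES_PER_PIXEL   # 20ns per access

def slack_ns(rated_access_ns):
    """Time left in one access window after the SRAM's rated access time."""
    return access_window_ns - rated_access_ns

for rated in (15, 12):
    print(f"{rated}ns SRAM leaves {slack_ns(rated):.0f}ns of slack")
```

Going from 15ns to 12ns parts only moves the slack from 5ns to 8ns, which fits with the modest improvement.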

I put it on, and it looks a little bit better, but not by much, since there is still the noisy breadboard. I’ll have to wait until I build the thing on the stripboard.

Now that the video card is mostly working by itself, we come to the issue of how to interface it with the CPU. As you know, the VRAM holds the characters and attributes that will be rendered on screen, but that requires the CPU to write something into it in the first place. The VRAM is being read by the video card most of the time, so the only chance the CPU has to write into it is during one of the blanking periods, namely the Horizontal Blanking Interval and the Vertical Blanking Interval. HBLANK happens every scanline but its duration is extremely short, only around 6us if I remember correctly. Our 1970s Z80 won’t have enough time to push a lot of useful things into VRAM in that kind of time, and even if it did it would probably cause screen tearing or visual artifacts, since it changes the content of a single pixel line instead of an entire frame. So that leaves us with VBLANK, which lasts around 1.4ms every single frame. That is plenty of time, but it means the CPU would have to spend 90% of its time doing nothing but waiting for VBLANK, which is extremely wasteful. That is what Quinn chose to do with her Veronica. One way to get around this is to use a dual-port RAM, which allows writing and reading at the same time, but that stuff is pretty hard to find. I decided to use my existing parts and implement the tried-and-true method of freeing the CPU from waiting for the beam: double buffering.
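For reference, the blanking times can be estimated from the usual 640×480 at 60Hz timing. The numbers below (800×525 total, 25.175MHz pixel clock) are the standard figures and are an assumption here, since the exact timing the FPGA generates isn’t spelled out. It is a quick Python sketch, nothing more.

```python
# Assumed standard 640x480@60Hz timing; not taken from the FPGA code.
PIXEL_CLOCK_HZ = 25_175_000
H_VISIBLE, H_TOTAL = 640, 800     # pixels per line: visible / including blanking
V_VISIBLE, V_TOTAL = 480, 525     # lines per frame: visible / including blanking

line_time_us = H_TOTAL / PIXEL_CLOCK_HZ * 1e6
hblank_us = (H_TOTAL - H_VISIBLE) / PIXEL_CLOCK_HZ * 1e6
vblank_ms = (V_TOTAL - V_VISIBLE) * H_TOTAL / PIXEL_CLOCK_HZ * 1e3
frame_ms = V_TOTAL * H_TOTAL / PIXEL_CLOCK_HZ * 1e3
vblank_share = (V_TOTAL - V_VISIBLE) / V_TOTAL

print(f"line   {line_time_us:.2f} us, HBLANK {hblank_us:.2f} us")
print(f"frame  {frame_ms:.2f} ms, VBLANK {vblank_ms:.2f} ms")
print(f"VBLANK is {vblank_share:.1%} of the frame")
```

With these assumptions VBLANK is a bit under a tenth of the frame, so a CPU that may only write during VBLANK is locked out for roughly nine tenths of the time.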

The principle of double buffering is actually pretty simple: two VRAMs, called the back and front VRAM, are used. The CPU writes to the back VRAM, and during the VBLANK period the content of the back VRAM is copied to the front VRAM and subsequently rendered. This way the CPU can write to VRAM at any time it wants, apart from a minuscule amount of time during copying. Screen tearing and artifacts are also eliminated since the entire frame is updated at once. The only downsides are probably cost and complexity, but I think it will be worth it in the end, when I won’t need to race the beam while writing my programs.
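The idea fits in a few lines of Python. This is a toy software model with hypothetical names, not how the hardware is built: two byte arrays stand in for the two SRAM chips, and a method call stands in for the VBLANK copy.

```python
class DoubleBufferedVram:
    """Toy model: the CPU only touches `back`, the scanout only reads `front`."""

    def __init__(self, size=4800):
        self.back = bytearray(size)
        self.front = bytearray(size)

    def cpu_write(self, offset, value):
        # CPU writes land in the back buffer and are not visible yet.
        self.back[offset] = value & 0xFF

    def on_vblank(self):
        # During vertical blanking the whole back buffer is copied over.
        self.front[:] = self.back

    def scanout(self, offset):
        # The renderer only ever reads the front buffer.
        return self.front[offset]

vram = DoubleBufferedVram()
vram.cpu_write(0, ord("A"))
before = vram.scanout(0)    # still 0, the write is not visible yet
vram.on_vblank()
after = vram.scanout(0)     # 65 once the copy has happened
print(before, after)        # 0 65
```

Because the screen only changes at the copy, a half-finished update never shows up in the middle of a frame.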

However, at this stage the noise problem on the breadboard is pretty obvious, and adding another VRAM and 28 more wires will only make it worse, so I decided to just build it on the board.

And to make sure I don’t make wiring mistakes I decided to start over and retest everything from the beginning, starting with just the FPGA and DAC, displaying color bars.

Color bars on the FPGA board

Good, that works. The next step is hooking up one VRAM and trying to render the garbage data again.

Notice how all the noise is gone. Good stuff. A lot of U’s for some reason.

Coming up next is the main course of this post: double buffering.

The module waits for the vblank interval, and when it arrives it enables reads from the back VRAM and writes to the front VRAM, then starts a counter that goes up to 4800 and doubles as the address for both the front and back VRAM. This way the content of the back VRAM is copied to the front as the counter increments. The copy_in_progress signal is made available so that other modules, as well as the CPU, can know if a buffer copy is under way. The total copy time is around 192us, which means that instead of having to wait 90% of the time to write to the VRAM, the CPU now only has to wait 1.1% of the time. A pretty big improvement.
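The 192us figure works out if one byte is copied per 25MHz pixel clock. That rate is an assumption on my part (it is what 192us divided by 4800 bytes implies), as is the nominal 60Hz frame rate. A quick Python check:

```python
COPY_BYTES = 4800
COPY_CLOCK_HZ = 25_000_000   # assumption: one byte copied per 25MHz clock
FRAME_RATE_HZ = 60           # assumption: nominal VGA refresh rate

copy_time_us = COPY_BYTES / COPY_CLOCK_HZ * 1e6
frame_time_us = 1e6 / FRAME_RATE_HZ
busy_share = copy_time_us / frame_time_us

print(f"copy takes {copy_time_us:.0f} us")
print(f"VRAM is busy for {busy_share:.2%} of each frame")
```

That gives a copy time of 192us and a busy share of a little over 1% of each frame.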

I added another slot on FAP’s backplane and plugged the video card in, it’s practically indistinguishable from a GTX980 to be honest.

Until now the video card has been working by itself, copying buffers and rendering characters. However, sooner or later the CPU will have to be able to control it. There are a few ways the CPU can talk to the GPU: certain computers map their VRAM directly into addressable memory (the NES and Game Boy come to mind), while others choose not to expose the VRAM and instead have memory-mapped GPU registers; the MOS Technology 8568 and VIC-II belong to this kind. For FAP, I decided to use both. VRAM will be mapped from 0x8000 up to 0x92c0, with the virtual registers somewhere after that. This way I can easily test something by writing directly into VRAM, while I can also ask the GPU to do heavy lifting like text scrolling. It’s the best of both worlds.

For now though, it’s just the write-only VRAM mapping, here is the code:

This simple module checks whether MREQ and WR are low and the address is in the correct range, and connects the CPU bus to the back VRAM if so. I’ll implement VRAM reads and virtual registers later; I can already print some pretty text with just writes for now.
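As a rough model of that decode logic (a Python stand-in, not the Verilog module itself), the decision boils down to one condition. The function name and signature are hypothetical, and the copy_in_progress check reflects the fact that writes during a buffer copy get ignored.

```python
VRAM_BASE = 0x8000
VRAM_SIZE = 4800    # 2400 character bytes + 2400 attribute bytes

def vram_write_select(mreq_n, wr_n, addr, copy_in_progress=False):
    """Return the back-VRAM offset to write to, or None if this bus cycle
    is not a VRAM write. MREQ and WR are active low, as on the Z80."""
    in_range = VRAM_BASE <= addr < VRAM_BASE + VRAM_SIZE
    if mreq_n == 0 and wr_n == 0 and in_range and not copy_in_progress:
        return addr - VRAM_BASE
    return None

print(vram_write_select(0, 0, 0x8000))        # 0
print(vram_write_select(0, 0, 0x92C0))        # None, just past the end
print(vram_write_select(0, 0, 0x8000, True))  # None, copy under way
```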

Time to write a Z80 program to try it out. If I write a value starting from the back of the VRAM and working towards the front, the screen should first change color, since the attribute bytes are at the back of the VRAM. Once the attribute bytes have been filled, text will start appearing as the value fills up the character portion of the VRAM. Here is the test program:

The program first fills the second half of the VRAM with 0x1b, binary 011011: the first two bits, 01, go to the red DAC, the middle 10 goes to the green DAC, and the last two bits, 11, go to the blue DAC. The result should be a violet color. After filling up the attributes, the program fills the first 2400 bytes of VRAM with 65, the ASCII code of the letter A. So we should expect the garbage text on screen to first turn violet, then fill up with the letter A. Here is what happens:

Well, it’s mostly what should be happening. However, some characters didn’t turn violet, and there are a few missed A’s here and there. This is because the CPU was trying to write during the VRAM copy operation and the write was ignored. My next step will be implementing a few virtual registers so the CPU can check if the GPU is busy before trying to write to VRAM. But anyhow, progress is progress.
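For anyone who wants to pick their own colors, the 0x1b attribute from the test program splits into its three DAC fields like this. A small Python sketch; the function name is made up, and the top two bits of the attribute are simply ignored here since their use isn’t defined yet.

```python
def split_attribute(attr):
    """Split the lower 6 bits of an attribute byte into 2-bit R, G, B levels."""
    color = attr & 0x3F            # only the lower 6 bits drive the DAC
    red = (color >> 4) & 0b11
    green = (color >> 2) & 0b11
    blue = color & 0b11
    return red, green, blue

print(split_attribute(0x1B))   # (1, 2, 3)
```

Each level is a step from 0 to 3 on that channel’s DAC.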

There was actually an extremely frustrating backstory that I didn’t mention. After getting the above code to work at around 500Hz, I tried to bump the processor speed up to around 500kHz and it simply would not work. Sometimes it entered the end loop early, sometimes it ignored the loop entirely and continued past the end of the program. I thought there was a loose connection or a weak short somewhere along the bus, but after inspecting and cleaning the board the problem still happened from time to time. I eventually had my STM32 controller print out the address and data content after each clock cycle. Here is one of the execution traces:

As you can see, the address bus is missing bits: 0x8960 turned into 0x8160, and jumping to 0xb turned into jumping to 0x3. That’s why my program was going all over the place. The culprit? Noise. After adding a couple of caps between VCC and GND of the Z80 and the memory chip, the problem went away. However, I did spend a stupidly long amount of time trying to nail it down. It’s one of those things I should have known better about, oh well.
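A quick way to see which bits went missing is to compare the expected and observed values bit by bit. This is just a Python helper for reading a trace like the one above; the function name is made up.

```python
def dropped_bits(expected, observed, width=16):
    """List the bit positions that should have been 1 but were read as 0."""
    diff = expected & ~observed
    return [bit for bit in range(width) if (diff >> bit) & 1]

print(dropped_bits(0x8960, 0x8160))   # [11]
print(dropped_bits(0x000B, 0x0003))   # [3]
```

In both cases a single high bit got dropped, which is what you would expect from a marginal signal and not from a wiring mistake.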

Anyway, FAP’s video card is halfway working now. I’ll finish up the GPU virtual registers in the next post, and have my FAP finally say “hello world” like all the great computers did when they were born.

As I mentioned in my first post, Steve Ciarcia, in his 1981 book Build Your Own Z80 Computer, called his computer ZAP, which stands for Z80 Application Processor. Since I was planning to use an FPGA as a part of my own Z80 computer, it’s only natural to name mine FPGA Assisted Processor, or FAP for short. And now, after getting the CPU to work, adding memory, and programming the processor, the time has finally come for me to put the F in FAP and start designing the video interface of my computer.

Looking back at the history of personal computers, it’s not hard to see that graphical capability is one of the most important factors in determining the popularity of a computer. Everything was hooked up to a monitor or TV, and no one wanted a computer that only communicates through a row of 16 LEDs. It’s the same with FAP. Right now it has no means of input/output whatsoever, and the only way to see if a program is executing correctly is halting the CPU and examining the RAM content. My plan is to add a keyboard as input and video out as output. As for the video, there are a number of standards to choose from. There’s RF and composite video, then the slightly more modern VGA, and after that come DVI, DisplayPort, HDMI and all the HD standards. I picked VGA because it’s a rather straightforward interface, and it is still supported by a lot of monitors. I need a horizontal sync pulse, a vertical sync pulse, and 3 analog color signals between 0 and 0.7V. A pixel clock pushes out the pixels of each line from left to right. At the end of the line HSYNC tells the monitor to move down to the next line, and VSYNC is generated when the last line has been reached, signalling the monitor to start a new frame. The standard resolution of VGA is 640×480. However, there are additional pixels and lines called the front and back porch, which are not rendered on screen and were originally there to allow time for the electron beam to move to the next line/frame. Those add another 100 or so pixels to each line, and another 40 or so lines. The porches also give time to update the video buffer.
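To make the counting concrete, here is a Python model of what a sync generator has to decide on every pixel clock. The porch and sync widths are the commonly published 640×480 at 60Hz values and are an assumption here, as are all the names; the real thing is of course done in HDL on the FPGA.

```python
# Assumed standard 640x480@60Hz timing, in pixels (horizontal) and lines (vertical).
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK   # 800 pixel clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK   # 525 lines per frame

def vga_signals(hcount, vcount):
    """Return (visible, hsync_active, vsync_active) for one pixel-clock tick."""
    visible = hcount < H_VISIBLE and vcount < V_VISIBLE
    hsync = H_VISIBLE + H_FRONT <= hcount < H_VISIBLE + H_FRONT + H_SYNC
    vsync = V_VISIBLE + V_FRONT <= vcount < V_VISIBLE + V_FRONT + V_SYNC
    return visible, hsync, vsync

print(H_TOTAL, V_TOTAL)         # 800 525
print(vga_signals(0, 0))        # (True, False, False)
print(vga_signals(656, 0))      # (False, True, False)
print(vga_signals(0, 490))      # (False, False, True)
```

The counters wrap at H_TOTAL and V_TOTAL, and color is only driven while visible is true.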

When it comes to actually designing the video interface, you have to keep in mind that the Z80’s resources are rather limited. If we are to drive VGA at 640×480, we need to store what each pixel is displaying somewhere in memory, and that memory is called Video RAM. For 640×480 we’ll need 307200 bits just to store a 1-bit black and white image, and if we want colors, it will use about 300K of memory at one byte per pixel, while the Z80 can only natively address 64K. What’s more, the processor would have to update more than 300K dots every single frame, which is way too much for a 1970s CPU. Since most of what was displayed was text anyway, a lot of personal computers at the time implemented what is called text mode. Instead of individual pixels, the screen is divided into a number of text cells, each containing one character, and with a character ROM and some special circuits, a text screen can be rendered much faster, and the content of the screen can be manipulated more easily too. Most importantly, this method saves a lot of memory: an 80×30 text mode screen at 640×480 only needs 2400 bytes, compared to 300K in bitmap mode. If I want color/underline/blinking/etc. I can use an additional byte for those functions, which is often called the attribute. Therefore two bytes per cell are often used in VRAM in text mode, one to specify which character it is, and another to specify the color and other attributes of that character.
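The memory numbers are easy to check. A few lines of Python, with names made up for the example:

```python
WIDTH, HEIGHT = 640, 480
Z80_ADDRESS_SPACE = 64 * 1024          # bytes the Z80 can address natively

pixels = WIDTH * HEIGHT                # 307200
mono_bytes = pixels // 8               # 1 bit per pixel
byte_per_pixel_bytes = pixels          # 1 byte per pixel

text_cells = (WIDTH // 8) * (HEIGHT // 16)   # 8x16 font gives 80 x 30 cells
text_bytes = 2 * text_cells                  # character + attribute per cell

print(pixels, mono_bytes, byte_per_pixel_bytes // 1024)  # 307200 38400 300
print(text_cells, text_bytes)                            # 2400 4800
print(byte_per_pixel_bytes > Z80_ADDRESS_SPACE)          # True
```

Even with the attribute byte included, text mode needs 4800 bytes against roughly 300K for a byte-per-pixel bitmap.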

I’m using an FPGA for the VGA video controller, since it’s much faster than bit-banging VGA with a microcontroller. A class I took a few years ago used an FPGA board for a few projects, but I have forgotten most of it by now. That class was also in VHDL, while a lot of the resources online are in Verilog, so this is basically starting anew for me. I used a Mojo V3 with a Spartan 6 chip. I like it because it’s open source, cheap, has a lot of pins, and uses an ATmega32U4 microcontroller to configure the FPGA so I don’t need to spend more on a programmer. Most importantly, though, it doesn’t have random peripherals that I don’t need taking up pins. It’s just a simple, clean, minimalist board with all the pins broken out and nothing else, just how I like it.

I wanted to start with something simple like displaying some patterns, and then go from there. Fortunately for me, VGA signal generation is one of the most common tasks for an FPGA, and there are tons of resources online. I used some code from this rather excellent example, which simply displays some color bars on screen. I tweaked the code so it uses 2 bits per color channel instead of 3, giving the video card 4*4*4 = 64 colors. The color outputs are digital, with 6 bits in total, which means I need to build a simple DAC to convert them to analog signals between 0 and 0.7V. An R-2R DAC is enough, and I used 1K and 2K resistors for the job.

R-2R DAC for color output

With everything hooked up, I uploaded the code and hooked it up to an old LCD monitor that I picked up for $7 just for this project.

Voilà, color bars! However, the colors look extremely dark, despite the monitor being at full brightness. I tried another monitor and it was still the same. After measuring the output of the DAC, it turns out it’s only putting out 0.09V instead of 0.7V at full intensity. It looks like the 1K-2K DAC can’t provide enough current to drive the VGA inputs, which I should have known. I tried to use an op-amp buffer, but it was too slow for the 25MHz pixel clock. In the end I just used smaller resistors for the DAC, 150 and 330 ohm. It’s not exactly double but it’ll have to do, and the peak output with VGA connected is 0.68V, pretty close to the 0.7V in the specification. With the new DAC, the image is back to full brightness.

Proper brightness with the new DAC

Eagle-eyed viewers might have spotted that the color bars have changed colors. I did that to check the DAC performance, stepping through 00, 01, 10, 11, first with only red, then with all channels. It looks exactly how it should: reds are red, and greys are grey without color casts.

Then it’s time to tackle the details of the text rendering. First I need two counters to know which pixel I’m at, one for each axis: hpos[9:0] is the horizontal counter that goes from 0 to 639, and vpos[9:0] is the vertical counter that goes from 0 to 479. As the controller draws each pixel I also need to know which pixel in the 8*16 font I am at, which is simply the lower 3 bits of hpos and the lower 4 bits of vpos. The upper 7 bits of hpos and upper 6 bits of vpos give the position of the current character in the 80×30 grid. The controller will fetch a byte from character RAM, which contains the character to render, as well as an attribute byte to see what color that character is. To start with I’ll ignore the attribute, so the memory location of each character is 80 * vchar + hchar, where hchar and vchar are the coordinates of the letter. With that memory address, the video controller can ask the character RAM for a byte of data to render. I changed the character rendering code to see if it works.
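The bit slicing is easier to see in software. The following is a Python stand-in for the arithmetic only, not the Verilog used on the FPGA; the function name is made up, while hpos, vpos, hchar, vchar and vram_addr follow the naming above.

```python
def locate(hpos, vpos):
    """Map a pixel position to (vram_addr, glyph_col, glyph_row)."""
    glyph_col = hpos & 0b111     # lower 3 bits: column inside the 8-wide glyph
    glyph_row = vpos & 0b1111    # lower 4 bits: row inside the 16-high glyph
    hchar = hpos >> 3            # upper 7 bits: character column, 0..79
    vchar = vpos >> 4            # upper 6 bits: character row, 0..29
    vram_addr = 80 * vchar + hchar
    return vram_addr, glyph_col, glyph_row

print(locate(0, 0))        # (0, 0, 0)
print(locate(8, 16))       # (81, 0, 0)
print(locate(639, 479))    # (2399, 7, 15)
```

The last pixel on screen lands on address 2399, the final byte of the 2400-byte character area.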

As you can see it calculates a memory address vram_addr based on the pixel counter, and feeds the lower 6 bits of data to the VGA output, I hooked up a leftover SRAM, and here’s what greeted me when I ran the code:

Beautiful, isn’t it? You can see each character cell, where the letters will go, only right now it’s just random colors from junk data inside the SRAM. Say I want to display “A” in a character cell: the content of memory for that cell will be 65, the ASCII code of A. Right now the VGA controller gets the 65 and just puts its lower 6 bits out as a color. The next step is making a character ROM, so the VGA controller can look up 65 in it and see which pixels it should light to put the letter A on screen instead of just a block of color.
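A character ROM lookup can be sketched in Python like this. The layout (16 bytes per glyph, one byte per row, most significant bit on the left) is an assumption about how such a ROM is typically organised, and the bitmap for A is drawn by hand for this example, not taken from any real font.

```python
GLYPH_HEIGHT = 16
font_rom = bytearray(256 * GLYPH_HEIGHT)   # 256 glyphs, 16 row bytes each

# Hand-drawn placeholder for 'A' (code 65), for illustration only.
A_ROWS = [0x00, 0x00, 0x10, 0x38, 0x6C, 0xC6, 0xC6, 0xFE,
          0xC6, 0xC6, 0xC6, 0xC6, 0x00, 0x00, 0x00, 0x00]
font_rom[65 * GLYPH_HEIGHT:66 * GLYPH_HEIGHT] = bytes(A_ROWS)

def pixel_on(char_code, glyph_row, glyph_col):
    """True if the glyph pixel is lit (MSB = leftmost pixel, an assumption)."""
    row_bits = font_rom[char_code * GLYPH_HEIGHT + glyph_row]
    return bool((row_bits >> (7 - glyph_col)) & 1)

for row in range(GLYPH_HEIGHT):
    print("".join("#" if pixel_on(65, row, col) else "." for col in range(8)))
```

The character code picks the glyph, the low bits of the vertical counter pick the row, and the low bits of the horizontal counter pick the bit within that row.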

And what do you know? The open source community comes to the rescue again! I found exactly the thing I need on GitHub, an 8*16 code page 437 font. After dropping it in and giving it the column and row data, I got this:

We actually got text! Instead of color blobs the VGA controller is rendering characters from the garbage data in the SRAM. The characters do look somewhat fuzzy though, and some are not rendered properly. To compare with a known example, I filled up an EEPROM with 0 to 255 and dropped it in. It should look like this:

image source wikipedia

And here is what I got:

It looks even worse than before! If you look really closely you can see it follows the pattern of code page 437, but every single character is mangled, sometimes beyond recognition. I first thought it was a breadboard noise problem, as I’m running 25MHz on it, and Quinn had this issue while designing hers. However, looking closer I can tell each character is mangled in exactly the same way, and they don’t change when I wiggle some wires. Noise tends to be random, so that is not the case. I then thought something might be wrong with the character ROM code, but that seemed unlikely. I then thought it was my crappy monitor, but it looks the same on a better one I tried. Drunk and out of ideas, I thought I had hooked my wires up wrong and basically started messing around with the SRAM data lines. While doing so I found something interesting. This is what the screen looks like with the 2nd data line hooked directly to the address line:

Happy faces! And most importantly it looks perfect! So it’s neither a noise problem nor a character ROM problem. I then hooked up the same wire to the data output of the EEPROM:

It looks like the faces are shifted to the left, looks like a timing problem to me. Time to break out the logic analyzer.

Channel 4 is the pixel clock, channel 0 is address, and channel 2 is data. Here you can see the data lags behind the address by a whopping 140ns. At 40ns per pixel clock that is 3.5 clocks, so the data presumably isn’t picked up until the 4th pixel clock, and that’s why those faces look shifted by 4 pixels. It turned out my SRAM and EEPROM are too slow, who would have thought? The AT28C256-15PU I have is rated at 150ns, so that’s pretty close to what I measured. The SRAM I have is rated at 50ns, which is still not fast enough. That’s why the image with the SRAM is still fuzzy, but not as bad as the one with the EEPROM.
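The relation between memory delay and pixel shift can be estimated with a one-liner. This Python sketch assumes the data is sampled on pixel clock edges, so any delay gets rounded up to a whole number of pixels; that sampling model is my simplification, not something measured.

```python
import math

PIXEL_PERIOD_NS = 40   # 25MHz pixel clock

def pixel_shift(delay_ns):
    """Pixels of lag if data is only sampled on whole pixel-clock edges."""
    return math.ceil(delay_ns / PIXEL_PERIOD_NS)

print(pixel_shift(140))   # 4, the measured EEPROM delay
print(pixel_shift(50))    # 2, the old SRAM's rated access time
print(pixel_shift(15))    # 1, data is ready by the next clock edge
```

Under this model only memory faster than one pixel period has its data ready by the very next edge.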

Time to buy some new parts. The IS61C256 looks promising: it’s the same 32K SRAM but with only 10 to 15ns of delay, which is 14 times faster than the one I’m using. It’s in an SOJ package though, so I bought a few breakout boards too. I also picked up a newer Z80 CPU to replace the current one. This one is CMOS, so it uses less power and can run at up to 8MHz.

New parts

Soldered the new SRAM, hope I didn’t burn it out.

And after plugging it in, here’s how it looks like:

The speed does make a difference! All the characters are sharply rendered now. It’s still displaying garbage because that’s what is in the SRAM at the time, but the system is working.

Now that the text rendering is working, next part is the interface between the processor and the VGA controller. I’m so drunk and tired now so I’m just going to end it here, stay tuned for the next post!