socz80: A Z80 retro microcomputer for the Papilio Pro FPGA board

Overview

I built a small FPGA microcomputer for the Papilio Pro board. I've
ported a few operating systems to run on it.
These 8-bit machines have very minimal features but (somewhat
unexpectedly) I found they can run a multi-user, multi-tasking UNIX
operating system.

Introduction

My first computer, in 1989, was a PC. It was a 16-bit 80286 machine.
I missed out on the whole 8-bit generation, but I've always been
interested in those machines, from the somewhat mythical golden age
when people hacked up their own computer design on their kitchen table,
before every manufacturer just cloned the IBM PC architecture ad
infinitum. So when my wife bought me an FPGA for my birthday I decided
to build my own 8-bit machine for it, in order to learn about them and
the software they ran.

A lot of people say to me, "What is an FPGA? And why am I asking
this question?"

An FPGA is basically a computer chip which can be reprogrammed with a
different circuit so
that it behaves differently. You can make them into all sorts of
different chips, simple or complex. Over time their cost has fallen
and you can now build formidable circuits inside these devices. Just
$10 will buy an FPGA big enough for a computer.

This was only my second FPGA project and was also my first attempt
at writing code for a Z80, so the quality of my code is probably not
brilliant! The machine works well though and I've had a great deal of
fun with it.

The Papilio Pro is a great board and I thoroughly recommend it. It
has a Xilinx Spartan 6 LX9 FPGA, 8MB of SDRAM, 8MB of SPI flash memory,
and an FTDI USB interface that is used to connect JTAG and UART to a
host PC. Everything works great under Linux. My main criticism of the
board is that the serial link via the UART has no flow control lines
hooked up to the FPGA -- the FTDI has a deep FIFO (kilobytes) and you
can build a receive FIFO inside the FPGA, but at sustained high data
rates these will eventually overflow.

I also have a Pipistrello FPGA board which is based on the same
Papilio form factor. It has the UART flow-control hooked up, has a
larger and faster DDR SDRAM chip as well as a much larger LX45 FPGA.
You can use the Xilinx on-chip memory controller block to drive the DDR
SDRAM chip. It has HDMI (in or out). The power supply on the Papilio
Pro is more efficient but otherwise the Pipistrello is better albeit at
a higher cost. I've not had time to get it working yet.

The Papilio form factor is very hardware-hacker friendly; all the IO
pins are broken out on 0.1" headers so you can easily pop a bit of
veroboard on top and solder up a MAX3232 or SD card or LEDs or
whatever.

About this time in the conversation those same people say to me, "Are you detaining me? Am I free to go? Please?"

Hardware

I started my Z80 system with the open-source T80 CPU core,
a UART that I'd written for an earlier project, and some of the on-chip
block SRAM for memory. I then wrote a simple monitor program
for it, my first Z80 program after "Hello world".

Xilinx have a "data2mem" tool that you can use to quickly
replace the data loaded into a block RAM without resynthesising the
FPGA design (a tedious process), so you can assemble your monitor
program, use data2mem to have the code loaded into block RAM, then
reprogram the FPGA which will run the code when it comes out of reset.
This affords a very quick edit/compile/test cycle, about three seconds from
hitting enter to running code.

Once I had a monitor program running I imported Mike Field's brilliant
Simple SDRAM Controller to drive the 8MB SDRAM chip on the
board. Having the monitor in reliable SRAM made it easy to test the
SDRAM and work out the bugs just by using deposit and examine memory
commands.

The SDRAM gave me access to 128 times more memory than the Z80 could
address, so I added a 4K paged MMU to translate the 16-bit (64K)
logical address space into a 26-bit (64MB) physical address space. Each
4KB logical page can be mapped independently to any 4KB physical page.
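
To make the translation concrete, here is a minimal software sketch of the
idea in Python. It is only a model, not the VHDL: the class name, the
method names and the identity-mapped reset state are assumptions made for
illustration. The top 4 bits of the 16-bit logical address select one of
16 page entries, and the low 12 bits pass through as the offset.

```python
PAGE_BITS = 12                                # 4KB pages
PAGE_SIZE = 1 << PAGE_BITS
NUM_LOGICAL_PAGES = 16                        # 64KB logical space / 4KB
PHYS_BITS = 26                                # 64MB physical address space
NUM_FRAMES = 1 << (PHYS_BITS - PAGE_BITS)     # 16384 physical 4KB pages


class PagedMMU:
    """Toy model: 16 logical pages, each mapped to any physical 4KB page."""

    def __init__(self):
        # Hypothetical reset state: identity-map the first 64KB.
        self.frames = list(range(NUM_LOGICAL_PAGES))

    def map_page(self, logical_page, frame):
        if not 0 <= logical_page < NUM_LOGICAL_PAGES:
            raise ValueError("logical page out of range")
        if not 0 <= frame < NUM_FRAMES:
            raise ValueError("physical page out of range")
        self.frames[logical_page] = frame

    def translate(self, logical_addr):
        if not 0 <= logical_addr <= 0xFFFF:
            raise ValueError("logical address must fit in 16 bits")
        page = logical_addr >> PAGE_BITS          # top 4 bits pick the entry
        offset = logical_addr & (PAGE_SIZE - 1)   # low 12 bits are unchanged
        return (self.frames[page] << PAGE_BITS) | offset


mmu = PagedMMU()
mmu.map_page(0x8, 0x07FF)            # logical 0x8000-0x8FFF -> last 4KB of 8MB
print(hex(mmu.translate(0x8123)))    # 0x7ff123
```

Each entry needs 14 bits (26 minus 12), so remapping a page is a single
register update rather than any movement of data.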

The SDRAM takes on the order of 10 cycles to supply data after a
read request, so I implemented a 16KB direct-mapped cache using the FPGA
block RAM in order to conceal this latency. This works very well. The
FPGA block RAM is 36 bits wide, which allows for a 4-byte wide cache
line plus 4 bits to indicate the validity of each byte.
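
The following Python sketch models a direct-mapped cache with 4-byte
lines and one valid bit per byte. It is an illustration under stated
assumptions rather than a description of the real hardware: the
write-through policy, the separate tag store and all the names are
invented for the example.

```python
LINE_BYTES = 4
NUM_LINES = 16 * 1024 // LINE_BYTES      # 4096 lines in a 16KB cache


class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory             # backing store, stands in for SDRAM
        self.tags = [None] * NUM_LINES   # assumed: tags kept beside the data
        self.valid = [0] * NUM_LINES     # 4 valid bits per line, one per byte
        self.data = [bytearray(LINE_BYTES) for _ in range(NUM_LINES)]
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _split(addr):
        byte = addr % LINE_BYTES
        index = (addr // LINE_BYTES) % NUM_LINES
        tag = addr // (LINE_BYTES * NUM_LINES)
        return tag, index, byte

    def read(self, addr):
        tag, index, byte = self._split(addr)
        if self.tags[index] == tag and self.valid[index] & (1 << byte):
            self.hits += 1
            return self.data[index][byte]
        # Miss: fetch the whole 4-byte line and mark every byte valid.
        self.misses += 1
        base = addr - byte
        self.data[index][:] = self.memory[base:base + LINE_BYTES]
        self.tags[index] = tag
        self.valid[index] = 0b1111
        return self.data[index][byte]

    def write(self, addr, value):
        tag, index, byte = self._split(addr)
        self.memory[addr] = value        # write-through (an assumption)
        if self.tags[index] != tag:
            self.tags[index] = tag       # replace the old line...
            self.valid[index] = 0        # ...without fetching the new one
        self.data[index][byte] = value
        self.valid[index] |= 1 << byte   # only the written byte is valid


sdram = bytearray(64 * 1024)
sdram[0x1000:0x1004] = b"\x11\x22\x33\x44"
cache = DirectMappedCache(sdram)
first = cache.read(0x1000)     # miss: line filled from the backing store
second = cache.read(0x1001)    # hit: same 4-byte line
cache.write(0x5000, 0xAA)      # 16KB away: same index, different tag
third = cache.read(0x5000)     # hit: the written byte is valid
fourth = cache.read(0x5001)    # miss: valid bit for byte 1 is clear
print(first, second, third, fourth, cache.hits, cache.misses)
```

In this model the per-byte valid bits let a single-byte write claim a
line without first reading the other three bytes from slow memory.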

Debugging the cache was a pain. I ended up writing several programs
to exercise and test the memory in various ways; when I found a fault
it often took some head-scratching to determine if it was a bug in the
hardware or the software! This is doubly hard when the software is
itself executing from unreliable memory, so I added a 4K SRAM using
FPGA block RAM and used the MMU to map that wherever I wanted.

The MMU also has what I call the "17th page" which allows you to
access any physical memory address without mapping it into the CPU
virtual address space -- it has a 26-bit pointer in the MMU and an I/O
port that translates I/O cycles into memory cycles, automatically
incrementing the pointer after each cycle so you can use the
INIR instruction with it to do block copies of unmapped
physical memory to/from mapped memory.
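
A rough Python model of this mechanism is shown below. The class and
method names are hypothetical, and the `inir` helper is a simplified
stand-in for the Z80 instruction (it takes an explicit count instead of
using the B register). INIR covers the port-to-memory direction; OTIR is
the standard Z80 counterpart for memory-to-port.

```python
PHYS_MASK = (1 << 26) - 1            # 26-bit physical address space


class SeventeenthPage:
    """Toy model of a pointer register plus an auto-incrementing data port."""

    def __init__(self, physical_memory):
        self.memory = physical_memory
        self.pointer = 0

    def set_pointer(self, addr):
        self.pointer = addr & PHYS_MASK

    def io_read(self):
        # An I/O read becomes a memory read, then the pointer advances.
        value = self.memory[self.pointer]
        self.pointer = (self.pointer + 1) & PHYS_MASK
        return value

    def io_write(self, value):
        # An I/O write becomes a memory write, then the pointer advances.
        self.memory[self.pointer] = value & 0xFF
        self.pointer = (self.pointer + 1) & PHYS_MASK


def inir(port, mapped_ram, hl, count):
    """Simplified INIR: read `count` bytes from the port into RAM from HL up."""
    for _ in range(count):
        mapped_ram[hl] = port.io_read()
        hl = (hl + 1) & 0xFFFF
    return hl


sdram = bytearray(8 * 1024 * 1024)   # stands in for the 8MB SDRAM
sdram[0x400000:0x400005] = b"HELLO"
mapped = bytearray(64 * 1024)        # stands in for the CPU's mapped 64KB

port = SeventeenthPage(sdram)
port.set_pointer(0x400000)
end_hl = inir(port, mapped, 0x9000, 5)
print(bytes(mapped[0x9000:0x9005]), hex(port.pointer), hex(end_hl))
```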

The Xilinx synthesis tools tell me my design is good for about
70MHz. I've always run it at 128MHz without problems. The Z80 is
rather fast at 128MHz and even the simple cache is surprisingly
effective at keeping it fed with data.

Operating Systems

Once I had the hardware working I had a lot of fun writing software
for it, extending the hardware capabilities as the software grew more
sophisticated. I ported three operating systems to the platform, in
each case porting them before I had ever used them!

I wrote a CP/M-2.2 BIOS. This wasn't too hard: the original
documentation is very good, and having access to a modern
computer certainly makes it much less arduous than it would have been
in the 1970s.

There's so much RAM in the system that I just used the top 6MB as
three 2MB RAM disks, which hugely simplified writing storage drivers.
For persistent storage I decided to copy the RAM disk to and from the
unused space in the flash ROM on the Papilio Pro board. I wrote SPI
master hardware and some routines in the monitor ROM for the copying.
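
The address arithmetic for the RAM disks can be sketched in a few lines of
Python. The 128-byte sector is the standard CP/M 2.2 record size; the
linear sector numbering and the function name are assumptions for
illustration, since a real BIOS works in terms of tracks and sectors.

```python
SDRAM_SIZE = 8 * 1024 * 1024
DISK_SIZE = 2 * 1024 * 1024
NUM_DISKS = 3
SECTOR_SIZE = 128                                   # CP/M 2.2 record size
SECTORS_PER_DISK = DISK_SIZE // SECTOR_SIZE         # 16384
RAMDISK_BASE = SDRAM_SIZE - NUM_DISKS * DISK_SIZE   # top 6MB starts at 2MB


def sector_address(disk, sector):
    """Physical address of a sector, assuming a simple linear layout."""
    if not 0 <= disk < NUM_DISKS:
        raise ValueError("no such RAM disk")
    if not 0 <= sector < SECTORS_PER_DISK:
        raise ValueError("sector out of range")
    return RAMDISK_BASE + disk * DISK_SIZE + sector * SECTOR_SIZE


print(hex(RAMDISK_BASE))                              # 0x200000
print(hex(sector_address(2, SECTORS_PER_DISK - 1)))   # 0x7fff80
```

With storage this simple, a disk read or write is just a 128-byte block
copy to or from the computed physical address.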

Once I had CP/M working I found out about its multi-tasking
multi-user big brother, MP/M. Again the original Digital Research
documentation was invaluable when writing an MP/M-II XIOS and getting
MP/M-II running. I added interrupt driven consoles, a second UART so a
second user can use the machine concurrently, and a simple interval
timer for pre-emptive multi-tasking. I was really very impressed with
MP/M-II; I had not realised that these Z80 systems could multitask and
support multiple concurrent users (and all before I was even born!)

Once I saw that multi-tasking was feasible on this hardware I got a
little bit ambitious and decided to port UZI, Doug Braun's
8-bit UNIX-like operating system. UZI runs multiple processes with
pre-emptive multi-tasking and supports multiple consoles like MP/M. It
presents the standard UNIX system calls to processes, which (in my
implementation) each have 62KB memory available to them. You can
dynamically mount filesystems. UZI has its own filesystem format. UZI
is free of AT&T code but offers features similar to the 7th edition
Unix kernel.

There's little or no documentation so this was harder than writing the
BIOS/XIOS where there is a clear specification of what you need to do.
I started with the P112 UZI-180 port which uses the Hi-Tech CP/M C
compiler.

I ported the kernel to ANSI C and made it build with the modern SDCC compiler, added
drivers for my MMU, UART, RAM disk, an SD card interface, and removed
the Z180 instructions. I modified the context switching mechanism to
make it much more efficient by eliminating all the memory copying. I
also increased the amount of memory available to processes -- a native
UZI process can use up to 0xF900 (62.25KB) and a CP/M process running
under emulation has a 60KB TPA (larger than under real CP/M!)
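
One way a paged MMU can avoid copying on a context switch is to give each
process its own physical pages and simply rewrite the page registers. The
Python sketch below illustrates that general technique only; it is an
assumption, not a description of the actual kernel code, and it ignores
how the kernel itself stays mapped. All names are hypothetical.

```python
PAGE_BITS = 12
PAGES_PER_PROCESS = 16               # 64KB logical space / 4KB pages
PROCESS_LIMIT = 0xF900               # per-process size quoted above


class Process:
    def __init__(self, name, first_frame):
        self.name = name
        # Hypothetical allocation: 16 consecutive physical pages per process.
        self.frames = list(range(first_frame, first_frame + PAGES_PER_PROCESS))


class Machine:
    def __init__(self):
        self.mmu = [0] * PAGES_PER_PROCESS
        self.mmu_writes = 0

    def switch_to(self, process):
        # A switch is 16 register writes; no process memory is moved.
        for page, frame in enumerate(process.frames):
            self.mmu[page] = frame
            self.mmu_writes += 1

    def translate(self, logical_addr):
        page = logical_addr >> PAGE_BITS
        offset = logical_addr & ((1 << PAGE_BITS) - 1)
        return (self.mmu[page] << PAGE_BITS) | offset


shell = Process("sh", first_frame=16)
editor = Process("ed", first_frame=32)
machine = Machine()

machine.switch_to(shell)
a = machine.translate(0x1234)
machine.switch_to(editor)
b = machine.translate(0x1234)
print(hex(a), hex(b), machine.mmu_writes)   # 0x11234 0x21234 32
print(PROCESS_LIMIT / 1024)                 # 62.25
```

The same logical address lands in different physical memory for each
process, which is what makes copying unnecessary in this model.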

The UZI kernel now works well on this hardware but I do not yet have
a good way of building userspace applications for it. Suggestions
warmly welcomed! At the moment I am using the P112 project's UZI-180
distribution root filesystem with relatively few changes.

Download

I've not really worked on this project for the last four months.
I've decided to give it away in its current state rather than wait
until I have both the time and motivation to make it perfect (which may
never happen).

Project Ideas

This project is fun to use but it's much more fun to build.

Change the CPU for a different 8-bitter, like the 6502 or 6809. Open
source VHDL cores are available for both.

This Z80 is fast. But it could go faster! A really simple trick
would be to modify the T80 core to remove the memory refresh cycles.
There's no DRAM connected directly to the Z80 bus and none of my
software uses the R register, so these could be eliminated
without any ill effect.

The Z80 core uses quite a few cycles for each instruction compared
to a modern processor. You could try building a "Z80+" that executes the
common instructions in a single cycle and/or employs a pipeline to
execute each instruction in multiple steps but multiple instructions
concurrently.

You can buy inexpensive ENC28J60 boards on eBay. These talk SPI on
one side and Ethernet on the other. I've already written an SPI master,
so it would be very quick and easy to add Ethernet hardware to this
machine with one of them. The software side might be a satisfying challenge.

I've heard about an operating system called TurboDOS but never
seen it run. Apparently it was designed to be used in a network, with
co-operating TurboDOS machines sharing resources. You can download it
from The TurboDOS Museum. This might be fun to port and run, perhaps in
combination with the ENC28J60 and multiple FPGAs, or multiple systems
on one FPGA. You could write a Linux server process that speaks the
TurboDOS protocol to provide resources to the Z80 boxes.

I saved as much block RAM on the FPGA as I could for future
peripheral hardware. An obvious extension would be a video display.
Combined with a keyboard interface these could replace the UART as the
system console.

I did some work on a dual-processor version of this system. My
design basically just dual-ported the SRAM and cache memories so both
CPUs could use them concurrently, added an arbiter for the CPUs to
share the IO bus, and gave each CPU a separate MMU. When a CPU used the
MMU IO registers it talked to its local MMU only; all other IO
registers were shared. One IO register was special in that it read as
0 from CPU0 and 1 from CPU1. This was so the monitor could do the right
thing with each CPU on boot; eventually I expect the interrupt handler
would require it also. I didn't get as far as doing the interrupt
routing hardware. Realistically the only use I could come up with for
the second CPU was to run firmware for a terminal on future
keyboard/video hardware, or under UZI for SMP operation.

Build a more modern machine using a 32-bit or 64-bit processor. There
are several CPU cores released under free and open licenses -- MIPS,
OpenRISC, ARM, ZPU, x86, several of each architecture. These would be
an easier target for porting modern software, and faster too.

Further Reading

While I was working on this project I rather enjoyed reading The
Computer Journal which was published from 1983 onwards for
about 15 years. The early issues have a lot of articles on 8-bit
hardware, CP/M and its derivatives.

If you read this whole page, the N8VEM project would probably be
relevant to your interests too.