Amiga Parallel Port: How fast can you go?

By lallafa, on September 8th, 2015

In my plipbox project, a fairly fast 8-bit AVR MCU running at 16 MHz is connected to the Amiga’s parallel port to transfer incoming and outgoing IP packets from/to the attached Ethernet controller. A protocol on the parallel port was devised to quickly transmit the bytes in both directions. In version 0.6 a data rate of up to 240 KB/s was achieved… The question now is whether this is the top speed we can get, or whether the parallel port is capable of more.

This blog post shows the results of the experiments I performed with the parallel port on my Amiga. It describes the different classes of transfers possible on this port and gives the maximum achievable speed of each class.

Since the available documents and data sheets all lack an exact description of the I/O behaviour on the peripheral side of the device, this blog post is also an attempt to document this undocumented side of the parallel port (or: “What you always wanted to know about your CIA 8520 and never dared to ask”).

1. Introduction

1.1 The CIA 8520 and the Parallel Port

The Amiga has two 8520 CIA (Complex Interface Adapter) chips, called CIA A and CIA B. Each CIA has two I/O ports (Port A, Port B) with 8 bits each, and every bit can be individually configured as a peripheral input or output.

The parallel port’s pins fall into three groups:

8 Data Pins (In or Out), Pin 2-9

1 Strobe Line (Pin 1), 1 Ack Line (Pin 10): Hardware Handshake

3 Control Lines (BUSY, POUT, SELECT) (Pin 11, 12, 13)

Those pins are connected to the two CIAs as follows:

CIA A, Port B: 8 Data Pins

CIA A, PC and FLAG (F): Strobe and Ack for Hardware Handshake

CIA B, Port A, Bits 0,1,2: BUSY, POUT, SELECT

While CIA A Port B handles the data pins, CIA B Port A handles the 3 control lines. Note that the other bits of this port are connected to serial port lines.

In the Amiga memory map both CIAs are mapped to different memory ranges. Here is an excerpt with the registers useful for parallel port programming. (See the Amiga Hardware Reference Manual, Appendix F for a complete list)

The data direction registers (DDR) of both ports set each bit of the port to either input (0) or output (1). The logic of the data pins is not inverted, i.e. a 1 in a register is a high (5V) value on the line.

The default values show the setup after the Amiga has booted: all parallel port pins are set to input.
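As a compact reference for the registers discussed here, the following is a minimal Python sketch. The addresses are the standard CIA register locations in the Amiga memory map; the dictionary layout and the helper name `ddr_describe` are illustrative choices only and not part of any real API.

```python
# CIA register addresses relevant for the parallel port.
# The dict layout and the helper below are illustrative only.
PARALLEL_REGS = {
    "ciaa_prb":  0xBFE101,  # CIA A Port B data: parallel data pins 2-9
    "ciaa_ddrb": 0xBFE301,  # CIA A Port B data direction
    "ciab_pra":  0xBFD000,  # CIA B Port A data: BUSY, POUT, SELECT (+ serial lines)
    "ciab_ddra": 0xBFD200,  # CIA B Port A data direction
}

# Bit positions of the parallel control lines in CIA B Port A.
BUSY, POUT, SELECT = 0, 1, 2

def ddr_describe(ddr_value):
    """Return 'out' or 'in' for each of the 8 bits (bit 0 first); 1 = output."""
    return ["out" if ddr_value & (1 << bit) else "in" for bit in range(8)]

# After boot all data pins are inputs, i.e. CIA A DDRB holds 0x00.
print(ddr_describe(0x00))
# Switching the whole data port to output means writing 0xFF to CIA A DDRB.
print(ddr_describe(0xFF))
```

On the real machine these would be byte accesses from 68k code; the sketch only documents the layout.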

1.2 The CIAs in the Amiga system

Compared to the MC680xx CPU of an Amiga, the CIA chip is a fairly slow device. It can handle a clock rate of up to 1 or 2 MHz, while the CPU runs at 7 MHz or more. The MC68000 CPU architecture offers a special access mode for such devices that is based on a slower clock called the E clock. It runs at 1/10th of the CPU clock speed.

Let’s look at some numbers:

CPU Clock: F_CPU = 7.16 MHz (NTSC), 7.09 MHz (PAL)

E Clock: F_ECLK = F_CPU / 10 = 716 kHz (NTSC), 709 kHz (PAL)

E cycle length: t_ECLK = 1.40 us (NTSC), 1.41 us (PAL)

This means the CPU accesses the CIA at no more than F_ECLK. An access is a read or a write of a register. So when we transfer data, we either read or write the data register of CIA A Port B. Even if we access only this register, the top speed we can ever achieve on this port is one byte per E cycle, i.e. 716/709 KB/s max!
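The arithmetic behind these numbers can be checked with a small Python sketch. It uses the rounded CPU clock values quoted above and assumes that KB/s means 1000 bytes per second; the function names are made up for this example.

```python
# CPU clock values as quoted in the text (rounded to two decimals).
F_CPU_HZ = {"NTSC": 7.16e6, "PAL": 7.09e6}

def e_clock_hz(f_cpu_hz):
    """The E clock runs at 1/10th of the CPU clock."""
    return f_cpu_hz / 10

def e_cycle_us(f_cpu_hz):
    """Length of one E cycle in microseconds."""
    return 1e6 / e_clock_hz(f_cpu_hz)

for system, f_cpu in F_CPU_HZ.items():
    f_e = e_clock_hz(f_cpu)
    # One CIA access per E cycle moves at most one byte,
    # so the byte rate is bounded by the E clock frequency.
    print(f"{system}: F_ECLK = {f_e / 1e3:.0f} kHz, "
          f"t_ECLK = {e_cycle_us(f_cpu):.2f} us, "
          f"max {f_e / 1e3:.0f} KB/s")
```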

If you look at the data sheet of the CIA’s ancestor, the MOS 6526, you will see that the E clock interval is divided into two sections: a HI and a LO range. In the HI range (4/10 of t_ECLK) the CPU accesses the device; in the LO range (6/10 of t_ECLK) the device applies the change made by the CPU, i.e. if a port is set to output it drives the pins low or high accordingly. On a read, the data has to be stable on the port before the CPU accesses it in the HI phase.

1.3 Hardware Handshake with Strobe

The parallel port offers two pins for hardware handshaking called Strobe and Ack. The hardware handshake signals the external peripheral whenever new data has been set (or read!) on the external port of the CIA. Once the data is valid, the CIA sends a short (active-low) pulse on Strobe to signal the receiver. The receiver then reads the data byte and acknowledges the transfer by pulling Ack low. The Amiga detects the Ack pulse either by polling or by interrupt and then transmits the next byte.

While strobing (i.e. generating the Strobe pulse) happens automatically after a Port B read or write, Ack is not generated automatically: it has to be triggered explicitly to confirm the transfer.

Let’s look at a time sheet with some E clock cycles (.H and .L being the high and low range of a cycle):

This example writes a byte with value 42 to Port B. The CIA applies this value in the next two sub cycles (denoted with !), and beginning with 1.L a stable value of 42 is available on the output of Port B. Then Strobe goes low for a full E cycle.

We see that Strobe has to be delayed; otherwise a peer reading on the falling edge of Strobe won’t see a stable data signal.

The interesting questions that now arise are:

What is the strobe delay in cycles of the 8520 CIA on the Amiga?

What is the strobe width in cycles?

How fast can we transfer data and still get valid strobes?

The answer to the first one can be found in the Amiga Hardware Reference Manual (Appendix F, Section Handshaking):

PC will go low on the third cycle after a port B access.

But the other ones are not answered in the docs. So it’s time for some experiments…

2. My Experiments

My setup is an Amiga 500 with an ACA500 and an ACA1230/33 accelerator attached. A plipbox device running firmware version 0.6 was connected unless otherwise stated.

2.1 Setup Port

Using ASMone I quickly hacked some code to set the parallel port to data output and all lines to low/zero:

A strobe of length 3 * E! So the strobes of the first and the second write are somehow merged now.

The $ff write happens right before the left marker and is established on the port within the range between the two markers. (LO-HI transition = slow)

The $00 write happens right before the right marker and is established right after the marker. (HI-LO transition = fast)

The $ff value is only valid at the end of the interval between the two markers!

Let’s write 4 bytes in a row:

Interesting Result:

Still a Strobe of 3E cycle length! The strobe width is not enlarged, no matter how many bytes you send. Seems that the strobe logic gets stuck.

Data $ff, $00, and $ff are valid at the end of the E ranges around the falling edge of strobe

Now 4 bytes starting with $00 (port was $ff):

Write $00,$ff,$00,$ff (Port was $00)

Same result here:

A 3E Strobe and nothing more!

Data again valid at the end of the E cycles around falling edge of strobe

To sum up this experiment: While we can write to the CIA from the Amiga at E cycle speed, the resulting strobe signals are no longer usable! However, all data values appear on the port lines (in fragments of the E cycle).

Let’s call the non-stop writes to the CIA 1E transfers. The next experiments look at transfers that take more E clock cycles.

2.4 2E Transfers

OK, we need to insert a pause between the data writes from the Amiga. To be precise, we want to wait for multiples of the E clock. The best way to “wait” for (or better: waste) an E clock cycle is to actually perform a register access to one of the CIAs. Make sure the access has no side effects: reading a Port A (which does not trigger a strobe) already does the trick.

The code for a 2E transfer now does a write (first E cycle) and one pause (second E cycle). It looks like this:

Strobe is back again at 2E. But only the first one is visible! All others are gone 🙁

Complete range of 1E port data valid (1E range for port setup)

Note: instead of reading a “waste” value in the second E access to the CIA, you can also perform a single control signal write. In the picture above I toggled the SEL signal. This gives you an exact location of the 1.H, 3.H, … locations and can be used on the receiver side as a sync signal! (Very useful since strobe is broken here)

Note 2: If you toggle SEL (or POUT, BUSY) in a 2E transfer, you can only write Port A (but not read it beforehand), since only one spare E cycle is available. Therefore, an update of only the parallel line bits won’t work. In fact you have to ignore the serial line bits in the same port and always write them with a constant value -> serial lines don’t work with 2E transfers! In other words: there is no system-friendly way to implement it…
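The difference between a read/modify/write update and a write-only update of CIA B Port A can be illustrated with a small Python model. This is only a sketch of the bit logic, not Amiga code: the function names, the shadow-copy approach and the `serial_constant` parameter are assumptions made for the example.

```python
# CIA B Port A bit layout: bits 0-2 are the parallel control lines
# (BUSY, POUT, SEL), bits 3-7 carry serial port lines.
SEL_MASK = 1 << 2
PARALLEL_MASK = 0b0000_0111
SERIAL_MASK = 0b1111_1000

def toggle_sel_rmw(port_value):
    """Read the port, flip SEL, write it back (needs two CIA accesses)."""
    return port_value ^ SEL_MASK

def toggle_sel_write_only(shadow, serial_constant=0):
    """Only one CIA access is available, so the port cannot be read first.

    The parallel bits come from a software shadow copy; the serial bits
    are forced to a fixed value (hypothetical parameter).
    """
    new_parallel = (shadow ^ SEL_MASK) & PARALLEL_MASK
    return new_parallel | (serial_constant & SERIAL_MASK)

port = 0b1100_0001  # example value: two serial bits set, BUSY set

print(f"{toggle_sel_rmw(port):08b}")         # serial bits preserved
print(f"{toggle_sel_write_only(port):08b}")  # serial bits overwritten
```

In the write-only variant whatever state the serial lines had is lost, which is why this style of update cannot coexist with other users of the port.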

2.5 3E Transfers

Since 2E transfers still have broken strobe output, let’s add another “wasted” cycle and set up a 3E transfer. With two spare E cycle accesses in our transfer loop we can also use the two cycles to perform a read/modify/write operation on a register. E.g. a bclr (bit clear) or bset (bit set) operation can be used to modify a control line of the parallel port, which then serves as a “clock” line for our data transfer.

The external device needs to set up the data right before the CPU access. While the .L sub cycle before the read might suffice for a stable read, it is safer to set up the data already in the .H sub cycle before it.

If you use a parallel port control line to “clock” the data you can set the line before the first CPU read and start reading with the first byte.

If you want to use the strobes to sync your reads then you have a problem: The strobe signal arrives _after_ the read! To get in sync with this signal you must use a trick: first perform a dummy CPU read just to generate a strobe and then use this strobe to sync your device’s writes:

In the above time sheet we dummy read at 0.H

The device already sets up data 0x22

The CPU performs the next read at 3.H and gets 0x22

The device waits for the rising edge of strobe (4.H – 4.L) and sets the next data
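The dummy-read trick described in the steps above can be played through with a toy model in Python. Time is counted in whole E cycles. The read period of 3 cycles corresponds to a 3E transfer, and the delay of 4 cycles between a read and the rising strobe edge is taken from the positions named in the list (read at 0.H, rising edge around 4.H – 4.L). Both are modelling assumptions, and the function name is made up for this sketch.

```python
def simulate_strobe_synced_read(data, read_period=3, strobe_rise_delay=4):
    """Toy model of strobe-synced reads, time measured in whole E cycles.

    Assumptions (not measured): the CPU reads every `read_period`
    cycles, and the rising edge of the strobe caused by a read is
    seen by the device `strobe_rise_delay` cycles after that read.
    """
    pending = list(data)
    port = pending.pop(0)            # device pre-loads the first byte
    received = []
    n_reads = len(data) + 1          # one dummy read plus one read per byte
    read_times = [i * read_period for i in range(n_reads)]
    rise_times = [t + strobe_rise_delay for t in read_times]

    for t in range(read_times[-1] + 1):
        # Device side: on a rising strobe edge put the next byte on the port.
        if t in rise_times and pending:
            port = pending.pop(0)
        # CPU side: sample the port; the very first read is the dummy.
        if t in read_times and t != 0:
            received.append(port)
    return received

print(simulate_strobe_synced_read([0x22, 0x33, 0x44]))
```

With the default parameters the CPU receives exactly the bytes the device queued. Calling the function with `strobe_rise_delay=2` makes the device replace a byte before the CPU has read it, which shows how sensitive the scheme is to the relative timing of strobe and read.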

Reading in 2E and even 1E gets more difficult, as in the worst case no “clock” signal is available and you have to use a sampling pattern with a fixed E-cycle period to set up the data in time on the device. It is still open whether it is possible to write a stable 1E transport this way.

In most reader code the interrupts have to be disabled on the Amiga side; otherwise the clocked data setup before a read might come too late and the CPU reads a wrong value.

3. Summary

This (rather long) blog article shows you all the details when transferring data over the parallel port at the maximum possible speed. We discovered some interesting anomalies with strobe generation at these high transfer rates.

I introduced a new speed classification for the parallel transfer types called 1E, 2E, or 3E transfers.

The top speeds achievable with the xE transfers are:

1E: 709..716 KB/s
2E: 355..358 KB/s
3E: 236..239 KB/s
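These figures follow directly from the E clock: an nE transfer moves one byte every n E cycles. The short Python sketch below reproduces the table. It uses the commonly quoted, more precise CPU clock values (7.09379 MHz for PAL, 7.15909 MHz for NTSC) instead of the rounded ones and assumes 1 KB = 1000 bytes; the function name is made up for this example.

```python
# CPU clock in Hz (more precise than the rounded values in the text).
F_CPU_HZ = {"PAL": 7_093_790, "NTSC": 7_159_090}

def max_rate_kb_s(f_cpu_hz, e_cycles_per_byte):
    """Upper bound for an nE transfer in KB/s (1 KB = 1000 bytes)."""
    f_eclk = f_cpu_hz / 10           # E clock is 1/10th of the CPU clock
    return f_eclk / e_cycles_per_byte / 1000

for n in (1, 2, 3):
    pal = max_rate_kb_s(F_CPU_HZ["PAL"], n)
    ntsc = max_rate_kb_s(F_CPU_HZ["NTSC"], n)
    print(f"{n}E: {pal:.0f}..{ntsc:.0f} KB/s")
```

These are upper bounds for the raw port accesses only; any additional instruction in the transfer loop that costs another CIA access pushes the transfer into the next slower class.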

Current plipbox version 0.6 implements a 3E transfer using external control lines for clocking. I am currently experimenting with a 3E transfer using only strobes as signalling (it frees control lines for other functions). Another interesting coding exercise will be a 2E or even a 1E transfer… Now the technical background is available!