The DMA problem on Prometheus

The Prometheus PCI bus board can operate with up to four PCI cards. In normal use with
a VGA card, and also with 10 MBit ethernet cards, everything works just fine.
Problems start when the system contains PCI cards which need DMA (Direct Memory Access) to
work correctly. Almost all modern PCI cards use this feature (USB cards, 100 MBit ethernet,
sound cards, ...).
The Amiga system freezes shortly after the software using the DMA card has been started;
in other cases the VGA display becomes distorted or data errors occur.

How DMA works on Prometheus

You may regard the PCI part of Prometheus as a completely isolated computer system.
PCI cards located there can operate on their own without even knowing what is happening
on the Amiga side.
DMA is a crucial feature of PCI bus operation; it means that a single card takes the
role of bus master and transfers data just like a CPU would.

In every DMA transfer you have a master (initiator) and a slave (target). The master
requests transfer time on the PCI bus (from the arbiter, more on this later) and starts
its transfer as soon as it is granted the bus.
DMA on Prometheus will always be between two PCI cards, and never between
a PCI card and the Zorro bus.

In general, the DMA card will transfer its data to or from the PCI graphics card, using a
part of the display memory as a data buffer. After a DMA transfer from the card, the Amiga CPU
fetches the data by reading this buffer; in the other direction, the Amiga CPU deposits the data
intended for the DMA card in the memory buffer and instructs the DMA card to fetch it from there.

Signal description

To understand the problem and the logic analyzer timings, some explanation is needed.
Please keep in mind that a "#" following a signal name indicates that this signal is
active low.
The signals FRAME#, IRDY#, TRDY# and DEVSEL# are used as
basic handshaking signals for PCI data transfer.

FRAME# is set by the master to indicate start of a transfer.

IRDY# is set by the master to tell the target that it can provide or
accept data.

TRDY# is set by the slave to tell the master that data was accepted (write)
or valid (read).

DEVSEL# is set by the slave to indicate that it has claimed this
transaction.

CBE[3:0]# signals are used for setting the transfer type (command) of the access; they are
also used as byte enables (data strobes) in the data phases of the access.

AD[17:16] are two of the 32 multiplexed address / data lines.

REQ0# is the bus request line from PCI slot 0 (here: USB card) to the arbiter;
REQ1# is the same for the Voodoo3 card in PCI slot 1. The arbiter is granting the bus
to a requesting master by asserting its grant line: GNT0# for the USB card, GNT1#
for the Voodoo3.

One Zorro III signal is of concern here: /SLAVE is an active low signal indicating that
the Amiga CPU is accessing the Prometheus card (and in this case, the PCI bus).
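
The FRAME#/IRDY#/TRDY#/DEVSEL# handshake described in this section can be illustrated with a small
model. This is only a sketch and not taken from the Prometheus design: the trace format and the
function names are made up for illustration. It relies on two general PCI rules: a data word is
transferred on a clock edge where IRDY# and TRDY# are both low, and the bus is idle when FRAME#
and IRDY# are both high.

```python
# Sketch only: active-low signals are modelled as 0 (asserted) and 1 (deasserted).
# The trace layout and helper names are hypothetical, chosen for illustration.
LOW, HIGH = 0, 1


def data_phases(samples):
    """Return the clock edges on which a data word is actually transferred.

    A transfer happens when the master is ready (IRDY# low), the target is
    ready (TRDY# low) and the target has claimed the access (DEVSEL# low).
    """
    return [i for i, s in enumerate(samples)
            if s["IRDY"] == LOW and s["TRDY"] == LOW and s["DEVSEL"] == LOW]


def bus_idle(sample):
    """The bus is idle when neither FRAME# nor IRDY# is asserted."""
    return sample["FRAME"] == HIGH and sample["IRDY"] == HIGH


# A single-word access, one sample per PCI clock edge.
trace = [
    dict(FRAME=LOW,  IRDY=HIGH, TRDY=HIGH, DEVSEL=HIGH),  # address phase
    dict(FRAME=HIGH, IRDY=LOW,  TRDY=HIGH, DEVSEL=LOW),   # target claimed, not ready
    dict(FRAME=HIGH, IRDY=LOW,  TRDY=LOW,  DEVSEL=LOW),   # data transferred
    dict(FRAME=HIGH, IRDY=HIGH, TRDY=HIGH, DEVSEL=HIGH),  # bus idle again
]

print(data_phases(trace))   # [2]
print(bus_idle(trace[3]))   # True
```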

Big Brother: the arbiter

There is one instance in the Prometheus system which is responsible for doing the bus arbitration.
This means that different masters which want to access the PCI bus need to tell the arbiter that
they want access; the arbiter will then distribute access slots to each of them. A master requesting
bus access but not taking its chance will timeout and the next requesting bus master will be served.

In the Prometheus system we have five different potential masters: the four PCI slots (more specifically,
if DMA is used one slot holds the graphics card, so only three real masters will be on PCI cards as long
as the graphics card does not do DMA on its own) and the Amiga CPU on the Zorro III bus.

While arbitrating between PCI cards is fairly easy, getting the Amiga CPU into the game is rather tricky.
The CPU does not know anything about the state of the PCI bus, and even asking for the current status
wouldn't help, as a single Zorro III cycle takes about 175ns minimum (Prometheus: about 300ns), whereas
one PCI cycle can be done in about 100ns - the CPU would never get an up-to-date status.

The main problem is that Zorro III cycles cannot be extended in time beyond a limit of 1us; if the
bus cycle is longer it will be aborted by the Buster chip, which results in a GURU meditation. Therefore
accesses from Zorro III to PCI must always be treated with highest priority.
If the Amiga CPU starts an access, the arbiter will take the PCI bus away from the current owner (if it
is granted to a PCI master) and grant it to the Amiga CPU. This means that a running PCI DMA transfer has
to be finished quickly (this shouldn't be a problem, as PCI cards know about this rule).
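
As a rough plausibility check of the numbers above, the following sketch estimates how many PCI cycles
the arbiter could hold off the Amiga CPU before the 1us limit is reached. It assumes that the roughly
300ns Prometheus cycle is the undelayed access time and that any hold-off simply adds to it; this is an
assumption made here, not a measured figure, and the names are illustrative.

```python
# Back-of-the-envelope estimate using the approximate figures from the text.
ZORRO_TIMEOUT_NS = 1000           # Zorro III cycle limit (1us)
PROMETHEUS_ZORRO_CYCLE_NS = 300   # approx. Zorro III cycle time on Prometheus
PCI_CYCLE_NS = 100                # approx. duration of one PCI cycle


def max_hold_off_pci_cycles(timeout_ns=ZORRO_TIMEOUT_NS,
                            zorro_ns=PROMETHEUS_ZORRO_CYCLE_NS,
                            pci_ns=PCI_CYCLE_NS):
    """Whole PCI cycles that fit into the slack before the Zorro III limit.

    Assumption: the hold-off time adds directly to the undelayed cycle time.
    """
    slack_ns = timeout_ns - zorro_ns
    return slack_ns // pci_ns


print(max_hold_off_pci_cycles())  # 7
```

Under these assumptions there is room for only a handful of PCI cycles, which is why a PCI master
has to get off the bus quickly once its grant is withdrawn.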

The smoking gun

So let's take a closer look at the problem. I could reproduce the error condition by simply using a
NEC USB2.0 card as PCI master together with a Voodoo3 as DMA buffer. The USB card is in PCI slot 0,
the Voodoo3 in PCI slot 1.

Snapshot of the broken PCI access

What is happening:

the USB card requests the PCI bus (REQ0# is low)

the Amiga CPU is accessing the Prometheus board (/SLAVE is asserted low)

due to synchronization processes the arbiter didn't yet get the request from the CPU and
is granting the bus to the USB card (GNT0# is asserted low)

the USB card starts its transfer by asserting FRAME# low, followed by IRDY# low.
TAKE CARE: this is legal, even though the arbiter soon withdraws its grant by deasserting
GNT0#. The started PCI access from the USB card is valid and must be finished before the
Amiga CPU can go on the bus.

now the drama starts: the Amiga CPU has a grant, so it simply starts driving the bus, which is -
at this moment - already owned by the USB card. It sets FRAME# high, asserts it low and
then drives IRDY# low.

the Voodoo3 card, which had been addressed by the USB card, doesn't recognize this hostile takeover
and finishes the access in the normal way by asserting its TRDY# line.

As a result of this bug (which will hit every Prometheus out there once in a while, purely as a matter
of statistics), both the DMA transfer between the USB card and Voodoo3 memory and the data transfer
between the Amiga CPU and the Voodoo3 card have produced scrambled data.
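
The collision described above can be expressed as a toy model. The cycle numbers and the length of the
CPU access are invented for illustration and do not come from the logic analyzer snapshot; the point is
only that two masters drive the bus during the same clock cycles.

```python
# Toy model of the collision; all cycle numbers are hypothetical.
def bus_conflicts(pci_cycles, cpu_cycles):
    """Return the clock cycles on which both masters drive the bus at once."""
    return sorted(set(pci_cycles) & set(cpu_cycles))


CPU_ACCESS_LEN = 3  # assumed length of the CPU access in PCI clock cycles

# The USB card was granted the bus and drives its transfer in cycles 2..5.
usb_transfer = range(2, 6)

# The arbiter notices the CPU request in cycle 3, withdraws GNT0# and, in the
# original design, lets the CPU drive the bus right away.
cpu_request_cycle = 3
cpu_transfer = range(cpu_request_cycle, cpu_request_cycle + CPU_ACCESS_LEN)

print(bus_conflicts(usb_transfer, cpu_transfer))  # [3, 4, 5]
```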

Bug fix

I included a small hack in the Prometheus arbiter. It simply delays the internal grant signal for the Amiga
CPU after deasserting the PCI GNTx# (i.e. getting the current PCI master off the bus).
Now the Amiga CPU access is delayed until the current master has finished its DMA transfer.
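
The idea of the delayed internal grant can be sketched as follows. This is not the actual arbiter logic
(which lives in the hardware of the board); the function name and the cycle numbers are hypothetical
and only show the effect of waiting for the running PCI transfer to end.

```python
# Sketch of the delayed internal grant; names and cycle numbers are hypothetical.
def cpu_start_cycle(cpu_request_cycle, pci_transfer_cycles, delay_grant):
    """Return the clock cycle in which the Amiga CPU starts driving the PCI bus.

    delay_grant=False models the original behaviour: the CPU goes on the bus
    as soon as the arbiter has seen its request.
    delay_grant=True models the workaround: the internal grant is held back
    until the running PCI transfer is over.
    """
    if not delay_grant:
        return cpu_request_cycle
    busy = set(pci_transfer_cycles)
    cycle = cpu_request_cycle
    while cycle in busy:  # wait for the current master to finish
        cycle += 1
    return cycle


usb_transfer = range(2, 6)  # USB card drives the bus in cycles 2..5

print(cpu_start_cycle(3, usb_transfer, delay_grant=False))  # 3 (collision)
print(cpu_start_cycle(3, usb_transfer, delay_grant=True))   # 6 (after the DMA)
```

The price of the workaround is a longer Zorro III cycle, so the delay has to stay well below the
1us limit mentioned earlier.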

Snapshot of the workaround

What is happening now: in principle the same as explained above. The main difference is now that the Amiga
CPU access (visible by /SLAVE asserted low) which occurs in parallel to a running PCI DMA is not
disturbing the DMA transfer. The arbiter is getting the PCI master off the bus (GNT0# deasserted),
waits for the end of the PCI cycle (TRDY# asserted by the PCI slave) and then starts its own PCI
access (FRAME# asserted low).
The USB card is not happy about being taken off the bus, so it requests the bus again immediately by asserting
its REQ0# line; at the end of the Amiga CPU access the arbiter grants the bus back to the
USB card (GNT0# asserted again). The USB card then does one more DMA transfer.

New arbiter under development

To get rid of the known limitations of the current arbiter (which has serious problems with fair arbitration)
a new arbiter concept is under development. The PCI bus arbitration is based on several simple rules, which are easy
to understand but hard to implement.

Rules of engagement:

Zorro accesses to the PCI bus always have highest priority.

Zorro mastership request will kick off the current PCI master ASAP.

Each PCI master will be granted a certain number of clock cycles on the bus.

Only PCI masters asking for bus access will be considered in arbitration.

PCI masters being kicked off by Zorro will get the bus back after Zorro access.

Fairness in arbitration will be guaranteed - each PCI master will get its time slot.
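
The rules above can be sketched as a small software model. This is not the actual arbiter design; the
class name, the round-robin order and the slot length are assumptions chosen to illustrate how the
rules fit together.

```python
# Software sketch of the rules above; not the real arbiter implementation.
class ArbiterModel:
    def __init__(self, n_masters=4, slot_cycles=4):
        self.n_masters = n_masters
        self.slot_cycles = slot_cycles  # assumed clock budget per time slot
        self.current = None             # PCI master owning the running slot
        self.left = 0                   # clock cycles left in that slot
        self.last = -1                  # last slot owner, for round robin

    def _next_requester(self, requests):
        # Fairness: start searching behind the master served last.
        for offset in range(1, self.n_masters + 1):
            candidate = (self.last + offset) % self.n_masters
            if candidate in requests:   # only requesting masters take part
                return candidate
        return None

    def tick(self, zorro_access, requests):
        """Return the bus owner for one clock: 'ZORRO', a slot index or None."""
        if zorro_access:
            # Zorro always wins and kicks the PCI master off the bus. The slot
            # state is left untouched, so the master gets the bus back later.
            return "ZORRO"
        slot_over = self.left == 0 or self.current not in requests
        if self.current is None or slot_over:
            self.current = self._next_requester(requests)
            if self.current is None:
                return None
            self.last = self.current
            self.left = self.slot_cycles
        self.left -= 1
        return self.current


arbiter = ArbiterModel(slot_cycles=2)
zorro = [False, True, False, False, False, False]
owners = [arbiter.tick(z, {0, 1}) for z in zorro]
print(owners)  # [0, 'ZORRO', 0, 1, 1, 0]
```

In this run master 0 is interrupted by a Zorro access after one clock, gets its remaining clock
afterwards, and only then does master 1 receive its own slot.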

A first version of this arbiter is under test now: in slot 0 there is a NEC USB card, in slot 1 an
ethernet 8139 card. Slot 3 carries the Voodoo3 card being used as DMA buffer.

Snapshot of first tests with a new arbiter design

You can see that the Zorro master is accessing the bus frequently (/SLAVE and /DTACK signals
mark each Zorro access). At the same time the USB controller is asking for bus access, and also the NIC wants to
transfer data.
On the left side, the bus is granted to the USB controller (three small time slots on GNT0#, between high
priority Zorro accesses); the waiting NIC is granted the bus after the USB controller has used up its maximum clock
cycles (four small time slots on GNT1#). Please note that the USB controller again wants the bus
(REQ0# is asserted), but the arbiter regrants it to the NIC, as the NIC's time slot is not yet used up.
After this, the PCI bus is given to the USB controller (again three time slots on GNT0#), then the NIC
finishes its transfer (which didn't fit into the first granting period) with one small GNT1#; the rest of
this time slot is skipped as the NIC doesn't want the bus anymore, so the pending request of the USB controller
on REQ0# is served.