Project: DMA Board

This is a project idea we were chatting about on ##Atari today. The basic idea is that we make it possible for PBI devices to take over the system bus to perform DMA transfers at up to 1.7 megabytes per second. This is done with a small riser board that hooks into the way ANTIC stops the CPU, and by connecting /HALT and a new /DMA signal to two currently unused pins on the PBI.
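For a rough feel of where that peak figure comes from, here is a back-of-the-envelope sketch in Python. It assumes one byte moves per bus cycle and uses the rounded 1.79 MHz system clock mentioned elsewhere in this thread; `stolen_fraction` is a hypothetical knob for the share of cycles ANTIC takes, not a measured value.

```python
# Rounded system clock from this thread (assumption: exact value varies by machine).
SYSTEM_CLOCK_HZ = 1_790_000


def peak_dma_rate(clock_hz: int, stolen_fraction: float = 0.0) -> float:
    """Bytes per second if every cycle not taken by ANTIC moves one byte."""
    if not 0.0 <= stolen_fraction <= 1.0:
        raise ValueError("stolen_fraction must be between 0 and 1")
    return clock_hz * (1.0 - stolen_fraction)


if __name__ == "__main__":
    # Upper bound with no ANTIC activity at all.
    print(peak_dma_rate(SYSTEM_CLOCK_HZ))
    # Hypothetical case where ANTIC takes half of the cycles.
    print(peak_dma_rate(SYSTEM_CLOCK_HZ, 0.5))
```

The real rate depends on the display mode, since ANTIC's video and refresh cycles come out of the same budget.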

The device protocol would work like this: when a DMA transfer is desired, the device asserts /DMA. Every cycle that starts with /DMA asserted is a DMA cycle for the device *unless* /HALT was asserted before the cycle began, in which case the device has to relinquish to ANTIC. Devices can use the presence of /HALT at a high level during a read of the device's registers to detect whether DMA is present; if not, devices can fall back to programmed I/O.
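The arbitration rule above can be sketched as a tiny Python function. This is only a model of the rule as described, not of the real signal timing; the function name and the 0/1 encoding (0 = asserted, since both signals are active low) are my own assumptions.

```python
def bus_owner(dma_n: int, halt_n_before: int) -> str:
    """Decide who owns the coming bus cycle.

    dma_n         -- level of /DMA at the start of the cycle (0 = asserted)
    halt_n_before -- level of /HALT sampled before the cycle began (0 = asserted)
    """
    if halt_n_before == 0:
        # ANTIC always wins: the device must relinquish this cycle.
        return "ANTIC"
    if dma_n == 0:
        # /DMA asserted and ANTIC did not claim the cycle.
        return "DEVICE"
    # Nobody is halting the CPU.
    return "CPU"


if __name__ == "__main__":
    for dma_n in (1, 0):
        for halt_n in (1, 0):
            print(f"/DMA={dma_n} /HALT={halt_n} -> {bus_owner(dma_n, halt_n)}")
```

A device driving a transfer would evaluate this once per cycle and simply hold its address counter whenever the answer is not "DEVICE".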

I whipped up a quick board layout for this; the board is also designed to generate a "clean" phi2 output and to allow the use of a W65C02 or W65C816 instead of the 6502C. It does not support other 6502s or 65C02s, which do not have the /BE input.

The 130XE and its European derivatives already have the /HALT signal on the PBI (read: ECI). The XL has EXTSEL and EXTEN (the XE only has EXTSEL). One of those is an input signal. What would be the difference between that one and your /DMA?

The CPU's /HALT input would be the logical-and of ANTIC's /HALT and the /DMA input. EXTSEL and EXTEN are useful for remapping your device into RAM, but they are not useful for gaining bus mastery. The /DMA signal lets devices control the address and data bus directly, either for transferring data into and out of RAM or a device like POKEY or GTIA, or even to support an external CPU.
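Since both signals are active low, the logical-and means the CPU is halted whenever either source asserts its line. A minimal Python sketch of that truth table (the function name and the 0 = asserted encoding are assumptions for illustration):

```python
def cpu_halt_n(antic_halt_n: int, dma_n: int) -> int:
    """Combine two active-low halt requests into the CPU's /HALT input.

    ANDing active-low lines gives 0 (halted) if either input is 0.
    """
    return antic_halt_n & dma_n


if __name__ == "__main__":
    print("ANTIC /HALT  /DMA  CPU /HALT")
    for antic_halt_n in (1, 0):
        for dma_n in (1, 0):
            print(f"{antic_halt_n:^11}  {dma_n:^4}  {cpu_halt_n(antic_halt_n, dma_n):^9}")
```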

Mathy

If you're asserting /HALT, then how would you know ANTIC is asserting it and needs to do video DMA?

Because /HALT from ANTIC goes out to the PBI and into the GAL, where it is combined with the /DMA input from the PBI to feed the /HALT input of the 6502C and the flip-flop circuit which locks phi2 and /BE for the other two CPUs. The GAL also has to set R/W to true when ANTIC halts and tri-state it when the PBI device halts.
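The R/W part of the GAL's job can be sketched in Python as below. It is a behavioural model only and ignores timing. The case where both ANTIC and the device request the bus is resolved in ANTIC's favour, following the protocol rule that the device relinquishes to ANTIC; the "CPU" return value for the idle case is my assumption that the GAL stays out of the way and the CPU drives R/W itself.

```python
def gal_rw_state(antic_halt_n: int, dma_n: int) -> str:
    """Return what happens to R/W for a given pair of active-low requests.

    "HIGH" -- GAL forces R/W true (read) because ANTIC is halting the CPU
    "Z"    -- GAL tri-states R/W so the PBI device can drive it
    "CPU"  -- nobody is halting; the CPU drives R/W (assumption)
    """
    if antic_halt_n == 0:
        # ANTIC has priority, even if the device also asserts /DMA.
        return "HIGH"
    if dma_n == 0:
        return "Z"
    return "CPU"


if __name__ == "__main__":
    for antic_halt_n in (1, 0):
        for dma_n in (1, 0):
            state = gal_rw_state(antic_halt_n, dma_n)
            print(f"ANTIC /HALT={antic_halt_n} /DMA={dma_n} -> R/W {state}")
```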

There is another way of doing DMA without stopping the CPU, first presented on Apple computers. It will require a bit more than this, but it will be rewarding (hint: use the SYNC line).

As for ANTIC's halt, there is no easy way to know when it happens. /HALT is asserted one cycle earlier than ANTIC takes over the bus. You could assume there are never two halt cycles one after another without a CPU cycle in between, but you can't be sure of anything.

I have thought about DMA, but what use would it be? Video? With PBI devices already running faster than 50 KB/s, it seems like a lot of work for little gain.

Notwithstanding, it would be pretty easy. HALT is asserted right around the time phi2 rises and is latched when phi2 falls. It doesn't matter who is gating HALT, since the CPU is dropped off the bus by BE in either case. Where HALT originates only matters for deciding who gets control of the bus.

The main design issue is: what will be driving the DMA cycles? You need some kind of intelligent controller that can deliver data and addresses at a high rate. It can't be a 65816 at 1.79 MHz... you might as well use the system CPU. So, you're looking at a CPLD with some regs, clocks and logic in it. And a 24-bit (or more) bus that can be tri-stated.

Now, if you had the system 65816 running at 14 MHz, you wouldn't need any of that stuff. You could clock at 1.79 MHz when MPD was inactive and 14.32 MHz when it was asserted, if you wanted. Move a lot of data, really fast.

For memory <-> I/O DMA, the controller device needs only to do two things: (1) generate the RAM address to read/write from, and (2) select the I/O device to write/read from (with the appropriate device register address selected).

There are some devices which "know" about DMA already for which you don't need (2), but I think for the most part it's not avoidable. So, what this means is, in order to do DMA to an external I/O device, you need to be able to generate two addresses, one for the system RAM and one for the external device. The system RAM address has to be a counter clocked on phi2's falling edge in order to do full-speed DMA.
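To illustrate the RAM-address counter, here is a rough Python simulation of a device-to-RAM transfer. Everything in it is hypothetical (the function name, the `granted` list standing in for cycles ANTIC did not take); the point is only that the counter advances once per granted cycle and holds when ANTIC takes the bus.

```python
def dma_to_ram(ram: bytearray, start: int, data: bytes, granted) -> int:
    """Copy `data` into `ram` from `start`, one byte per granted bus cycle.

    granted -- iterable of booleans, one per phi2 cycle; False means ANTIC
               took that cycle and the device had to wait.
    Returns the number of cycles consumed.
    """
    addr = start
    pending = list(data)
    cycles = 0
    for device_has_bus in granted:
        if not pending:
            break
        cycles += 1
        if device_has_bus:
            ram[addr & 0xFFFF] = pending.pop(0)
            addr += 1  # models the counter being clocked on phi2's falling edge
    return cycles


if __name__ == "__main__":
    ram = bytearray(0x10000)
    # Second cycle is lost to ANTIC in this made-up pattern.
    used = dma_to_ram(ram, 0x4000, b"\x01\x02\x03", [True, False, True, True, True])
    print(used, ram[0x4000:0x4003].hex())
```

For the external-device side, the second address (the device register) would stay fixed while this counter runs.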

To do DMA to an internal I/O device, you have to keep the device address on the bus for each byte transferred (in many cases you'd just do one at a time), but this means the data has to be transferred to external memory which doesn't use the system address bus for the transfer. One application I've thought of is using FIFO RAM, clocking it at some fixed rate (22KHz or 15KHz whatever), and every clock doing 4 (or transfers to load bytes from the FIFO to each POKEY volume register. The CPU just loads up the FIFO every vblank period or so. In this way you have basically a hardware mod player. Likewise you could even have simple hardware waveform generators to do this instead of pulling the data from RAM, and thus be able to generate nice instrument sounds. Since it's /HALT driven, if you sync to the hsync rate it might skew a few clocks at first but eventually it would "fit in" to the spare non-ANTIC cycles on any mode line. You could have a POKEY tone generator which also allows concurrent usage of all POKEY timers... even SIO, probably.
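As a rough sizing sketch for the FIFO idea, the Python below works out how many bytes the CPU would have to load per refill. It assumes one byte per POKEY volume register per sample clock, four registers, and one refill per video frame at 50 or 60 frames per second; those frame rates and the function name are my assumptions, not part of the design.

```python
import math


def fifo_bytes_per_refill(sample_rate_hz: float, channels: int,
                          refills_per_second: float) -> int:
    """Bytes the CPU must queue per refill so the FIFO never runs dry."""
    if refills_per_second <= 0:
        raise ValueError("refills_per_second must be positive")
    return math.ceil(sample_rate_hz * channels / refills_per_second)


if __name__ == "__main__":
    for rate in (15_000, 22_000):
        for fps in (50, 60):
            size = fifo_bytes_per_refill(rate, 4, fps)
            print(f"{rate} Hz, 4 channels, {fps} refills/s -> {size} bytes")
```

At these rates the FIFO needs to hold on the order of one to two kilobytes per frame.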

The application I'm really hoping to use this for, though, is networking, i.e. high-speed downloads via a DMA-enabled PBI ethernet device. Anyway, it seems like there are lots of interesting things you could do with this type of circuit.

Here's my second revision at board layout. Metalguy66 pointed out that I'd be bumping the keyboard on my 600XL since that front row is so close. So I assembled some spare perfboard and header pins and a socketed 6502C and did some testing of how it would all fit, and this is what I came up with:

I have been thinking about something similar, but a little different. I thought about replacing the RAM with high-speed SRAM and using the first half of the cycle to do the DMA. That way you won't have to halt the CPU. If you buffer the ANTIC bus, you could even get your refresh cycles back by disconnecting the HALT line and bus during refresh cycles. It's all just thoughts though, nothing concrete, but it might give you/others some inspiration.