My software needs to not only detect the presence of a 65C816 CPU (taken care of), but also compute and report the CPU speed. The way I've gone about this (assuming a 6502 for the moment) is to run a block of code which updates a counter while waiting for a full frame to elapse (with DMA off, of course). We know the cycle count of the code block, so armed with the final value of the counter, we can compute the number of cycles per frame.

The first thing I ran into (with ANTIC DMA disabled) is RAM refresh, which I understand takes 9 cycles per scan line. Factoring this into the calculation seemed to yield a result closer to what's expected: ($9B * 2) * 9 cycles added to the overall cycle count in PAL mode. The figures soon drift wildly off when the CPU is a 65C816, however. I'm not sure if this is down to my misunderstanding the cycle counts of 6502 instructions in emulation mode, or not properly accounting for RAM refresh cycles.

I notice that SysInfo computes the CPU speed quite accurately, so are there any special compensations required when trying to work out the speed of a 65C816 CPU running in emulation mode on an A8? Or is there a better way of measuring speed entirely?

On the 6502, the counter is bumped 1,255 times. If we assume 25 cycles for the whole block (branch always taken in all except two cases, no page crossing), then that's 1,255 * 25 = 31,375. Nine refresh cycles per scan line (PAL) = 310 * 9 = 2,790.

31,375 + 2,790 = 34,165.

34,165 * 50 = 1,708,250.

Not quite 1.79MHz, so already we have some unaccounted-for cycles (assuming I added things up right).

We get the same number of iterations on a 65C816 @ 1.79MHz.

65C816 @ 7.14MHz:

4,443 iterations. 4,443 * 25 = 111,075. Add 2,790 (RAM refresh) = 113,865. Multiply by fifty frames and we get 5,693,250, which is just way, way off what we're aiming at. It's possible the inaccuracy at 1.79MHz is simply amplified as the CPU speed increases, but I'm a bit stuck as to how to fix it up.
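For anyone wanting to check the arithmetic above, here is a minimal Python sketch of the same calculation. The function and constant names are made up for illustration; the 25-cycle loop, 310 scan lines, 9 refresh cycles per line and 50 frames per second are simply the figures quoted in the posts. It only reproduces the sums and makes no attempt to model clock syncing on an accelerated CPU.

```python
# Figures taken from the posts above; names are illustrative only.
FRAMES_PER_SECOND = 50          # PAL
SCAN_LINES = 310                # ($9B * 2), as used in the post
REFRESH_CYCLES_PER_LINE = 9     # RAM refresh with ANTIC DMA off
CYCLES_PER_ITERATION = 25       # assumed cost of one pass of the counting loop


def estimated_hz(iterations, cycles_per_iteration=CYCLES_PER_ITERATION):
    """Estimate cycles per second from the loop count seen in one frame."""
    loop_cycles = iterations * cycles_per_iteration
    refresh_cycles = SCAN_LINES * REFRESH_CYCLES_PER_LINE
    cycles_per_frame = loop_cycles + refresh_cycles
    return cycles_per_frame * FRAMES_PER_SECOND


if __name__ == "__main__":
    print(estimated_hz(1255))   # 6502 / 65C816 at stock speed
    print(estimated_hz(4443))   # 65C816 at the fast clock
```

With the iteration counts from the posts this gives 1,708,250 and 5,693,250, matching the hand-worked figures.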

What are you looking for - CPU clock speed or execution speed? Regardless of the clock speed, some cycles are not available for executing instructions. So, even at 1.79MHz, where all clock cycles are the same, you have a significant reduction in possible execution speed.

In a 65816 that is running at a higher clock speed, you have to halt the CPU and align the clocks to the 1.79MHz hardware before you do things like access ANTIC, GTIA, REFRESH, and such. How many cycles are 'wasted' syncing to the system clock is not deterministic - the sync may occur anywhere in the sequence. (Well, that's a strong statement that may not be correct, but I certainly wouldn't want to try and figure it out.)

I would say that you should leave all the processes running and count how many times you can execute a simple routine, if you want execution speed. The results will vary all over the place, depending on the graphics mode and such, but that's the reality of it.

For clock speed, I would turn everything off and execute a timing routine after REFRESH is finished for the frame. This will be tricky since you cannot access things like VCOUNT hardware or set interrupts for anything in the middle of the routine.

I might drop Konrad an email and see if he fancies sharing his approach to the problem. In any case, on-paper clock speed is what I'm really looking for, but if that can't be reliably attained, we'd have to settle for effective speed. What's reported by Konrad's SysInfo seems acceptably accurate.

Obviously DMA's off and all interrupts are disabled, so hopefully the only variance we're left with is refresh and any clock syncing for 65C816. The test code is run in RAM, so on the face of it, basic conditions are the same as those under which SysInfo runs.

At 7 MHz the 65C816 takes more than 4 cycles to LDA VCOUNT because it's reading from the slower ANTIC, no? Is the same true for RAM access or is there faster local RAM?

I can only answer for the XL14 hardware. LDA VCOUNT will take 70ns + 70ns + 70ns + SYNC + 560ns, SYNC being the time required to align the fast clock with the slow clock - up to 490ns (at 14.32MHz). The typical mode is to fetch the opcode and operands from RAM at 14.32MHz and the data from legacy hardware at 1.79MHz.

One strategy may be to address VCOUNT at the beginning of the routine and then cycle-count from that point onward. This may work since executing a LDA VCOUNT (or any other hardware reg) always leaves you in sync. The second LDA VCOUNT will always take 210ns + 350ns + 560ns.
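To put numbers on the XL14 timings quoted above, here is a rough Python sketch. It uses the rounded 70ns and 560ns cycle times and the 490ns worst-case sync from the post; the function names are hypothetical, and the assumption that the sync wait simply pads the three fast fetches out to the next slow-clock boundary is my reading of the 210ns + 350ns + 560ns figure, not something confirmed for the hardware.

```python
# Rounded figures from the post; names are illustrative only.
FAST_CYCLE_NS = 70      # one cycle at 14.32MHz (rounded)
SLOW_CYCLE_NS = 560     # one cycle at 1.79MHz (rounded)
MAX_SYNC_NS = 490       # worst-case wait to align the fast and slow clocks
FETCH_CYCLES = 3        # opcode + two operand bytes of LDA VCOUNT


def first_access_ns(sync_ns):
    """LDA VCOUNT from an arbitrary clock phase: 3 fast fetches + SYNC + 1 slow read."""
    if not 0 <= sync_ns <= MAX_SYNC_NS:
        raise ValueError("sync wait must be between 0 and MAX_SYNC_NS")
    return FETCH_CYCLES * FAST_CYCLE_NS + sync_ns + SLOW_CYCLE_NS


def synced_access_ns():
    """LDA VCOUNT issued immediately after a previous hardware access.

    Assumption: the CPU starts aligned to the slow clock, so the sync wait
    is whatever pads the three fast fetches up to the next slow boundary.
    """
    fetch_ns = FETCH_CYCLES * FAST_CYCLE_NS
    sync_ns = -fetch_ns % SLOW_CYCLE_NS
    return fetch_ns + sync_ns + SLOW_CYCLE_NS


if __name__ == "__main__":
    print(first_access_ns(0), first_access_ns(MAX_SYNC_NS))
    print(synced_access_ns())
```

Under those assumptions the first access costs anywhere from 770ns to 1,260ns, while a back-to-back access is a fixed 1,120ns, which is why reading VCOUNT first and cycle-counting from there gives a known starting point.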