I noticed via Twitter that byuu acquired additional upd7725 documentation and changed the implementation of the overflow flags in higan. The upd7725 overflow flags are something I had puzzled over and discussed with Lord Nightmare once (quite a long time ago, probably when I was initially backporting the DSP LLE into bsnes-classic) because I'd noticed that he'd changed the way MAME calculated the flags from how bsnes did it. Neither the original bsnes implementation nor LN's modified MAME implementation looked quite right to me, but although I understood how the flags were meant to be used, I couldn't figure out how to calculate them so that they could be used in that way.

Thanks to the new documentation, specifically the explanation of how the S1 flag is calculated and the "1, 0, 1" (overflow, no overflow, overflow) case, I think I've figured out how everything works, and it's a good bit simpler than byuu's new implementation. In particular, I believe that there is no need for a flag "history buffer" and that the chip contains no such thing.

First of all, we have to understand what the OV0 and OV1 flags mean arithmetically. Basically, whereas OV0 indicates whether the most recent operation produced a signed overflow, OV1 indicates whether the value in the accumulator is in bounds (between -32768 and +32767) or whether it is overflowed. Let's think about how to calculate that, and build up a truth table.

First, the easy cases. If the accumulator previously contained an in-bounds value, and no overflow occurred in the last operation, then the accumulator must still contain an in-bounds value. Likewise, if the accumulator previously contained an in-bounds value and an overflow occurred, the value in the accumulator is now out of bounds.

Code:

OV1in OV0 | OV1out
----------+-------
  0    0  |   0
  0    1  |   1

Next, if the accumulator was previously out of bounds, and no overflow occurred in the last operation, then the accumulator is still out of bounds. This is perhaps not quite as easy to intuit as the first two cases, but think about it: the only way the accumulator can go from out of bounds back to in bounds is if a second overflow occurs, in the opposite direction of the original overflow.

Code:

OV1in OV0 | OV1out
----------+-------
  1    0  |   1

Finally, the hard case: what happens if the accumulator was out of bounds and another overflow occurs? Let's look at a couple of examples:

Code:

32767 + 1 + (-2) (hex: $7FFF + $0001 + $FFFE)

$7FFF + $0001 = $8000 + overflow
$8000 + $FFFE = $7FFE + overflow

Adding $7FFF to $0001 gives a result of $8000 with an overflow (positive + positive = negative). Adding $FFFE to the result gives a result of $7FFE and a second overflow (negative + negative = positive). However, despite the two overflows, the final result is correct and in bounds: 32767 + 1 + (-2) equals 32766. The two overflows have cancelled each other out. Now let's look at another example:

Code:

32767 + 32767 + 32767 + 32767 (hex: $7FFF + $7FFF + $7FFF + $7FFF)

$7FFF + $7FFF = $FFFE + overflow
$FFFE + $7FFF = $7FFD
$7FFD + $7FFF = $FFFC + overflow

On the first addition, an overflow occurs (positive + positive = negative). No overflow occurs on the second addition, but on the third addition another positive + positive = negative overflow occurs. This time, the final result is not in bounds: 32767 + 32767 + 32767 + 32767 isn't -4 or even 65532, it's 131068 (hex $1FFFC).

The difference between these two cases is that in the first case the two overflows were in opposite directions, and in the second case both overflows were in the same direction. The purpose of the S1 flag is to distinguish between these two cases. According to the datasheet, the S1 flag contains the sign of the result of the last operation that took place with the incoming OV1 flag clear; in other words, the last operation that took place while the incoming accumulator was in bounds. If the S1 flag is the same as the S0 flag produced by the current overflowing operation, then two overflows in the same direction have occurred and the accumulator is still out of bounds. If the S1 flag and the S0 flag are different, then two overflows in opposite directions have occurred, meaning the accumulator went out of bounds and then back in bounds.
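
A minimal Python sketch of the rule described above, for illustration only. It models additions on a single accumulator and nothing else. The names to_signed16, add_with_flags and run are hypothetical (they come from neither the datasheet nor higan), and starting with S1 = 0 is an arbitrary assumption that has no effect as long as OV1 starts out clear.

```python
def to_signed16(value):
    """Wrap an integer to a signed 16-bit value, as the accumulator would."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def add_with_flags(acc, operand, ov1, s1):
    """Add operand to acc; return (result, ov0, s0, ov1, s1) after the addition."""
    exact = acc + operand
    result = to_signed16(exact)
    ov0 = int(exact != result)   # signed overflow in this operation
    s0 = int(result < 0)         # sign of this operation's result
    if not ov1:                  # incoming OV1 clear: S1 latches the sign
        s1 = s0
    if ov0 and ov1:              # hard case: overflow while already out of bounds
        ov1 = int(s0 == s1)      # same direction stays out, opposite cancels
    else:
        ov1 = int(ov0 or ov1)    # the three easy rows of the truth table
    return result, ov0, s0, ov1, s1


def run(values):
    """Load values[0], then add the remaining values with OV1 initially clear."""
    acc, ov1, s1 = values[0], 0, 0   # assumption: S1 starts at 0
    for operand in values[1:]:
        acc, ov0, s0, ov1, s1 = add_with_flags(acc, operand, ov1, s1)
    return acc, ov1


print(run([32767, 1, -2]))                 # (32766, 0): overflows cancel
print(run([32767, 32767, 32767, 32767]))   # (-4, 1): still out of bounds
```

The two calls correspond to the two worked examples: opposite-direction overflows leave OV1 clear, same-direction overflows leave it set.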

byuu was wondering whether the OV1 test should use the new or old value of S1, but you can see from these truth tables that it doesn't matter. The value of S1 only changes if the previous OV1 was clear, while OV1 only depends on S1 if the previous OV1 was set.

The datasheet implies that "overflow, no overflow, overflow" is some kind of special case that the chip explicitly checks for, but in fact it's just a consequence of the math. Two consecutive operations can't both overflow in the same direction; just look at the results from adding $7FFF (the largest possible positive number) to itself. If you work out the results of repeatedly adding $8000 (the smallest possible negative number) to itself, it's the same. You can only have two overflows in the same direction if there is at least one non-overflowing operation between them.
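
The claim about consecutive overflows can be spot-checked with a short Python sketch. This is only a check over a hand-picked set of boundary values (the SAMPLES list is an assumption, not an exhaustive sweep), and the helper names are hypothetical.

```python
def to_signed16(value):
    """Wrap an integer to a signed 16-bit value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def overflow_direction(acc, operand):
    """Return +1 for a positive overflow, -1 for a negative one, 0 for none."""
    exact = acc + operand
    if exact > 32767:
        return 1
    if exact < -32768:
        return -1
    return 0


# Assumption: boundary and mid-range values are representative enough.
SAMPLES = [-32768, -32767, -16385, -16384, -2, -1, 0,
           1, 2, 16383, 16384, 32766, 32767]


def consecutive_same_direction():
    """Collect (acc, x1, x2) where two back-to-back adds overflow the same way."""
    found = []
    for acc in SAMPLES:
        for x1 in SAMPLES:
            first = overflow_direction(acc, x1)
            if first == 0:
                continue
            wrapped = to_signed16(acc + x1)
            for x2 in SAMPLES:
                if overflow_direction(wrapped, x2) == first:
                    found.append((acc, x1, x2))
    return found


print(consecutive_same_direction())   # []
```

The reason the list comes back empty is the bound from the text: after a positive overflow the wrapped result is at most -2 ($7FFF + $7FFF = $FFFE), so adding even $7FFF again only reaches 32765.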

Note that in order to make use of this overflow mechanism, it is essential that the OV1 flag be cleared before you start doing your additions, or the S1 flag won't be updated when it should be. According to the datasheet, any ALU operation other than an addition or subtraction clears the OV1 flag, and if you look at a disassembly of the DSP1 program or Lord Nightmare's prose2k DSP program, you can see that they in fact do xor a,a or and a,a prior to any sequence of calculations that use the OV1 flag or the SGN pseudo-register.

One more thing: why is this overflow mechanism only good for three operations? Let's look at what happens if you do four additions in a row with the following values:

Code:

32767 + 32767 + 32767 + 32767 + (-32768) (hex: $7FFF + $7FFF + $7FFF + $7FFF + $8000)

$7FFF + $7FFF = $FFFE + overflow
$FFFE + $7FFF = $7FFD
$7FFD + $7FFF = $FFFC + overflow
$FFFC + $8000 = $7FFC + overflow

We've already gone over the first three additions, so just look at what happens with the fourth. An overflow occurs (OV0 = 1) and S1 and S0 have opposite values, so the OV1 flag is cleared. Which means the accumulator is considered to be in bounds. But this is wrong--32767 + 32767 + 32767 + 32767 + -32768 is 98300, not 32764! If you do four additions in a row, it becomes possible for two overflows in one direction to occur followed by an overflow in the opposite direction, resulting in a false negative.
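
The four-addition case can be traced step by step with a small Python sketch. The trace function and its row layout are hypothetical, and it assumes the flag rule exactly as described earlier in this post (additions only, OV1 clear at the start).

```python
def to_signed16(value):
    """Wrap an integer to a signed 16-bit value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def trace(values):
    """Load values[0], add the rest, and record the flags after each addition."""
    acc, ov1, s1 = values[0], 0, 0   # assumption: OV1 cleared beforehand
    rows = []
    for operand in values[1:]:
        exact = acc + operand
        acc = to_signed16(exact)
        ov0 = int(exact != acc)      # overflow in this operation
        s0 = int(acc < 0)            # sign of this result
        if not ov1:                  # S1 only latches while OV1 is clear
            s1 = s0
        # The right-hand side still sees the incoming OV1.
        ov1 = int(s0 == s1) if (ov0 and ov1) else int(ov0 or ov1)
        rows.append((operand, acc, ov0, s0, s1, ov1))
    return rows


values = [32767, 32767, 32767, 32767, -32768]
print("operand  result  OV0 S0 S1 OV1")
for row in trace(values):
    print("%7d %7d   %d  %d  %d  %d" % row)
print("exact sum:", sum(values))
```

The last row ends with OV1 = 0 and a result of 32764, while the exact sum printed underneath is 98300, which is the false negative described above.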

Here's a python program that implements the overflow flags according to the previous post and verifies that for a representative range of input values they produce arithmetically correct results for three additions (OV1 is set if and only if the final result is out of bounds) but that they don't produce correct results for four additions.

... wow, hat tip to you! This also solves a third question of mine, which was whether the three OV history values were cleared on other ALU operations (strongly leaning toward yes.) If they don't exist, then the question is void.

However, I do want to say ... there's no absolute guarantee the original designers of this chip were so clever, short of studying die scans. It's quite possible they implemented this with two extra boolean latches. The implementation details were simply beyond the scope of the programmer's manuals. And though I'll definitely add this, it does lose a bit of clarity in the process. Not that my article's code example was all that clear to begin with, but still. If there were ever a time for source code comments, this would be it.

By the way, my DSP LLE had a nasty flaw with SGN:

Code:

case 7: idb = 0x8000 - flags.a.s1; break;

I'm not sure why I chose to hard-code this to OVA1. Should be:

Code:

case 7: idb = 0x8000 - (!asl ? flags.a.s1 : flags.b.s1); break;

=> mov a,sgn
=> mov b,sgn
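
To show what the SGN value is good for, here is a hedged Python sketch of saturating a chain of three additions. It takes SGN = 0x8000 - S1 from the snippet above and assumes the program replaces the accumulator with SGN whenever OV1 is set. The function names are hypothetical, only one accumulator is modelled, and the question of which accumulator's S1 feeds SGN is left out.

```python
def to_signed16(value):
    """Wrap an integer to a signed 16-bit value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def sgn(s1):
    """SGN pseudo-register: 0x8000 when S1 = 0, 0x7FFF when S1 = 1."""
    return 0x8000 - s1


def saturating_sum(values):
    """Load values[0], add the rest, then saturate using OV1 and SGN.

    Assumption: OV1 is clear before the first addition and the chain is
    at most three additions long, the range in which OV1 is reliable.
    """
    acc, ov1, s1 = values[0], 0, 0
    for operand in values[1:]:
        exact = acc + operand
        acc = to_signed16(exact)
        ov0 = int(exact != acc)
        s0 = int(acc < 0)
        if not ov1:
            s1 = s0
        ov1 = int(s0 == s1) if (ov0 and ov1) else int(ov0 or ov1)
    if ov1:                          # out of bounds: load the saturation value
        acc = to_signed16(sgn(s1))
    return acc


print(saturating_sum([32767, 32767, 32767, 32767]))   # 32767
print(saturating_sum([-32768, -1, 0, 0]))             # -32768
print(saturating_sum([100, 200, 300]))                # 600
```

S1 = 1 means the overflow that took the accumulator out of bounds produced a negative result, in other words a positive overflow, so SGN supplies $7FFF; S1 = 0 supplies $8000.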

Quote:

if you look at a disassembly of the DSP1 program or Lord Nightmare's prose2k DSP program, you can see that they in fact do xor a,a or and a,a prior to any sequence of calculations that use the OV1 flag or the SGN pseudo-register.

So I did hear that on the SNES coprocessors, only the DSP1 uses SGN once. Does it actually do anything significant with the result where proper emulation of S1 would make an observable difference? If you're not sure, no need to look into it. I've been operating under the assumption that proper S1/OV1 support was mostly busywork, but ... perfectionism and all, finally got around to it thanks to much help from Cydrak.

Quote:

Thanks to the new documentation

If you'd like, send me your e-mail or let me know if you'd rather a mega link, and I can get the new documents to you. They're very low-quality scans, but they're quite thorough and explain a lot of the operations the SNES lacks in more detail: serial transfers, interrupts, etc.

Probably not very useful unless you're also a big fan of the prose2k hardware.

Quote:

The upd7725 overflow flags are something I had puzzled over and discussed with Lord Nightmare once

Heh, he went after you too, huh? ^-^

I'm sure he will be absolutely thrilled at your findings here :)

All of the docs came from https://web.archive.org/web/20170313202 ... -dsps/nec/

I ended up hand-feeding that entire directory of the site to archive.org manually page by page, because the site is hosted on a toaster or something and will go down hard for a few days if you download more than a dozen MB of data or so.

LN

_________________"When life gives you zombies... *CHA-CHIK!* ...you make zombie-ade!"

Actually, that's apparently not a flaw in your emulation. The upd7725 docs indicate that SGN is only affected by operations on accumulator A and not accumulator B. The upd7720 datasheet says that SGN is affected by either S1 flag, but the 1984-07 memo by Ted Knowlton points out that this is an error in the datasheet.

... which is the one I was reading for these new updates. God dammit >_>

Okay, I'm definitely adding a source code comment on this one. Good catch.

> this is an error in the datasheet.

... as well as in the processor. Quite the hardware design oversight. Ruins using the three chained add/sub operations and then saturating the result if you use the B accumulator. Have to use J(N)SB1 now.

Based on Ted's note on that site, I don't believe it affects the upd7720 itself, it is just a datasheet error, which was corrected on the upd7725 datasheet.

LN


Quote:

byuu was wondering whether the OV1 test should use the new or old value of S1, but you can see from these truth tables that it doesn't matter. The value of S1 only changes if the previous OV1 was clear, while OV1 only depends on S1 if the previous OV1 was set.

Well, on that note ...

Code:

if(!ov1) s1 = s0;
ov1 = ov0&ov1 ? s0==s1 : ov0|ov1;

Here, we can see the s0==s1 test doesn't get hit if ov1==0.

But what if we reverse this?

Code:

ov1 = ov0&ov1 ? s0==s1 : ov0|ov1;
if(!ov1) s1 = s0;

Let's say at the start, ov0 was set (from this ALU operation), but ov1 was clear (from the previous ALU operation.) ov1 will be set to ov0|ov1, or 1. And now the if(!ov1)s1=s0; test will fail, whereas if we did the s1 test before the ov1 assignment, it would have transferred s0 into s1. ov1 will be set correctly either way, but the order of operations will affect the s1 output when s0!=s1.

It seems pretty clear (the manual basically says as much), and your truth table seems to confirm, we should do the if(!ov1) s1=s0; test first, but it's always good to clarify these things in documentation. Flag assignments are not usually dependent upon the results of other flag assignments in CPU emulators.
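
The effect of the two orderings can be enumerated with a short Python sketch. The function names are hypothetical; the bodies are direct transcriptions of the two snippets above, run over all sixteen combinations of OV0, S0 and the incoming OV1 and S1.

```python
from itertools import product


def update_s1_first(ov0, s0, ov1, s1):
    """S1 is latched first, then OV1 is computed (first snippet)."""
    if not ov1:
        s1 = s0
    ov1 = int(s0 == s1) if (ov0 and ov1) else int(ov0 or ov1)
    return ov1, s1


def update_ov1_first(ov0, s0, ov1, s1):
    """OV1 is computed first, then S1 is latched using the new OV1 (second snippet)."""
    ov1 = int(s0 == s1) if (ov0 and ov1) else int(ov0 or ov1)
    if not ov1:
        s1 = s0
    return ov1, s1


def compare_orderings():
    """Return (inputs where OV1 differs, inputs where S1 differs)."""
    ov1_diff, s1_diff = [], []
    for inputs in product((0, 1), repeat=4):   # (ov0, s0, ov1_in, s1_in)
        a = update_s1_first(*inputs)
        b = update_ov1_first(*inputs)
        if a[0] != b[0]:
            ov1_diff.append(inputs)
        if a[1] != b[1]:
            s1_diff.append(inputs)
    return ov1_diff, s1_diff


ov1_diff, s1_diff = compare_orderings()
print("OV1 differs for:", ov1_diff)   # []
print("S1 differs for:", s1_diff)
```

For a single operation OV1 comes out the same either way, and S1 differs only in the four cases where OV0 is set and S0 differs from the incoming S1. A stale S1 would then feed into later OV1 calculations, which is why the order matters across a chain of operations.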

I said that it doesn't matter whether you use the new or old S1. It does matter whether you use the new or old OV1, which is why both truth tables specify "OV1in".

Quote:

I meant the CPU should honor the ASL bit to select between SA1 and SB1. Seems like an oversight in the design.

It seems that accumulator A is meant to be used as the primary accumulator, and accumulator B to hold either temporary values or the low 16 bits of a 32-bit calculation (notice that each accumulator uses the opposite one's carry flag as its incoming carry).

If you based your notes off my initial implementation, then my apologies. Those notes were based off the only uPD7725 manual I had at the time, which did not explain S1/OV1 nearly as well as the newly discovered uPD7720 documents do.

There are tons of uPD77C2xxx scans on datasheetarchive.com. I have uPD77C20 and uPD77C25 datasheets on my hard disk, downloaded back in March 2011, going by the file timestamps. My description/code is different from AWJ's table/code; I don't know which is closer to real hardware.

PS. I think my implementation might have opposite S1 values (i.e. 0=positive vs 1=positive); this works as long as the SGN opcode processes S1 accordingly.
