This was caused by not masking the values before ORing them into the second constant. In this case the sign-extended bits of >87 were stomping on >0D, resulting in a corrupt value. I'm posting the two-liner patch on the forum.
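A small C model of this kind of bug, assuming >0D is the high byte and >87 the low byte of a 16-bit constant. The function names are made up for illustration, and this is not the patched GCC code:

```c
#include <stdint.h>

/* Buggy form: the low byte is sign-extended before the OR, so a value
   like 0x87 becomes 0xFF87 and its upper bits overwrite the high byte. */
uint16_t combine_unmasked(int8_t hi, int8_t lo)
{
    return (uint16_t)(((uint16_t)hi << 8) | (uint16_t)lo);
}

/* Fixed form: each byte is masked to 8 bits before being combined. */
uint16_t combine_masked(int8_t hi, int8_t lo)
{
    return (uint16_t)((((uint16_t)hi & 0xFFu) << 8) | ((uint16_t)lo & 0xFFu));
}
```

With hi = 0x0D and lo = (int8_t)0x87, the unmasked version yields 0xFF87 while the masked version yields 0x0D87.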

He also found another problem using -O0, which I usually avoid. The register I chose for the frame pointer, R8, is volatile, meaning it can be destroyed over a function call. This was just a dumb mistake. In order to preserve the ABI, I've moved the frame pointer to R9, which is preserved across function calls. The resulting code looks much safer now.

I was working on the EA5 converter today, and got it mostly working, but I've come to a design dilemma.

The conversion utility fills out a structure describing the layout of the data and bss sections. I can do the same thing in a linker script, and make the conversion simpler. However, that adds complexity, which is annoying and potentially confusing for novice users.

As a related thought, I should look at the default linker script. It really should follow the requirements for EA5 files (.text at A000). I've got some thinking to do...

Saturday, July 23, 2011

This is what was posted to the AtariAge forums:

Update time!

It's about six months later than promised, but I haven't given up yet.

Most of that time has been spent putting in a ton of hours at work and beating on the GCC code to get byte operations working properly. What's in this release is the fourth or fifth overhaul of the port. In the end I had to rewrite core bits of how GCC relates byte and word quantities. I've kept those changes to a minimum, so ports to later versions should still work.

Here's what got changed in this patch:

- Add optimization to remove redundant moves in int-to-char casts
- Remove invalid CB compare immediate mode
- Add optimizations for byte immediate comparison
- Added optimizations for shift and cast forms like (byte)X=(int)X>>N
- Remove invalid compare immediate with memory
- Improved support for subtract immediate
- Fixed bug causing gibberish in assembly output
- GCC now recognizes that bit shift operations set the comparison flags
- Fixed bug causing bytewise AND to operate on the wrong byte
- Add optimization for loading byte arrays into memory
- Confirmed that variadic functions work properly
- Fixed the subtract instruction to handle constants
- Fixed the CI instruction, it was allowing memory operands
- Fixed a bug allowing the fake PC register to be used as a real register
- Encourage memory-to-memory copies instead of mem-reg-mem copies
- Added optimization to eliminate INV-INV-SZC sequences
- Modify GCC's register allocation engine to handle TMS9900 byte values
- Remove the 32 fake 8-bit registers. GCC now uses 16 16-bit registers
- Modify memory addressing to handle forms like @LABEL+CONSTANT(Rn)
- Clean up output assembly by vertically aligning operands
- Clean up output by combining constant expressions
- Optimize left shift byte quantities
- Fixed a bug where SZC used the wrong register
- Removed C instruction for "+=4" forms, AI is twice as fast
- Added 32-bit negate
- Fixed 32-bit subtract
- Fixed a bug causing MUL to use the wrong register
- Fixed a bug allowing shifts to use shift counts in the wrong register
- Confirmed that inline assembly works correctly
- Added optimization to convert "ANDI Rn, >00FF" to "SB Rn,Rn"
- Optimize compare-with-zero instructions by using a temp register
- Fixed a bug allowing *Rn and *Rn+ memory modes to be confused
- Removed most warnings from the build process

There were also changes made to binutils; I hope this will be the last update for it.

- More meaningful error messages from the assembler
- DATA and BYTE constructs with no value did not allocate space
- Fix core dump in tms9900-objdump during disassembly

The ELF conversion utility was also updated to allow crt0 to properly set memory before the C code executes. If it finds a "_init_data" label in the ELF file, it will fill out a record with all the information crt0 needs to do the initialization.

In light of all these changes, I've made a new "hello world" program with lots of comments, a Makefile and all supporting files. I've also included the compiled .o, .elf, and converted cart image. In addition, there's also a hello.s file which is the assembly output from the compiler.

I'm not sure if I mentioned this earlier, but the tms9900-as assembler accepts TI-syntax assembly files, with a number of additions:

- Added "or", "orb" aliases for "soc" and "socb" (that's been a gotcha for several people here)
- Added "textz" directive - This appends a zero byte to the data. "textz '1234'" is equivalent to "byte >31, >32, >33, >34, 0"
- Added "ntext" directive - This prepends the byte count to the data. "ntext '1234'" is equivalent to "byte 4, >31, >32, >33, >34"
- Added "string" variants to all "text" directives
- No length limit for label names
- No limitation for constant calculations, all operations are allowed (xor, and, or, shifts, etc.)

I think that's about enough for now.

I believe this is the biggest jump in usefulness yet. I've gone through and tested every instruction, and written several test programs which did semi-interesting things from the compiler's point of view. They were, however, exceptionally dull from a user's point of view. For all the blow-by-blow details, check out my blog.

As a final test of the byte handling code, I built that chess program posted back in December. No problems were seen and no hinky-looking code was generated. In addition, it was about 5% smaller.

The build instructions are listed in post #43, and haven't changed since.

Friday, July 22, 2011

I thought I should write a "hello world" framework for anyone who wanted to use a working model as a starting point. That was apparently a good idea. I found a heck of a tricky bug to nail down. When performing a bytewise comparison with zero, the following instruction was generated:

mov *r2+, *r2+

This is incrementing the pointer twice. In this case, the comparison was used to find a null terminator for a string. The extra increment caused the terminator to be skipped, and the calculated length was nonsensical.

Fixed by using a temp register for the second argument. This is an optimization I had been considering for a while. I haven't fixed the 32-bit cases yet.

This problem was also caused by not keeping a strict distinction between *Rn and *Rn+. If this isn't caught in instructions which use a repeated operand, like the one above, we will have some nasty side-effects.
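To make the effect of the double increment concrete, here is a simplified C model of a terminator scan. It is only a sketch: the function names and the `limit` parameter are invented for illustration (the limit keeps the model from reading past the buffer), and it models only the pointer movement, not the exact behavior of the generated instruction.

```c
#include <stddef.h>

/* Models the buggy form: the pointer advances twice per test, so only
   every other byte is actually compared against zero. Returns the index
   where a zero was seen, or `limit` if none was found. */
size_t find_nul_double_step(const char *buf, size_t limit)
{
    size_t i = 0;
    while (i < limit) {
        char c = buf[i];
        i += 2;                 /* two post-increments for one compare */
        if (c == '\0')
            return i - 2;
    }
    return limit;
}

/* Models the fixed form: the byte is loaded into a temporary with a
   single post-increment, then the temporary is tested. */
size_t find_nul_single_step(const char *buf, size_t limit)
{
    size_t i = 0;
    while (i < limit) {
        char tmp = buf[i];
        i += 1;                 /* one post-increment per compare */
        if (tmp == '\0')
            return i - 1;
    }
    return limit;
}
```

For an 8-byte buffer holding "abc" followed by zeros, the single-step scan reports 3, while the double-step scan walks past the real terminator and reports 4.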

Thursday, July 21, 2011

OK, I've gone through the source tree and removed all the dead files and reversed all whitespace changes to reduce the number of files which appear in the GCC patch.

I've also found a use for that SB instruction I've mentioned a few times. That will be used to replace an "ANDI Rn, >00FF" instruction. I don't think that constant will be used very often, but this was mostly done to make me feel better.

I figured this was a pretty good time to fix my crt0 to do data initialization and BSS clearing. So, the crt0 was updated correctly, the elf2cart utility looks like it's OK, but I found a problem with GAS. When a DATA directive is used with no data value, no space is actually allocated.

That's bad.

But also fixed.

So at this point, initial values for variables are used as expected, and the conversion utility works like a champ.
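The data initialization and BSS clearing that crt0 performs can be pictured roughly as follows. This is a hedged C sketch: the struct layout, field names, and function name are hypothetical and do not reflect the actual record written by the conversion utility or the real crt0, which is written in assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout record; field names are illustrative only. */
struct init_record {
    const unsigned char *data_src;  /* initial values stored in the image */
    unsigned char *data_dst;        /* run-time address of the data section */
    size_t data_size;               /* bytes of initialized data */
    unsigned char *bss_start;       /* run-time address of the bss section */
    size_t bss_size;                /* bytes to clear */
};

/* Copy initial values into place, then zero the bss region. */
void init_memory(const struct init_record *rec)
{
    memcpy(rec->data_dst, rec->data_src, rec->data_size);
    memset(rec->bss_start, 0, rec->bss_size);
}
```

Once something like this has run, C code can rely on initialized globals holding their initial values and uninitialized globals starting at zero.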

Saturday, July 16, 2011

I got 32-bit multiply working, but it's awfully big. Since we only have an instruction for 16-bit multiply, we need to expand the math.

R*G = (R0*K+R1)*(G0*K+G1) = R0*G0*K*K + R0*G1*K + R1*G0*K + R1*G1

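Here K = 2^16, R0 and G0 are the high words, and R1 and G1 are the low words. At least we can omit the R0*G0 term, since it is scaled by K*K = 2^32 and can't contribute to a 32-bit result. We will need a 32-bit temp value T stored in registers, and a 16-bit temp value H which could be stored in memory. That leaves us with this code: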
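As a rough C model of the decomposition above (an illustration only, not the TMS9900 sequence the compiler emits), the product can be assembled from 16-bit pieces like this. The function name is made up, and mapping `t` and `h` onto the temps T and H is my own guess at how they are used.

```c
#include <stdint.h>

/* 32-bit multiply built from 16x16 multiplies, with K = 2^16.
   r = r0*K + r1, g = g0*K + g1; the r0*g0*K*K term is dropped because
   it lies entirely above bit 31. */
uint32_t mul32_from_16(uint32_t r, uint32_t g)
{
    uint16_t r0 = (uint16_t)(r >> 16), r1 = (uint16_t)r;
    uint16_t g0 = (uint16_t)(g >> 16), g1 = (uint16_t)g;

    /* t: full 32-bit product of the two low words. */
    uint32_t t = (uint32_t)r1 * g1;

    /* h: sum of the cross terms. Only its low 16 bits survive once it
       is multiplied by K, so a 16-bit temporary is enough. */
    uint16_t h = (uint16_t)((uint32_t)r0 * g1 + (uint32_t)r1 * g0);

    return t + ((uint32_t)h << 16);
}
```

The result matches an ordinary 32-bit multiply truncated to 32 bits, which is what C requires for unsigned arithmetic.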

Wednesday, July 13, 2011

I'm pretty confident about the 8 and 16 bit instructions, so I guess I can't put off checking the 32 bit instructions anymore. At least I can find out where the holes are, and which instructions should be moved to an external library.

To test:
- shift right
- shift left
- logical shift right
- multiply
- divide
- modulus

The shifts generate functional, if not efficient, code. I think I can get multiply without too much trouble, but divide and modulus will have to be in an external library. I wrote the divide code earlier, and it requires a function of 25 instructions with lots of loops. I don't see how allowing that to be inlined would be a good idea.
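To give a feel for why a routine like this is better kept out of line, here is a generic shift-and-subtract unsigned divide in C. It only illustrates the loop structure; it is an assumption that the library routine resembles it, and the name `udiv32` is made up.

```c
#include <stdint.h>

/* Bit-at-a-time unsigned division. Returns the quotient and, if `rem`
   is non-NULL, stores the remainder. The caller must ensure d != 0. */
uint32_t udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint32_t r = 0;

    for (int i = 31; i >= 0; i--) {
        /* Bring down the next bit of the dividend. */
        r = (r << 1) | ((n >> i) & 1u);
        /* If the divisor fits, subtract it and set the quotient bit. */
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```

Thirty-two iterations of shift, compare, and conditional subtract add up quickly, so one shared copy that serves both divide and modulus is the cheaper choice.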

Sunday, July 10, 2011

I found and fixed a stack problem in tms9900-objdump. The buffer allocated for the source address in the COC instruction was too short. Attempting to display an instruction like "coc @>1234(r5), r0" caused a stack overflow, crashing objdump. Fixed now. I hope that's the last binutils bug, but I've got a feeling more are still lurking in there.

Wednesday, July 6, 2011

After my experience with word shifts, I decided to go back to byte shifts. The idea was to translate that effort and move on. Unfortunately, I noticed that GCC wants to do shifts as word quantities. I'm not sure why that is, but something like "(char)v>>=4" turns into:

sra  r1, 8    * Convert from byte to word value
sra  r1, 4    * Shift right the indicated amount
swpb r1       * Convert back to byte value

Better, but not by much. I can add a peephole to change this sequence to a single shift, but I'll be back to this problem if the code is "(char)v = (v+1)>>4". The intervening add will defeat the peephole, bringing back the unnecessary byte-to-word and word-to-byte instructions.

From what I've seen so far, it looks like these promotions are expected behavior, and changing that would require getting into the guts of GCC again. I think I'll pass for now.
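The peephole idea can be checked with a small C model of the register contents, assuming the byte value lives in the high byte of a 16-bit register. The helper names are invented for this sketch, and `sra16` is written by hand so the model doesn't depend on how a C compiler shifts negative numbers.

```c
#include <stdint.h>

/* Arithmetic shift right of a 16-bit register value by n (0..15). */
uint16_t sra16(uint16_t r, unsigned n)
{
    uint16_t sign_fill = (r & 0x8000u) ? (uint16_t)~(0xFFFFu >> n) : 0;
    return (uint16_t)((r >> n) | sign_fill);
}

/* Swap the two bytes of a 16-bit register value. */
uint16_t swpb(uint16_t r)
{
    return (uint16_t)((r << 8) | (r >> 8));
}

/* The three-instruction sequence: byte to word, shift, word to byte. */
uint8_t shift_three_step(uint16_t r1, unsigned n)
{
    r1 = sra16(r1, 8);
    r1 = sra16(r1, n);
    r1 = swpb(r1);
    return (uint8_t)(r1 >> 8);   /* byte value is the high byte */
}

/* The single-shift form a peephole could substitute. */
uint8_t shift_one_step(uint16_t r1, unsigned n)
{
    r1 = sra16(r1, n);
    return (uint8_t)(r1 >> 8);   /* low byte is ignored */
}
```

For shift counts of 0 through 7 both forms leave the same value in the high byte, which is the only part a byte operation looks at.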

Saturday, July 2, 2011

Apparently, the constraints for the shift count register were too loose, resulting in unexpected registers being used in the IRA step, and later flagged as errors. I was seeing things like "srl r1, r2" which is super wrong.

This was fixed by rewriting all the shift instructions to use "define_expand" to copy the shift count into R0, then do the shift. Two instruction forms were then written, one which only accepts R0 as the shift register, and another which only accepts constants. The optimizer then eliminates the unneeded move when constant shifts are used. I'm awfully happy about how that works now.

Even though I don't have a 32-bit shift yet, GCC is happy to compose a sequence using 16-bit instructions. Unfortunately, that sequence is pretty big. "long shift_ar(long r, int n) {return(r>>n);}" gets converted to: