I have run some comparison tests... (Almost) the same code, compiled with fpc and gpc, gives very different results (in both running speed and executable size)... Here (http://itaprogaming.free.fr/download/fpc_vs_gpc.zip) you can find a small demo compiled with fpc, gpc and gcc, so you can take a look at the differences.
I'm a bit confused... my question is: should we stay with FPC, hoping for better speed in the future, or should we switch to GPC?
:?:

dmantione

12-01-2006, 02:50 PM

Please post the source, so I can see what goes wrong. I'm not really familiar with the ARM stuff, but according to Florian FPC produces better code than GCC on ARM.

What I can see from the executables is that the FPC executable contains RTTI, which most probably means the FPC executable was compiled without smartlinking, which would explain the large executable.

Legolas

12-01-2006, 06:33 PM

[quote="dmantione"]
What I can see from the executables is that the FPC executable contains RTTI, which most probably means the FPC executable was compiled without smartlinking, which would explain the large executable.
[/quote]

Yes, that's true. BTW, does this mean that smartlinking now works fine in FPC?

The reason the compiler generates such huge code is that on the ARM you cannot access variables directly and the compiler has to build a pointer to the variable first before it can access it. However, with register variables enabled the result looks quite good:

Since GPC uses register variables by default this could be one of the causes of the difference. I haven't checked what GPC generates, so it is a bit of guesswork.

The code has opportunities for global optimizations. For example, a compiler that can do induction variables converts the array index to pointers and can save the screen address calculation in each iteration. I don't think GPC does this, because I never saw GCC using an induction variable. FPC cannot do induction variables either, but Delphi can, for example.
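To illustrate what an induction variable buys, here is a minimal sketch in Python (not the demo's Pascal source, which is not shown in this thread). The screen size and the `demo_color` function are assumptions made up for the example. The first loop recomputes the buffer offset for every pixel; the second carries a running offset instead, which is roughly what an optimizing compiler would do with a pointer.

```python
WIDTH, HEIGHT = 240, 160  # assumed screen size, used only for this example


def fill_indexed(buf, color_of):
    """Naive loop: recomputes the offset y * WIDTH + x for every pixel."""
    for y in range(HEIGHT):
        for x in range(WIDTH):
            buf[y * WIDTH + x] = color_of(x, y)


def fill_induction(buf, color_of):
    """Same loop with an induction variable: a running offset replaces the multiply."""
    offset = 0  # advances by one element per pixel, like an incremented pointer
    for y in range(HEIGHT):
        for x in range(WIDTH):
            buf[offset] = color_of(x, y)
            offset += 1


def demo_color(x, y):
    # Hypothetical pixel function, not taken from the thread's demo.
    return (x * y) % 31
```

Both functions fill the buffer identically; only the address calculation inside the loop differs.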

However, the most likely cause for the slowdown can be found if we look at the loop body. FPC's code generation for the loop body looks very reasonable:

...however, FPC calls a helper to calculate the modulo. Look at the arm.inc file in the rtl source code (rtl/arm/arm.inc). There is no fpc_mod_longint here.

So, the actual fpc_mod_longint used is in rtl/inc/generic.inc. It is a 100% Pascal routine that calculates the modulo in a CPU-independent way, so most likely it isn't very efficient.

So, somebody needs to code a fast assembler version of fpc_mod_longint for the ARM and most likely the problem will be solved.
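As a rough idea of why a generic software modulo is slow on a CPU without a divide instruction, here is a sketch of the classic shift-and-subtract method in Python. This is an assumption about the general technique, not the actual code from generic.inc; the function name is made up for the example. The point is that every call walks through all 32 bits of the dividend.

```python
def mod_shift_subtract(z, n):
    """Remainder of z / n using only shifts, compares and subtracts.

    The result takes the sign of z, like Pascal's mod on signed integers.
    """
    if n == 0:
        raise ZeroDivisionError("modulo by zero")
    dividend = abs(z)
    divisor = abs(n)
    remainder = 0
    # One iteration per bit of a 32-bit dividend, most significant bit first.
    for bit in range(31, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
    return -remainder if z < 0 else remainder
```

For example, `mod_shift_subtract(17, 5)` returns 2 and `mod_shift_subtract(-17, 5)` returns -2, each after 32 loop iterations.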

[quote="Legolas"]
The only big difference is that in fpc I can't figure out a suitable way to declare 'absolute' VideoBuffer.
[/quote]

That's simple to solve: Free Pascal can do absolute, but it is only enabled for DOS. We need to enable it for the GBA as well.

Legolas

12-01-2006, 08:50 PM

You should get a nomination for the "Best answer ever" Oscar on this forum :mrgreen:
Seriously, now I can figure out what happens. With register variables enabled I get similar results, so the only problem should be the mod function. I *badly* need to learn some assembly :read:

Legolas

12-01-2006, 11:18 PM

Well... The problem really was the mod function :D
I have found that the gba bios embeds some math functions; among these, a mod function. This is my implementation:

You can even try to inline it, since the code is so short that a procedure call is already overhead. I don't know if inlining this procedure will actually work (the compiler might assume that it should be able to call the helper); you would have to check that.

Legolas

13-01-2006, 01:10 PM

Uhm... ok! Even stripping out "bx lr" it works. :) This code comes from libgba (that is a part of devkitpro)... I have simply put it inside an asm-end block, so I don't really know how'n'why it works (maybe magic?) :lol:
There are a bunch of other bios functions too, so I'll try to put it in system.pp

dmantione

13-01-2006, 02:20 PM

Yes, you should leave out the bx lr, because the compiler automatically adds code to clean up the stack frame and the variables and return to the caller after the last instruction. It is recommended not to return yourself, otherwise you might lose stack memory.

Legolas

14-01-2006, 02:46 PM

You can add this code to the system unit for the gba; put this in system.pp:

I'm trying to modify the linux rtl. Maybe I should start an rtl port from scratch :?

dmantione

14-01-2006, 03:01 PM

The magic here is the {$DEFINE FPC_SYSTEM_HAS_MOD_LONGINT}, which instructs the system unit not to include the default fpc_mod_longint. It must be defined before generic.inc gets processed; it might help to move this define up to the top of the file.

Legolas

14-01-2006, 03:10 PM

[quote="dmantione"]
The magic here is the {$DEFINE FPC_SYSTEM_HAS_MOD_LONGINT}, which instructs the system unit not to include the default fpc_mod_longint. It must be defined before generic.inc gets processed; it might help to move this define up to the top of the file.
[/quote]

And - of course - this way it works... :D
I was losing myself in tons of include :doh:

It is impressive how quick you answer my questions... Next time I'm thinking that your reply will arrive even before my (stupid) question. :lol:
Thanks :wink:

Legolas

15-01-2006, 12:34 PM

Next chapter! :D
Now the previous code gives a black screen. The asm file looks good, because it calls fpc_mod_longint, but it seems that something goes wrong in the rtl.
Another question: I have seen that among the generic functions there are fpc_mod_longint and fpc_mod_qword. The gba bios has only one function for mod... Is it good enough to replace fpc_mod_qword with the same code used for longint?
BTW, I have tried to compile the rtl for smartlinking and it works pretty well ^_^

dmantione

15-01-2006, 09:07 PM

Hmmm... That is bad.... :? If the screen stays black that most likely means that mod doesn't work at all.

There are two possibilities:
* The return value of fpc_mod_longint gets lost somehow. Perhaps we did something wrong with the "bx lr" somehow (I'm an apprentice at ARM assembler :think: )
* fpc_mod_longint overwrites a register that it isn't allowed to overwrite causing the loops to end prematurely or something.

We need to find out which situation is the case; otherwise we need to have Florian take a look at the assembler code, as he's a bit more experienced here.

Regarding fpc_mod_qword, no: it has to calculate the modulo of 64-bit unsigned numbers, whereas fpc_mod_longint calculates the modulo of 32-bit signed numbers. Code designed for one calculation does not automagically work for the other... :(
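A small sketch in Python of why the two routines are not interchangeable. The function names are made up for the illustration and are not the RTL's; the only assumption is that the signed result takes the sign of the dividend, as Pascal's mod does. The same bit pattern gives a different remainder depending on whether it is read as signed 32-bit or unsigned 64-bit.

```python
def mod_signed32(z, n):
    """Modulo of two signed 32-bit values; the remainder takes the sign of z."""
    r = abs(z) % abs(n)
    return -r if z < 0 else r


def as_unsigned(value, bits):
    """Reinterpret a signed integer as an unsigned bit pattern of the given width."""
    return value & ((1 << bits) - 1)


def mod_unsigned64(z, n):
    """Modulo of two values read as unsigned 64-bit numbers."""
    return as_unsigned(z, 64) % as_unsigned(n, 64)


print(mod_signed32(-7, 3))    # -1
print(mod_unsigned64(-7, 3))  # 0, because -7 is read as 2**64 - 7
```

On top of that, a 64-bit operand such as `(1 << 40) + 5` does not even fit in the 32-bit registers a longint routine works on.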

What kind of exe size did you get with smartlinking?

Legolas

15-01-2006, 09:24 PM

[quote="dmantione"]
There are two possibilities:
* The return value of fpc_mod_longint gets lost somehow. Perhaps we did something wrong with the "bx lr" somehow (I'm an apprentice at ARM assembler :think: )
* fpc_mod_longint overwrites a register that it isn't allowed to overwrite causing the loops to end prematurely or something.
[/quote]

Seems like the executable goes in a bad kind of loop, because I have noticed a loss of frame rate on the emulator (working executable runs at 100%, bad one runs at 70%).
[quote]
What kind of exe size did you get with smartlinking?
[/quote]
Well, about 30/35 kb instead of 160. That's fine :D

dmantione

15-01-2006, 09:45 PM

[quote="Legolas"]
Seems like the executable goes in a bad kind of loop, because I have noticed a loss of frame rate on the emulator (working executable runs at 100%, bad one runs at 70%).
[/quote]

That points in the direction of the second explanation, which is a bit what I was afraid of. Assuming the GBA bios does not destroy registers other than input and output, you should check if the compiler has data stored in r1 before it calls fpc_mod_longint, since r1 is destroyed by your implementation of fpc_mod_longint.

[quote="Legolas"]
[quote]
What kind of exe size did you get with smartlinking?
[/quote]
Well, about 30/35 kb instead of 160. That's fine :D
[/quote]

Yes, but it should be possible to do better. It might be an idea to check what kind of code is called by the system.pp unit initialization and kick some cruft out. But on the other hand, it's not a big priority at the moment, fast math is much more important.

Legolas

16-01-2006, 12:43 PM

[quote="dmantione"]
[quote="Legolas"]
Seems like the executable goes in a bad kind of loop, because I have noticed a loss of frame rate on the emulator (working executable runs at 100%, bad one runs at 70%).
[/quote]
That points in the direction of the second explanation, which is a bit what I was afraid of. Assuming the GBA bios does not destroy registers other than input and output, you should check if the compiler has data stored in r1 before it calls fpc_mod_longint, since r1 is destroyed by your implementation of fpc_mod_longint.
[/quote]

Urgh!!! That hurts... I have found some ASM tutorials for ARM... Maybe this is the time to start reading them :)

[quote="dmantione"]
[quote="Legolas"]
[quote]
What kind of exe size did you get with smartlinking?
[/quote]
Well, about 30/35 kb instead of 160. That's fine :D
[/quote]
Yes, but it should be possible to do better. It might be an idea to check what kind of code is called by the system.pp unit initialization and kick some cruft out. But on the other hand, it's not a big priority at the moment, fast math is much more important.
[/quote]

I have used the linux rtl and, indeed, I haven't removed a lot of lines.

Legolas

16-01-2006, 08:02 PM

[quote="dmantione"]
Assuming the GBA bios does not destroy registers other than input and output, you should check if the compiler has data stored in r1 before it calls fpc_mod_longint, since r1 is destroyed by your implementation of fpc_mod_longint.
[/quote]

Uhm... well, following your suggestion I have tried to save the value of r1:

mov r8, r1
swi #0x060000
mov r0, r1
mov r1, r8

I don't really know if this is a suitable way to save and restore the value of r1; however, it does not work at all :)

Looking at this doc (http://community.freepascal.org:10000/docs-html/prog/progse12.html#x122-1210003.4) I have tried to tell the fpc compiler which registers are affected by the fpc_mod_longint function:

Not really. Push and pop are allowed only in thumb mode, while fpc handles only arm mode. I have tried to translate them into a pair of store and load instructions, but that does not work either.

dmantione

17-01-2006, 07:28 PM

The save into r8 should be a proper save, unless the compiler expects you to preserve r8, which you destroy. Hmmm.... I'm going to need to take a look at the code myself; you can send me the code if you wish... No time tomorrow, though.

In the meantime, please compare a version with "mod" defined as a function in the program against a version where it is in the RTL. Can you see differences?

Legolas

17-01-2006, 09:31 PM

[quote="dmantione"]
The save into r8 should be a proper save, unless the compiler expects you to preserve r8, which you destroy. Hmmm.... I'm going to need to take a look at the code myself; you can send me the code if you wish... No time tomorrow, though.

In the meantime, please compare a version with "mod" defined as a function in the program against a version where it is in the RTL. Can you see differences?
[/quote]

Looking at the compiler-generated .s files I can find some small differences, mainly due to the different implementation, I think.
This comes from the function version:

The code is taken from an example in Mr. Harbour's book (http://www.jharbour.com/gameboy/default.aspx). The interesting thing is that now the fpc executable runs faster than the gcc one :o

BTW, I have found a nice trick for swapping two registers without involving a third one:

eor r0, r0, r1
eor r1, r1, r0
eor r0, r0, r1
:D
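For anyone who wants to convince themselves that the three eor instructions really swap the registers, here is a small Python model of the sequence. The register file as a dictionary is just an assumption for the illustration.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def eor_swap(regs, a, b):
    """Apply the three-instruction eor sequence to registers a and b."""
    regs[a] = (regs[a] ^ regs[b]) & MASK32  # eor a, a, b
    regs[b] = (regs[b] ^ regs[a]) & MASK32  # eor b, b, a
    regs[a] = (regs[a] ^ regs[b]) & MASK32  # eor a, a, b
    return regs


regs = eor_swap({"r0": 0x12345678, "r1": 0x0000BEEF}, "r0", "r1")
print(hex(regs["r0"]), hex(regs["r1"]))  # 0xbeef 0x12345678
```

One caveat: if both operands are the same register, the first eor clears it and the value is lost, so the trick only works on two distinct registers.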

Legolas

20-01-2006, 01:26 PM

Work goes on... I have discovered why the 2 registers for mod are swapped: in thumb mode SWI 6 should be used (r0->number, r1->denom); in arm mode SWI 7 (r1->number, r0->denom). So, no need to swap registers... I had simply used the wrong SWI :oops:
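The difference between the two calls can be modelled in a few lines of Python. The argument order comes from the post above; that the remainder comes back in r1 matches the earlier `mov r0, r1` after the swi. That the quotient is returned in r0, that r3 receives its absolute value, and that the division truncates toward zero are assumptions based on my reading of the GBA BIOS documentation, so treat them as such. The function names are made up for the example.

```python
def _trunc_divmod(number, denom):
    """Signed division truncating toward zero; the remainder takes the sign of number."""
    quotient = abs(number) // abs(denom)
    if (number < 0) != (denom < 0):
        quotient = -quotient
    return quotient, number - quotient * denom


def bios_div(regs):
    """Model of SWI 6: r0 = number, r1 = denom."""
    quotient, remainder = _trunc_divmod(regs["r0"], regs["r1"])
    # Assumed outputs: r0 = quotient, r1 = remainder, r3 = abs(quotient).
    return {**regs, "r0": quotient, "r1": remainder, "r3": abs(quotient)}


def bios_div_arm(regs):
    """Model of SWI 7: r0 = denom, r1 = number (arguments swapped)."""
    quotient, remainder = _trunc_divmod(regs["r1"], regs["r0"])
    return {**regs, "r0": quotient, "r1": remainder, "r3": abs(quotient)}
```

With r0 = 5 and r1 = 17, `bios_div_arm` leaves 17 mod 5 = 2 in r1, while `bios_div` on the same registers computes 5 mod 17 = 5, which is the kind of wrong result that picking the wrong SWI produces.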

BTW, now I have some problems with the asm 'dialect' in fpc. It seems it is not very standards-compliant. For example, it does not understand asm comments (@, ;), only pascal ones (//); labels should always start with .L; and in some cases asm code that works in gas is not understood by fpc:

mov r0, #0x4000006

generates an "invalid constant" error;

mov r0, r0, lsl #0x10

generates an obscure "internal error 200501051"; elsewhere 'lsl' is an invalid opcode. My asm skills are close to zero, so I'm probably doing something wrong in the code. :?:

dmantione

21-01-2006, 12:24 PM

This is really a question for Florian. The second one is definitely a bug, so feel free to submit a bug report. The first one I don't know; perhaps try to use Pascal syntax, $ instead of 0x? Anyway, please ask Florian what to do here.

Legolas

25-01-2006, 01:45 PM

[quote="Legolas"]
mov r0, #0x4000006

generates an "invalid constant" error;
[/quote]

Oh, I'll answer myself: ARM does not allow loading an arbitrary value into a register this way, but only values of the form (8 bit value) << (x*2), according to this faq (http://devrs.com/gba/files/gbadevfaqs.php#Shift8bConst).
About the other question, I have submitted a bug report, so I'm waiting for a fix.
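To check whether a constant can be used directly in a mov, a small Python sketch like the following may help. It assumes the usual description of the ARM data-processing immediate, an 8-bit value rotated right by an even number of bits (a slightly more general form of the shift rule in the faq); the function names are made up for the example.

```python
def rotate_left32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def is_arm_immediate(value):
    """True if value is an 8-bit value rotated right by an even number of bits."""
    value &= 0xFFFFFFFF
    # Undoing a rotate-right is a rotate-left; try all 16 even rotations.
    return any(rotate_left32(value, rot) < 0x100 for rot in range(0, 32, 2))


print(is_arm_immediate(0x4000006))  # False: the set bits span more than 8 bits
print(is_arm_immediate(0x4000000))  # True
print(is_arm_immediate(0x6))        # True
```

Since 0x4000000 and 0x6 are each encodable, one possible workaround is to build the constant in two instructions, loading 0x4000000 first and then adding 6 to it.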

In the meantime I'm trying to make some nicer-looking demos, hoping that this can attract more people to join the fpc4gba project... :P

dmantione

26-01-2006, 07:27 AM

Just a suggestion: you're posting news on your own site. There is nothing wrong with that, but the FPC4GBA url is a little better publicized. It is there that you need to show the world your progress.

Legolas

26-01-2006, 08:26 AM

[quote="dmantione"]
Just a suggestion: you're posting news on your own site. There is nothing wrong with that, but the FPC4GBA url is a little better publicized. It is there that you need to show the world your progress.
[/quote]

I know, but fpc4gba main site is owned by WILL. I don't have access to it... :)

savage

26-01-2006, 12:50 PM

I'll have a word with WILL when he gets back so that when you post a news item, it gets replicated on fpc4gba, your site and the pgd news. I'm sure PHP could be used to automate all that. Then it would be less hassle for all concerned.

Legolas

26-01-2006, 02:06 PM

[quote="savage"]
I'll have a word with WILL when he gets back so that when you post a news item, it gets replicated on fpc4gba, your site and the pgd news. I'm sure PHP could be used to automate all that. Then it would be less hassle for all concerned.
[/quote]

Ok, good news! :D
Let me know if I can help you and WILL in some way :rambo: