Coolman wrote: Why not harmonize FreeBASIC 32 and 64 so that both generate only C code, with -O2 enabled by default? It would be more logical, and it would optimize the C-generated code in both versions.

You can also use the gcc backend for 32-bit FreeBASIC by passing "-gen gcc" on the command line. (Obviously you need gcc in that case; you can download a prepared add-on package from FreeBASIC's SourceForge page.) The C backend doesn't only have upsides; it also has a few downsides: some things are directly possible in asm but not in C code, optimization sometimes causes trouble, and compatibility issues might arise with existing code. That's probably why the asm backend is still the default for 32-bit FBC.
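To check which backend and target a given executable was actually built with, a small probe like the following can help. It is only a sketch: it relies on the intrinsic defines `__FB_BACKEND__` and `__FB_64BIT__` that fbc provides, and the constant name `TARGET_BITS` is made up for illustration.

```freebasic
' Report which code generator and target width this program was compiled with.
' __FB_BACKEND__ and __FB_64BIT__ are intrinsic defines set by fbc;
' TARGET_BITS is a hypothetical name used only in this sketch.

#ifdef __FB_64BIT__
    Const TARGET_BITS = 64
#else
    Const TARGET_BITS = 32
#endif

Print "backend:      "; __FB_BACKEND__      ' e.g. "gas" or "gcc"
Print "target bits:  "; TARGET_BITS
Print "pointer size: "; SizeOf(Any Ptr); " bytes"
```

Building the same file once with "-gen gas" and once with "-gen gcc" should show the backend line change while the source stays identical.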

I know about gcc. Thanks anyway...

I did not know that there was a compatibility problem with gcc. All the code I tested works...

There is no general rule for the speed difference between 32- and 64-bit code.
- 64-bit code can be faster if the extra registers play a role
- 64-bit code can be slower because the code cache can be exhausted due to longer instructions and addresses
- both 32- and 64-bit code can use SIMD instructions.

In the Masm32 forum, we ran lots of benchmarks. Most of the time, the differences are small. Libraries written in optimised assembly are generally faster than C code, often by a factor of 2-3. There are cases, however, where C code is not slower, simply because the C compiler generates the same code that a human assembly programmer would choose.

Some additions. Note that I'm talking from a compiler standpoint, with at least partial assembler runtime helpers, and for x86/x86_64 only, not necessarily 64-bit in general (since I have a Raspberry Pi 3, I have ARM64 too, and there is a G5 lurking around somewhere).

jj2007 wrote: There is no general rule for the speed difference between 32- and 64-bit code.
- 64-bit code can be faster if the extra registers play a role
- 64-bit code can be slower because the code cache can be exhausted due to longer instructions and addresses
- both 32- and 64-bit code can use SIMD instructions.

- 64-bit SIMD has ABI support (volatile registers, aligned stack), twice the number of registers, and floating point passed in SIMD registers. Apart from the number of registers, this can be emulated by compilers, but only for the current program. For x86_64 it is systemwide, so it also applies to calls into the system, 3rd party DLLs etc.
- SIMD floating point is generally faster for simple operations, but slower (than x87) for complex operations. (Not counting vectorization, since that is relatively rare.)
- (Unix) position independent code is cheaper on x86_64.
- Since structures with pointers become larger, 64-bit has data cache effects too, though they are usually only noticeable in special cases (microbenchmarks).
- A 64-bit runtime can assume SSE2/3 as a minimum, so usually more routines are optimized that way. In general, the minimal CPU level is higher.
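The point about pointer-carrying structures growing on 64-bit can be made concrete with SizeOf. This is a minimal sketch; the `Node` type and its field names are invented for illustration and do not come from any code in this thread.

```freebasic
' A hypothetical list node: two pointers plus a 32-bit payload.
Type Node
    nextNode As Node Ptr
    prevNode As Node Ptr
    value    As Long        ' Long is 32 bits on both targets
End Type

Print "SizeOf(Any Ptr) = "; SizeOf(Any Ptr)
Print "SizeOf(Integer) = "; SizeOf(Integer)   ' Integer follows the pointer width
Print "SizeOf(Long)    = "; SizeOf(Long)
Print "SizeOf(Node)    = "; SizeOf(Node)
```

With default field alignment the node should come out at 12 bytes on a 32-bit build and 24 bytes on a 64-bit build (two 8-byte pointers, the 4-byte payload, and padding), so the same number of nodes occupies roughly twice as much data cache.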

@Coolman
It is your choice which compiler version you want to use. In my experience, FBx64 executables are usually faster than FBx86, except for graphics.
I don't have a real world benchmark, so here's the nbody benchmark.
My times on Windows 10 x64

Private Function main(Byval argc As Long) As Long
    Dim n As Long = argc
    Dim i As Long

    offset_momentum(5, bodies())
    Print Using "##.#########"; energy(NBODIES, bodies())
    For i = 1 To n
        advance(NBODIES, bodies(), 0.01)
    Next
    Print Using "##.#########"; energy(NBODIES, bodies())
    Return 0
End Function

There is no limitation to the use of SIMD in 32-bit land. In my main library, there are over 500 lines containing the string "xmm". But of course, there are some old compilers around that have no SIMD support.

So that is FPU for 32-bit code, SIMD for 64-bit code. The latter is faster but also much less precise. And no CPU would complain if it was fed the SIMD code in 32-bit mode. So the reason for the slowness of 32-bit code is just a dumb GCC version, nothing else.

srvaldez wrote: @Coolman
It is your choice which compiler version you want to use. In my experience, FBx64 executables are usually faster than FBx86, except for graphics.
I don't have a real world benchmark, so here's the nbody benchmark.
My times on Windows 10 x64

FBwin32 fbc -w all -asm intel -gen gas
Launched four times. The result is not constant.

-0.169075164
-0.169059907
elapsed time 23.55617942135427 seconds

-0.169075164
-0.169059907
elapsed time 27.06140585355911 seconds

-0.169075164
-0.169059907
elapsed time 27.09792673439802 seconds

-0.169075164
-0.169059907
elapsed time 26.94121167771459 seconds

FBwin32 fbc -w all -asm intel -gen gcc -Wc -O2
Launched four times. The result is not constant.

-0.169075164
-0.169059907
elapsed time 10.57177684700469 seconds

-0.169075164
-0.169059907
elapsed time 12.42956154142667 seconds

-0.169075164
-0.169059907
elapsed time 13.75664050981891 seconds

-0.169075164
-0.169059907
elapsed time 13.70927023998706 seconds

FBwin64 fbc -w all -asm intel -gen gcc -Wc -O2
Launched four times. The result is not constant.

-0.169075164
-0.169059907
elapsed time 10.85466894134879 seconds

-0.169075164
-0.169059907
elapsed time 12.73996500298381 seconds

-0.169075164
-0.169059907
elapsed time 12.79951698612422 seconds

-0.169075164
-0.169059907
elapsed time 12.76374577032402 seconds

Thank you for the example. The results are quite similar, with a small advantage for the 64-bit version. I expected better... Very interesting.
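Since the elapsed times vary from launch to launch, it can help to repeat the measurement inside one program and compare the fastest and slowest run instead of relying on a single number. The sketch below assumes a stand-in `workload` function (hypothetical, not the nbody code) and uses the built-in Timer.

```freebasic
' Hypothetical stand-in for the real benchmark body.
Function workload(ByVal n As Long) As Double
    Dim total As Double = 0
    For i As Long = 1 To n
        total += 1.0 / i
    Next
    Return total
End Function

Const RUNS = 5
Dim times(1 To RUNS) As Double
Dim sink As Double = 0

' Time the same workload several times in a row.
For r As Long = 1 To RUNS
    Dim t0 As Double = Timer
    sink += workload(10000000)
    times(r) = Timer - t0
Next

' Find the fastest and slowest run.
Dim best As Double = times(1)
Dim worst As Double = times(1)
For r As Long = 2 To RUNS
    If times(r) < best Then best = times(r)
    If times(r) > worst Then worst = times(r)
Next

Print "best  "; best; " s"
Print "worst "; worst; " s"
Print "checksum "; sink     ' using the result keeps the loop from being discarded
```

The fastest run is usually the one least disturbed by other processes, so it tends to be the better figure for comparing the 32-bit and 64-bit builds.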

So I took your benchmark and had a look under the hood: Voilà, mystery solved, x64 is 10% faster because of one single faster instruction, sqrtsd vs fsqrt. As soon as you comment out that single instruction, the 32-bit version becomes 26% faster than the 64-bit version.
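One way to see how much a single operation such as the square root contributes is to time two otherwise identical loops, one with Sqr and one without. This is only a sketch of that idea, not the code that was disassembled above; the loop count `N` and the variable names are arbitrary, and the absolute numbers will depend on the backend and optimization level.

```freebasic
Const N = 50000000

Dim t0 As Double
Dim withSqrt As Double
Dim withoutSqrt As Double
Dim a As Double = 0
Dim b As Double = 0

' Loop that includes a square root on every iteration.
t0 = Timer
For i As Long = 1 To N
    a += Sqr(i * 1.5)
Next
withSqrt = Timer - t0

' Same loop without the square root.
t0 = Timer
For i As Long = 1 To N
    b += i * 1.5
Next
withoutSqrt = Timer - t0

Print "with Sqr:    "; withSqrt; " s"
Print "without Sqr: "; withoutSqrt; " s"
Print "checksums:   "; a; " "; b    ' printed so the loops are not optimized away
```

Comparing the two timings for a 32-bit and a 64-bit build gives a rough idea of how much of the gap comes from the square root alone.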

Benchmarks should be much more balanced, representing a variety of typical tasks like loops, string processing, conversions, sorting, searching, integer and float math, graphics, etc.
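A more balanced benchmark could time several unrelated kinds of work separately. The following is a rough sketch under that assumption: the four task functions, their workloads, and the `timeTask` helper are all invented for illustration and cover only a few of the categories listed.

```freebasic
' Each task returns a checksum so the compiler cannot discard the work.
Function intLoop() As Double
    Dim s As LongInt = 0
    For i As Long = 1 To 20000000
        s += i And 255
    Next
    Return s
End Function

Function floatMath() As Double
    Dim s As Double = 0
    For i As Long = 1 To 5000000
        s += Sin(i * 0.001) * Sqr(i)
    Next
    Return s
End Function

Function stringWork() As Double
    Dim s As String
    For i As Long = 1 To 200000
        s &= Str(i)
    Next
    Return Len(s)
End Function

Function sortWork() As Double
    Const COUNT = 2000
    Dim a(0 To COUNT - 1) As Long
    For i As Long = 0 To COUNT - 1
        a(i) = (i * 7919) Mod 10007     ' deterministic scrambled input
    Next
    ' plain insertion sort
    For i As Long = 1 To COUNT - 1
        Dim cur As Long = a(i)
        Dim j As Long = i - 1
        While j >= 0 AndAlso a(j) > cur
            a(j + 1) = a(j)
            j -= 1
        Wend
        a(j + 1) = cur
    Next
    Return a(0) + a(COUNT - 1)
End Function

' Run one task and print its elapsed time and checksum.
Sub timeTask(ByRef taskName As Const String, ByVal task As Function() As Double)
    Dim t0 As Double = Timer
    Dim checksum As Double = task()
    Print taskName; ": "; Timer - t0; " s   (checksum "; checksum; ")"
End Sub

timeTask("integer", @intLoop)
timeTask("float", @floatMath)
timeTask("string", @stringWork)
timeTask("sort", @sortWork)
```

Reporting each category on its own line makes it visible when one build wins only because of a single kind of operation.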