Yes, it looks like on 32-bit GCC is being smarter with the floating-point code generated from C than with the Haskell version, while on x86_64 it does the same (better) thing for both.

By: George Giorgidze
https://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-73
Tue, 10 Mar 2009 16:28:42 +0000

With -fexcess-precision, some of the fstp/fld instructions (floating-point stores and loads) are removed from the loop, but not all of them:

So, on my system C remains about twice as fast. For now, I am not able to make the Haskell version any faster. Any pointers in this direction would be very much appreciated. In particular, how can I get rid of the remaining fstp/fld instructions in the Haskell version?

I should also note that Don uses the x86_64 architecture and I am using i686 (though my CPU, see above, has 64-bit support). I am not sure, but this might be the main reason for the different results. I do not have access to an x86_64 machine to confirm this suspicion.

Maybe it is time to upgrade my Arch Linux to the x86_64 version.

Anyway, thanks for your post and helpful comments on my struggles. I learned a lot from this example.

Cheers, George

By: George Giorgidze
https://donsbot.wordpress.com/2008/05/06/write-haskell-as-fast-as-c-exploiting-strictness-laziness-and-recursion/#comment-65
Mon, 09 Mar 2009 02:43:51 +0000

Using ghc-core, I had a look at the inner loop instructions (I am not an expert in this stuff, so I hope I got it right). I was looking for a division instruction and for the loop preceding it.
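For readers following along, here is a rough C sketch of the kind of loop being inspected. The benchmark source is not reproduced in this comment, so this assumes it sums a range of doubles and divides by the count; the name `mean_range` and its parameters are hypothetical. It only illustrates why a lone division is a handy landmark in the assembly: the hot loop sits right before it.

```c
/* Hypothetical sketch, not the benchmark's actual source:
 * mean of the values lo, lo+1, ..., hi. */
double mean_range(double lo, double hi)
{
    double sum = 0.0;   /* running total; ideally kept in a register */
    long   len = 0;     /* how many values have been summed */

    /* Inner loop: one floating-point add and one integer increment
     * per iteration. If sum or x is spilled to memory on x87, that
     * shows up as fstp/fld pairs inside this loop. */
    for (double x = lo; x <= hi; x += 1.0) {
        sum += x;
        len += 1;
    }

    /* The single division after the loop is the landmark to search
     * for in the generated assembly. An empty range returns 0. */
    return len > 0 ? sum / (double)len : 0.0;
}
```

Compiling something like this with `gcc -O2 -S` and searching the output for the division should lead straight to the loop just above it, which can then be compared with the loop ghc-core shows for the Haskell version.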