Best speed switches for gcc

So far I've noticed that:
-O3 and -fomit-frame-pointer significantly speed up my proggies.
Are there other goodies I missed for ...

But I'm trying to milk an O(n^2) algo for all it's worth. The O(n^2) is written in stone & I know what the hot spot is; in this rare case I knew it before I started coding. Of course there's plenty of stuff a newbie like me isn't doing efficiently, but I'm trying to coast as long as I can before I have to pedal hard.

Sooo... given that I'm working with an Athlon, should I reinstall gcc for my target processor? Would that give it a better chance to optimize for the chip when I do the -O3? It should also allow it to use all those fancy MMX registers &c. Or is there a better way?

for the morbidly curious only

I know that program optimization can be fickle. But I'm trying to learn what I can about gcc switches & I figured I'd share my findings. It is odd to see how many of the switches slow things down, but of course they may do the reverse in other situations.

I was able to lose another second with -fdelete-null-pointer-checks & -fschedule-insns2. But I would be skeptical about using them in other situations & my testing shows little or no gain as a rule.

On the other hand, the -O3, -fomit-frame-pointer & -mcpu=whatever seem to be winners on all of the compute bound programs I tested. Still a small N tho & limited to integer math.
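For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a timing harness. It is not the code from this thread: hot_loop() is a made-up O(n^2) integer loop standing in for the real hot spot, cpu_seconds() is a hypothetical helper, and the compile lines in the comment assume an Athlon target as mentioned above.

```c
/* Build the same file with different switches and compare, e.g.:
 *   gcc -O0 bench.c -o bench_O0
 *   gcc -O3 -fomit-frame-pointer -march=athlon bench.c -o bench_O3
 * (the file name and the athlon target are assumptions for illustration)
 */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the hot spot: O(n^2) integer-only work. */
unsigned long hot_loop(unsigned n)
{
    unsigned long acc = 0;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            acc += (i ^ j) & 7u;
    return acc;
}

/* Runs fn(n) once, stores its result, and returns the CPU time used
 * in seconds as measured by clock(). */
double cpu_seconds(unsigned long (*fn)(unsigned), unsigned n,
                   unsigned long *result)
{
    assert(fn != NULL && result != NULL);
    clock_t start = clock();
    *result = fn(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds(hot_loop, n, &r) from main() and printing r keeps the optimizer from throwing the loop away. Run each build several times; a single run can easily be off by more than the one-second differences discussed here.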

Easy-to-read source code optimizations (only in the problem area) have also been big winners.

-mcpu=cpu-type
This is identical to specifying both -march and -mtune.

So, your hypothesis is that -mtune could have a negative effect on speed? I didn't think of that & seeing how the other options can have a negative effect I'll give it a try.

Truthfully, my biggest surprise was that the option for pretouching the memory to load the cache line didn't have a net gain. This suggests to me that I'd better make sure that my malloc() structs are properly aligned; I thought that malloc() did that for me by default. This is of course a source code tweak.

Thanks for all the help. So much for me to learn, so few brain cells to do it with.

A few more source code tweaks & the user time is now at 0m31.610s. I'm only posting this because those same tweaks with the -mcpu vs -march switch were actually *slowing down* the times. This was causing me much confusion because it should have been stuffing more data into the cache for the inner loop & speeding things up. Now I know that I wasn't giving the compiler the correct info. An important lesson.

> Well they will be data aligned for sure -
> meaning you can store a data type with the
> most restrictive type (usually a double) at
> the address returned to you.
Yup, my tests confirm that I was writing nonsense about malloc() alignments. Everything is aligned to the 8s.
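A check along these lines is easy to write. The sketch below is only a guess at what such a test could look like, not the test that was actually run: struct cell is a hypothetical two-int struct, and is_aligned() and malloc_is_8_aligned() are made-up helper names. The C standard only promises that malloc() memory is suitably aligned for any fundamental type; the 8-byte figure is what was observed on this particular setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-int cell, standing in for the real struct. */
struct cell {
    int score;
    int aux;
};

/* Returns 1 if the address p is a multiple of alignment, else 0. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}

/* Allocates n cells and reports whether the block starts on an
 * 8-byte boundary: 1 = yes, 0 = no, -1 = allocation failed. */
int malloc_is_8_aligned(size_t n)
{
    struct cell *row = malloc(n * sizeof *row);
    if (row == NULL)
        return -1;
    int ok = is_aligned(row, 8);
    free(row);
    return ok;
}
```

Note that this says nothing about cache-line alignment, which is a stricter requirement than what malloc() guarantees.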

>It might be worth looking at the -fbranch-probabilities option
Unfortunately, the -fbranch-probabilities actually slows the times down by almost exactly one second. Another switch that seems to be context sensitive. The problem area is a nested for loop so the switch seems to me to be a logical choice but I've hoisted about as much as I can out of it. It's pretty lean at this point.

FYI: I'm doing a variant on the old edit distance problem, AKA dynamic programming. This algo searches for a best score in a 2D array. Each cell's score depends upon its neighbors to the North, East, and Northeast. The matrix cell structs are down to 2 ints. A nearly ideal size for caching. With some other tricks I've been able to reduce the "matrix" to 2 rows (well there is one more for the 1st iteration) which I toggle between. And because I'm not using an actual matrix and the rows are a few hundred cells on average it tends to be quite cachable.
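For readers who haven't seen the two-row trick, here is a minimal sketch using the textbook Levenshtein recurrence. It is not the variant described above: the cells are plain ints rather than 2-int structs, the scoring is the standard insert/delete/substitute cost of 1, and edit_distance() and min3() are names made up for the example. Each cell depends only on the previous row and the current row, so two rows that swap roles are enough.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three ints. */
static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Levenshtein distance between a and b using two rows instead of a
 * full matrix. Returns -1 if the rows cannot be allocated. */
int edit_distance(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);

    /* One allocation holding both rows, so they sit next to each other. */
    int *rows = malloc(2 * (lb + 1) * sizeof *rows);
    if (rows == NULL)
        return -1;
    int *prev = rows;
    int *cur = rows + (lb + 1);

    /* Row 0: turning the empty prefix of a into b[0..j) costs j inserts. */
    for (size_t j = 0; j <= lb; j++)
        prev[j] = (int)j;

    for (size_t i = 1; i <= la; i++) {
        cur[0] = (int)i;                       /* i deletions */
        for (size_t j = 1; j <= lb; j++) {
            int subst = prev[j - 1] + (a[i - 1] != b[j - 1]);
            cur[j] = min3(prev[j] + 1,         /* cell in previous row   */
                          cur[j - 1] + 1,      /* neighbor in this row   */
                          subst);              /* diagonal neighbor      */
        }
        /* Toggle: the row just filled becomes the previous row. */
        int *tmp = prev;
        prev = cur;
        cur = tmp;
    }

    int d = prev[lb];   /* after the last toggle, prev is the final row */
    free(rows);
    return d;
}
```

With rows of a few hundred ints, both rows together take only a few KB, which is why this layout tends to stay in cache.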

Hm, that gives me another idea... Anyway, with your help I'm learning a lot going thru this exercise. I'm going to have to pull the plug on this sooner or later, but given that you've helped me take a 2 week task down to under one, I think that the time has been well spent so far & deserves a few more edits. Thanks again.