profiling help

I am playing with some optimization options for C++. I just changed all loops in
the program to use float and added some extra tests on the loop limits to reduce
the work. With full optimization, the Intel compiler cuts the runtime to a
third, but gcc actually triples the runtime for some reason.
I tried compiling with profiling turned on (-pg, then gprof). Again, Intel's
compiler gives me runtime and call count information for all functions, but gcc
only gives me the (wrong) runtime for main and nothing else.
How do I profile this program to figure out where the compiler is going wrong?
This is with gcc 4.3 from unstable.
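
In case it helps, here is a reduced sketch of the kind of change I mean. It is
not the real program: the function names and the loop body are made up for
illustration, and it assumes "loops changed to float" means the loop counter
itself is a float. The timing helper uses plain clock() so that it does not
depend on gprof at all.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for one hot loop, with an int counter.
double sum_int_loop(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += i * 0.5;
    }
    return sum;
}

// The same loop with a float counter.
double sum_float_loop(int n) {
    // A float counts exactly only up to 2^24; beyond that, i += 1.0f
    // stops advancing and the loop never ends.
    assert(n <= (1 << 24));
    const float limit = static_cast<float>(n);
    double sum = 0.0;
    for (float i = 0.0f; i < limit; i += 1.0f) {
        sum += i * 0.5;
    }
    return sum;
}

// CPU seconds used by `repeats` calls of fn(n), measured with clock().
double cpu_seconds(double (*fn)(int), int n, int repeats) {
    // Writing to a volatile discourages the compiler from dropping the calls.
    volatile double sink = 0.0;
    const std::clock_t start = std::clock();
    for (int r = 0; r < repeats; ++r) {
        sink = fn(n);
    }
    const std::clock_t stop = std::clock();
    (void)sink;
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds() on both versions from a small main and building the
result with each compiler should show whether the float counter alone
reproduces the slowdown, independent of what gprof reports.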
Thanks