
There is a canonical answer to this question in GCC's wiki, which is presumably maintained; it is by far the most authoritative source of information for this type of question. This answer, on the other hand, may eventually go out of date.
This is all explained in more detail in the wiki, with examples. The following is essentially a quote from it to illustrate how it answers this exact question, with minor comments:

-fno-signaling-nans

-fno-trapping-math

The IEEE standard recommends that implementations allow trap handlers to handle exceptions such as divide by zero and overflow. These flags assume that no user-visible trap will happen.

-funsafe-math-optimizations - These optimizations break the laws of floating-point arithmetic and may replace them with the laws of ordinary infinite-precision arithmetic:

Due to roundoff errors, the associative law of algebra does not necessarily hold for floating-point numbers, and thus expressions like (x + y) + z are not necessarily equal to x + (y + z).

-ffinite-math-only - Special quantities such as inf or nan are assumed never to appear; this saves the time spent looking for them and handling them appropriately. For example, should $x - x$ always be equal to $0.0$?

-fno-errno-math

disables the setting of the errno variable, as required by C89/C99, when calling math library routines. For Fortran this is the default.

-fcx-limited-range

causes the range reduction step to be omitted when performing complex division. This uses $a/b = \frac{a_r b_r + a_i b_i}{t} + i\,\frac{a_i b_r - a_r b_i}{t}$ with $t = b_r^2 + b_i^2$, and might not work well on arbitrary ranges of the inputs.

-fno-rounding-math

-fno-signed-zeros

The former lets the compiler assume the default round-to-nearest rounding mode is always in effect; the latter allows it to ignore the distinction between $+0.0$ and $-0.0$, enabling simplifications such as $x + 0.0 \to x$ and $0.0 - x \to -x$.

Strictly speaking, the implications of the last two are not always as intuitive as one might think. For example (see the wiki), what about $-(a - a) = a - a$: is it $+0.0$ or $-0.0$? I believe there is a fair amount of literature on the exact implications, especially by William Kahan.

Not directly mentioned (I don't see it?), but with -ffast-math, certain common special functions such as the reciprocal $1/x$ and the square root $\sqrt{x}$ are replaced with less precise versions that are faster but still have some "tolerable" error levels (versus the correctly rounded results, accurate to 0.5 ulp, required by the standard); here, for example, is what precision is usually provided by glibc's libm. Indeed, this is the most common cause of speedup from -ffast-math in code that does a lot of arithmetic with divisions and square roots. It is almost to the point that I (personally) think the other suboptions (-ffinite-math-only and the like especially; signalling NaNs are quite useful for debugging) cause a bit too much hassle in terms of their cost/benefit.

"I saw that the time taken for a simple $O(n^2)$ algorithm was reduced to that of an $O(n)$ algorithm using the option."

I believe this is improbable, and it is possible you made a mistake in your analysis. Unsafe floating-point optimizations might make individual expressions somewhat cheaper to evaluate by virtue of allowing a larger choice of optimizations, but the speedup should always be at most a constant factor. Is it possible you compared an $O(n^2)$ algorithm with an $O(n)$ one for insufficiently large $n$?

An $O(n^2)$ algorithm can be reduced to something that behaves like $O(n)$ if, for example, $n$ is known to the compiler and is a multiple of the vector size for the vector instructions (if any) supported by the processor. If the compiler can see all of this, it can unroll an inner loop and use vector instructions to do the work. This may reduce the overall operations done to a mere handful and improve the performance substantially.

My read is that -ffast-math doesn't enable such optimizations directly, but they could be implicitly enabled by -funsafe-math-optimizations, since the associativity restrictions it lifts are what otherwise block them.