Rule of thumb: do not assume that the result of a floating-point operation is exact or repeatable; often it is neither.

See this example. Ignore all the volatile qualifiers and other hacks before main; they are required to keep the optimizer from interfering, for demonstration purposes (see below).

I believe the following happens:

double is typically a 64-bit double-precision IEEE 754 number, which has 1 bit for sign, 11 bits for exponent and 52 bits for mantissa.

302500001100000001 is 100 0011 0010 1011 0010 0010 0101 1111 0110 0001 1110 1110 1011 0000 0001 in binary, which is 59 bits long. The leading one is not stored for normalized numbers in IEEE 754, so only the leading 53 bits of significance fit, which causes the trailing 6 bits (00 0001) to be rounded to 00 0000 for lack of precision. That is, your number becomes 302500001100000000 when converted to double; see the first line of output: x: 302500001100000000.000000000000.

However, GCC's long double is typically the full 80-bit x86 extended-precision floating-point type. Its 64-bit mantissa is enough to store your number exactly; see the second line: y: 302500001100000001.000000000000.

Now let's take a look at sqrt's signature: it takes a double argument and returns a double. So its input here is already slightly less than you expected. However, it still prints the "correct" answer: sqrt(x): 550000001.000000000000, sqrt(y): 550000001.000000000000. It's also correct when converted to long long: to_ll(sqrt(x)): 550000001, to_ll(sqrt(y)): 550000001. Why is that?

It so happens that the square root of 302500001100000000 differs from 550000001 by too little: when the answer is fitted into a double, the difference is again lost to rounding, because there is not enough precision. We're lucky here, and it gets rounded up to the "correct" answer.

To confirm that, let's use the sqrtl function, which takes a long double argument and returns a long double result. Now we can see that the roots are slightly different: sqrtl(x): 550000000.999999999069, sqrtl(y): 550000001.000000000000. I have not checked further, but I believe the absolute difference here is below double's precision.

Obviously, when "rounded" using a C-style conversion to long long (which actually truncates toward zero), the results become different: to_ll(sqrtl(x)): 550000000 and to_ll(sqrtl(y)): 550000001.

Frankly speaking, some sources claim that C++ should provide an overload of sqrt that takes and returns long double. I believe whether you actually get it depends on the header used (<math.h> vs <cmath>) and the compiler version. With my compiler, there is no long double overload regardless of whether I use <math.h> or <cmath>.

OK, so now we see that if double and sqrt are used, the result is "correct" because the rounding errors compensate for each other, and "incorrect" if long double and sqrtl are used. However, your program does not seem to use either of the latter explicitly, so why does this happen anyway? The answer is, I believe, very typical:

Believe it or not, operating on 80-bit floating-point numbers can be cheaper than on 64-bit ones: the x87 FPU computes everything at 80 bits internally anyway, and rounding down to 64 bits costs an extra store to memory. And excess precision never hurt anybody, right? Right. So let's perform all computations directly in FPU registers, without transferring values to and from memory where our short 64-bit doubles reside. The result is less rounding and more precise calculations. In practice, though, we have no control over which intermediate values get rounded (when the FPU runs out of registers and a value has to be spilled to memory) and which do not.

And, finally, the icing on the cake: there is an x86 assembly instruction, fsqrt, which computes a square root, and the optimizer knows that this is what sqrt is supposed to do. So the optimizer happily replaces the call to sqrt (which would involve a type conversion and stack-frame allocation) with a single fsqrt instruction. Voilà: now our double has effectively been "replaced" with long double, sqrt has been "replaced" with sqrtl, and we can start blaming excess precision.

Why do results differ from platform to platform?

I believe it's because this "bug" (which is actually not a bug) requires a lot of things to happen simultaneously. If optimizations are fully turned off, or the optimizer has no special insight into the sqrt function, or its heuristics for keeping everything in FPU registers do not fire, the bug is not there.

What do we do?

My personal recipe:

Never use floating point numbers (especially in intermediate calculations) unless you really have to.

If you do use them, ensure that all inputs, intermediate values, and outputs can be stored exactly in the floating-point type you're using. Otherwise, you're gonna have a bad time.

If you want to convert a floating-point number back to an integer, expect rounding error. Add some eps to the number before converting, or use a rounding function such as round. Note that eps should be big enough to actually make a difference: if you add 1e-15 to 10000.25, nothing changes, even though 1e-15 itself is perfectly representable as a double. Also, eps should be small enough that you don't "over-round" your number.