Survey of Rounding Implementations in Go

Rounding in Go is hard to do correctly. That is, given a float64, truncate the fractional part (anything right of the decimal point), and add one to the truncated value if the fractional part was >= 0.5. This problem doesn’t come up often, but it does enough that as of this writing, the second hit on Google for golang round is a closed issue from the Go project, which declined to add a Round function to the math package. That issue also includes many community contributions about ways to round.

In this blog post I’d like to examine those and other implementations of round and audit them for correctness. We will see that nearly all have bugs preventing use in production software. So to answer a question that appeared while I was writing this blog post: round seems obvious, but is not.

Surveying the options for round

There are multiple ways to round (away from zero, toward zero, half away from zero, half to even, etc.), depending on one’s use case. Here we will discuss half away from zero, which rounds to the nearest integer and breaks ties (a fractional part of exactly 0.5) by rounding away from zero, so 2.5 becomes 3 and -2.5 becomes -3.
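To make the difference between these modes concrete, here is a small sketch. It assumes Go 1.10 or later, where math.Round and math.RoundToEven exist; the helper name modes is made up for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// modes is a hypothetical helper that applies three rounding modes to x:
// half away from zero, half to even, and toward zero.
func modes(x float64) (halfAway, halfEven, towardZero float64) {
	return math.Round(x), math.RoundToEven(x), math.Trunc(x)
}

func main() {
	for _, x := range []float64{2.5, -2.5, 2.4, -2.6} {
		a, e, z := modes(x)
		fmt.Printf("x=%v half away=%v half even=%v toward zero=%v\n", x, a, e, z)
	}
}
```

The modes only disagree on ties and on the direction of truncation, which is why a test suite needs exact halves and negative inputs.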

The requirements of a correct round implementation are that it:

rounds half away from zero for all finite inputs

supports special values (NaN, Inf, -0) by returning them unchanged

We will use the following test cases to verify correctness, where the second value is the desired result after rounding the first:

These include all the special cases, some normal cases, and some edge cases that will prove difficult for many algorithms to handle. (Note that since floats aren’t exact, using 0.49999999999999999 is the same as 0.5. The value used here, 0.49999999999999994, is the highest float < 0.5. Also, printing 0.49999999999999994 at very high precision returns 0.499999999999999944488848768742.)

The suggestions in the linked issue were often (obviously) untested, but assumed to work, even when suggested by very well known people. That they don’t work is a testament to how difficult rounding is to do correctly for all inputs.

What happened in the first two failures is the n-0.5 computation resulted in -1.0, even though we expected something strictly > -1.0. If we look at the round implementation of Postgres we can see they explicitly avoid this problem:

Subtracting 0.5 from a number very close to -0.5 can round to exactly -1.0, producing incorrect results…

This is not an uncommon problem. Java up through version 6 was broken in this way, but has since improved their implementation.
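The failure is easy to reproduce. The sketch below uses naiveRound, a hypothetical stand-in for the "add or subtract 0.5, then floor or ceil" family rather than the exact code from the issue, and shows that the intermediate sum itself lands on exactly 1.0 or -1.0.

```go
package main

import (
	"fmt"
	"math"
)

// naiveRound is a hypothetical example of the broken pattern:
// shift by 0.5 toward the sign of x, then drop the fraction.
func naiveRound(x float64) float64 {
	if x < 0 {
		return math.Ceil(x - 0.5)
	}
	return math.Floor(x + 0.5)
}

func main() {
	x := 0.49999999999999994 // largest float64 below 0.5

	// The exact sum is not representable, so it is rounded to the nearest float64.
	fmt.Println(x+0.5 == 1.0, -x-0.5 == -1.0)

	// Both results should be zero, but the rounded sums have already reached +/-1.
	fmt.Println(naiveRound(x), naiveRound(-x))
}
```

The exact sum lies halfway between 1.0 and the float just below it, and the default round-to-nearest-even mode of the addition picks 1.0, so the damage is done before Floor or Ceil is ever called.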

int + Copysign

The third suggestion comes from minux, which is an attempt to fix the negative input problem:

Since the return value of that function is an int32, we can assume its author knows the inputs will never be special or large, and ignore those inputs, but that still leaves the inputs very close to 0.5, which fail.

Solutions that specify the rounding digit

Many people would like something that, in addition to normal float -> int round, can round to N digits: round(12.345, 2) = 12.35.

strconv

There was a long thread on golang-nuts about rounding, and one of the popular solutions was to use strconv:

This uses the ‘g’ format verb, which returns N digits total, not N digits after the decimal point. The test cases for that function were mostly less than 1, which is why it appeared to work, but it fails for general inputs.

Multiply by 10^N

Many other solutions for this problem use an implementation that multiplies the input by 10^N (where N is the desired number of digits), rounds, then returns that number divided by 10^N. These algorithms have two kinds of problems. First, if N is sufficiently large, then it can overflow during the multiplication to Infinity. Second, it still has to round correctly, and that’s hard to do as seen above.

Here it is trivial for v*pow to be greater than math.MaxInt64 (or MaxInt32 on 32-bit systems, as this converts to int), and cause problems. Even if it doesn’t do that, we’ve already seen above that the int(f - 0.5) solution doesn’t work for various cases.

See the line with math.IsInf. This function detects when multiplying by 10^N will overflow, but it handles it by silently returning the input with no indication of error. Even when specifying 0 precision, it fails with:

This works correctly for banker’s rounding (discussed below), but uses some undefined behavior of Go. The conversion of v (a float64) to uint64 is not well defined and works differently on amd64 and arm. While fixing the arm bug, CockroachDB decided to use a more tested algorithm, and consulted Postgres’ approach.

Working Implementations

Below are some working implementations in Go.

Postgres (adapted from C to Go by CockroachDB)

The Postgres comment above is from a round implementation in C. CockroachDB adapted this to Go (shown here with comments removed). It implements banker’s rounding (round to even), and excluding that difference, it passes all tests.

Let’s analyze how this works. The first 6 lines handle some special cases. The next 4 set roundFn to Ceil or Floor depending on whether the input is negative. The following line stores the original input. Now it gets interesting:

After 0.5 has been subtracted from the magnitude of x, it checks if x is equal to zero or went over zero to the other side (the sign change checking). If either of those happened, then the absolute value of the input was <= 0.5, so an appropriately signed zero is returned.

if x == xOrig-math.Copysign(1.0, x) {
	return xOrig
}

This tests for large inputs, for which x-0.5 == x-1.0, and returns the input unchanged.

r := roundFn(x)
if r != x {
	return r
}

Next the ceil or floor func is executed and its result returned if it changed the value, which can only happen if the original value’s fractional part was not exactly equal to 0.5, since we subtracted 0.5 from the input earlier.

return roundFn(x*0.5) * 2.0

Here the fractional part is equal to 0.5 so we need to round to nearest even (remember this isn’t the same as away from zero like all the others; this is just how Postgres rounding works). The comment in the code describes it best:

Dividing input+0.5 by 2, taking the floor and multiplying by 2 yields the closest even number. This part assumes that division by 2 is exact, which should be OK because underflow is impossible here: x is an integer.

We could change this line to round away from zero with:

return xOrig + math.Copysign(0.5, xOrig)

This makes the function work except when the input is exactly equal to 0.5 or -0.5, because those cases are handled specially above.

Notably, Postgres does not provide a round(x, n) function as it is likely really hard to do correctly, since it has two difficult problems in one, as we’ve seen above. (CockroachDB does have that function, but it cheats a bit by converting the float to an arbitrary-precision decimal, rounding there, and converting back.)

github.com/montanaflynn/stats

An implementation supporting precision selection at github.com/montanaflynn/stats works for the test inputs if specifying 0 precision. (Note that it doesn’t detect the overflow if precision is high, but otherwise is a good implementation.) With comments and the precision code removed:

The key difference between this algorithm and others is the use of math.Modf, which correctly splits out the fractional and integer part.

math.Round in Go 1.10

Some months after the release of Go 1.8, someone re-requested the addition of math.Round. This discussion continued to post broken round implementations (bringing the number of broken implementations up to something above 8). But happily, the Go team has agreed to add math.Round in Go 1.10! Even more happily, someone has posted a working implementation.

For those unfamiliar with how floats are implemented (I’m on that list), this function looks magical. Let’s dig in and see how this works. Looking at the first two lines of code (skipping the constants):

bits := math.Float64bits(x)
e := uint(bits>>shift) & mask

It looks like we get some bits, and are selecting some information out of them in the shift and mask. From IEEE 754:

The encoding scheme for these binary interchange formats is the same as that of IEEE 754-1985: a sign bit, followed by w exponent bits that describe the exponent offset by a bias, and p−1 bits that describe the significand.

Looking at the consts above, the shift is 64 - 11 - 1, which is 64 total bits less 11 for the exponent and 1 for the sign, or 52 bits for the mantissa (or significand). This means the shift is removing the 52 mantissa bits and the mask is removing the sign bit, leaving us with just the exponent.

switch {
case e < bias:

Exponents are offset by a bias, 1023 in this case, which means you have to subtract 1023 from the e computed above to get the actual exponent. Or, as written above, if e < bias, then we have a negative exponent, which means the absolute value of the float must be less than 1 (this range includes zero and the denormals, whose exponent field is 0). Indeed, the code reads:

Here bits is masked with the sign bit, so it will be 1<<63 if negative or 0 if positive. This is only used to preserve the correct sign: we can completely ignore the mantissa now. We can do that because what we actually care about is the exponent. Exponents in floats are in base 2, not 10. The representation is: (sign) (mantissa) * 2 ^ exponent. Since we are already in an e < bias block, we know that the largest exponent we could have is -1. 2 ^ -1 is 0.5. Furthermore, the mantissa has some value 1.X, where X are the bits of the mantissa in base 2. Thus, with exponent -1, the absolute value of the float must be in the range [0.5, 1). If the exponent were smaller at -2, then the float would have some absolute value less than 0.5. So, if e == bias-1, the absolute value is >= 0.5, and thus we need to add one to the (now zeroed) result. Phew. See Double-precision floating-point for the details about this.

Now the second case:

case e < bias+shift:

What you think is going to be the condition in this case statement is case e >= bias to cover all of the non-negative exponents. But instead we only get a subset of them. The use of shift here is especially interesting because it doesn’t seem to be of a compatible unit with bias. One is the number of bits to move, the other is a numeric offset. But, since floats are represented as (1.mantissa) * 2 ^ X, if X is at least the number of bits in the mantissa, we are guaranteed to have a value with no fractional part. That is, the exponent has moved the binary point to the right enough that the mantissa is completely to its left. Thus, this case statement ignores float values that are already rounded.

The first line here is easy: remove the bias out of e so we get the real exponent. The second line adds 0.5 to the value. This works because the highest bit of the mantissa contributes 0.5 to its final sum (see the representation linked in the Wikipedia article above), and shifting that bit right by the exponent lines it up with the 0.5 place of the value itself. In the case that this sum overflows the 52-bit bounds of the mantissa, the exponent will be increased by one. The exponent won’t ever overflow to the sign bit because the exponent is below bias+shift from the case above. In either case, the fractional part is cleared. Thus, if the fractional part was >= 0.5, it will increase the value by 1, otherwise it will truncate it. Tricky, and not at all obvious until we looked deeper.

Conclusion

This post has mostly described half-away-from-zero rounding, but there are many other modes. Some applications may need them, and figuring those out is left as an exercise for the reader. With the description of how correct rounding is performed in Go, though, it should now be clearer how to correctly write and evaluate rounding implementations.

I think the Go team made the correct decision to reconsider the addition of the Round function in the standard library. Without that, we were stuck with lots of broken implementations. It is also no surprise that they chose to not add a function that accepted the number of digits to round, since that adds some additional complication that can quickly break things.

The other insight here is that there are some very subtle issues with floats, and even experts can get them wrong. The “just <one liner>” copy pastes from issues are easy to come up with, but tricky to get correct.

Finally, correctly rounding floating point numbers is ridiculously hard. It is no surprise that Java was broken for 6 major versions (15 years from the release of Java 1.0 until Java 7). At least Go got there in less time than that.