Introduction

PHP offers a method round to round floating point
numbers to a certain precision. This, however, does not always work as a user
would expect. This is due to the fact that IEEE 754 floating point values are
stored in the binary system which cannot represent every decimal value exactly.

Rounding methods

Rounding is traditionally seen as round to integer so that no fractions are
left over. If the number is already an integer, this is not a problem. But
if the number is not an integer, it depends on the chosen algorithm on what
is done depending on the fraction.

The following rounding methods exist:

Round to negative infinitiy

This method always rounds toward negative infinity, should be obvious. Some examples:

Explicit rounding

round() and number_format() use an explicit floating point algorithm in math.c,
while the printf() functions do rounding while converting the float to a string
using an algorithm that uses bigints (zend_strtod.c).

Implicit rounding

History of round() in PHP

In the following section, I will outline the history of the round() function
in PHP in order to provide thorough background information for the
discussion.

First version of math.c in CVS

Signature: round($float), rounds to integer, no precision argument

Uses rint() for rounding.

On systems without rint(), rint() is emulated with an algorithm that does arithmetic rounding.

ISO C specifies that rint() rounds according to the current rounding direction
of the CPU, which is round-to-nearest round-half-even (banker's rounding) by
default on any system that I know of but may be changed during runtime.
IEEE 754 does not specify arithmetic rounding as a rounding method, but
IEEE 754r will.

This first version already shows discrepancies on different systems. (arithmetic
rounding on systems without rint(), banker's rounding on every system with sane
defaults and rint())

Version 1.22 (May 17, 2000)

A second parameter is added to the function signature: round($float, $places)
where $places specifies the precision. It now implements an algorithm that does
arithmetic rounding only.

Version 1.104 (Aug 8, 2003)

Due to incorrect results on some systems (for reasons, see below) the algorithm
is modified slightly using a “fuzz” (here too: see below).

Version 1.106 (Aug 9, 2003)

Fuzz is always disabled on Win32, a (useless) configure check is added on UNIX.

General information on floating point arithmetics

This section tries to gather general information on floating point arithmetics.

Representation of floating point values

IEEE 754 specifies that floating point values are represented through three
different numbers: The sign of the number, the exponent and the fraction. A
floating point number is to be interpreted as:

(-1)^sign * fraction * 2^exponent

IEEE 754 specifies several different floating point data types, of which two
are relevant here:

Double precision: The fraction is 52 bits, the exponent 11 bits and one sign bit.

Extended precision: The fraction is 64 bits, the exponent 15 bits and one sign bit.

Representation of decimal numbers

Since IEEE 754 uses the binary (base 2) system rather than the decimal (base 10)
system for storing floating point values, it is not always possible to exactly
represent a decimal number as a floating point value. Take, for example, the
number 0.1. It cannot be represented exactly as a binary floating point number
with finite precision just as 1/3rd can't be represented as a decimal number
with finite precision. The closest floating point number within double precision
is

0.1000000000000000055511151231257827021181583404541015625

The closest floating point number within extended precision is

0.1000000000000000000013552527156068805425093160010874271392822265625

However, when rounded as integers or strings, the first 15 significant digits
of a floating point number are always exact. So when converting the number 0.1
with a precision of 14 digits after the first significant digit to a string,
it will still yield 0.1. So within 15 digits precision, floating point numbers
can be used to exactly represent a decimal number.

Precision relevancy of arithmetics with FP numbers

The question here is: What does the double variable d contain? The answer is
compiler-dependent. The Microsoft C Compiler on any system will have d
contain the closest double representation of 0.002877. The GNU C Compiler will
do so on x86_64 systems. But on 32 bit x86 systems the GNU C Compiler will
have d contain the closest extended precision representation of 0.002877
truncated to double which is NOT the same as the closest double representation.
This is due to the fact that internal calculations are done using extended
precision and results are truncated to double precision only when they are
stored in memory.

This can be avoided, however. The trick is to force the FPU to always use
double precision for calculations. The problem is that this is not possible
in a platform-independent way. The GNU C library offers the _FPU_SETCW and
_FPU_GETCW macros in fpu_control.h while on Windows Systems a function named
_controlfp is available for this job. FreeBSD provides fpsetprec() and other
Operating Systems that run on x86 require inline assembly.

Please note that zend_strtod() is also affected by this problem: On systems
with the GCC and a 32 bit x86 processor, zend_strtod() will yield different
results than strtod(). See below.

Analysis of the problems of the previous round() implementation

The previous round() implementations does not work properly on several cases.

First of all, it is nowhere clearly defined, which rounding method round()
uses. Let's assume arithmetic rounding since the current algorithm tries to
do that.

The problem with round() is not the rounding algorithm itself, that is very
straight-forward (for brevity, only the version for positive values is
included here):

In a world with infinite precision this is completely correct. But due to the
finite precision of doubles, it introduces two areas which cause problems:

The multiplication with 10^places.

The division by 10^places.

In an attempt to solve these problems, the so-called “fuzz” was added. The
fuzz simply means that instead of adding 0.5, a small bit more is added:
0.50000000001. This, however, does not solve the problem but introduces a new
one.

Let us have a look at the problems introduced by those three steps:

Multiplication with 10^places

Multiplication with 10^places is problemtic because of the fact that if the
previous floating point representation was not exact, after multiplying with
10^places the resulting floating point number may not be the exact
representation of the intended number.

Take, for example, the number 0.285. Its floating point representation is
0.284999999999999975575093458246556110680103302001953125. If you multiply
that with 100, the resulting number has the floating point representation
28.499999999999996447286321199499070644378662109375. This is not the exact
representation of 28.5 - which is actually 28.5 in this case.

The same happens for 1.255: The representation is
1.25499999999999989341858963598497211933135986328125. Multiply that by 100
and get 125.4999999999999857891452847979962825775146484375. The exact
representation of 125.5 however is 125.5.

If 0.5 is now added to that number and that number is then rounded with floor,
the result will be 28 and not 29 which would be the naive expected value.

Division through 10^places

Division through 10^places is problematic because of two possible effects:

If the internal calculation is done in extended precision, the truncated value may not be the exact double representation of the chosen value, see above.

If places > 22 then 10^places itself cannot be represented as an exact floating point number and thus the division will be inaccurate. After dividing by 10^places, the result may deviate from the nearest floating point representation of the exact result - try var_dump(2e-23 - round(2e-23,23));. Thus, even if the rounding is exact, the result after the division is not the nearest representation as would be expected.

The round fuzz

The round fuzz tries to correct the multiplication problem but causes another
one: round(0.9499999999999,1) will return 1.0 instead of the expected 0.9. Also,
since the fuzz is not activated on all platforms (but these problems are
platform-independent - with the exception of the extended precision then
truncation problem during the division), this does not actually fix the issue.

Summary of the problem analysis

The addition of the places parameter of the round() function is the actual
cause of the calculation errors. Furthermore, the fact that PHP does not
clearly specify which rounding method is used (the manual only states that
the number is “rounded”) has been cause for quite a bit of confusion.

Proposal and Patch

This proposal suggests how to fix the round function in order to make it work
properly and reduce the confusing among users.

First of all, this proposal does not want to “fix” the printf() functions.
printf() does round internally using the float-to-string converion algorithm
in zend_strtod.c which uses bigints to do the conversion. So when it comes to
0.285 which is represented as 0.284999something, it will return 0.28 instead of
0.29 if used with %2f as format string. But the problem with fixing printf() is
portability: Every other language supporting printf() or similar format strings
do it wrong in the exact same way, PHP should not deviate from that (in my eyes).
It is always possible to do printf(“%.2f”, round($float, 2)); if one really
wants correct results, as long as round() works properly. Also, changing the
printf() bigint algorithm will have adverse effects on the fact that printf
is often used with very high precision (> 20) to debug floating point algorithms.

However, the PHP manual should contain a warning for printf() that rounding may
not work as expected and that explicit rounding should be done prior to passing
the value to printf() if chosen so.

Second, %g in printf() and implicit float-to-string conversion in PHP a la
(string)$float shouldn't be fixed either. They only actually round at the
15th significant digit (if the precision ini setting is not touched). If
somebody really operates with floating point values at the edge of decimal
precision, other problems occur anyway, so one shouldn't bother. But also here,
the manual for the precision ini setting should be changed that manually
lowering the setting will not always result in correct rounding and that the
round function should be used instead.

Building an abstraction layer for FP control register manipulation

This proposal proposes to wrap the above abstraction into the following
macros:

ZEND_FLOAT_DECLARE

ZEND_FLOAT_ENSURE()

ZEND_FLOAT_RESTORE()

ZEND_FLOAT_RETURN(val)

These will be defined in Zend/zend_float.h.

Fix of zend_strtod

Since the introduction of zend_strtod() (instead of strtod() which is locale
dependent) the function suffers from the same problem as round(): It calculates
in extended precision and then truncates the result to double on some platforms.
Example:

Run that in C and run that in PHP on a Linux x86 32 bit box - you will get
different results.

For this reason, my patch also fixes zend_strtod() by adding the proposed
macros to the function. Then, C and PHP will yield the same results on all
platforms (that support IEEE 754 arithmetics anyway).

New implementation of the round algorithm

This proposal proposes the following change to PHPs round() function to
eliminate all the problems:

Usage of FP control word manipulation

The round function uses the new ZEND_FLOAT macros in order to ensure double
precision arithmetics within the function body. This will make sure the final
division works properly.

Create a function that does the actual integer rounding

A new static inline function php_round_helper(double value, int mode) was
added that rounds a number to integer. It basically does the simple arithmetic
rounding floor(value + 0.5) or ceil(value - 0.5). But it also supports the
other rounding methods round-half-even, round-half-odd and round-half-down
via the mode parameter.

Special handling for large places difference

This was taken from the 2004 proposal: If the numer of places are very large,
then 10^places is very large, too, and cannot be represented in an exact manner
anymore. This will cause inaccuracies with the final division, as explained
earlier.

The solution for that is that the rounded double is converted to a string
and e-places is added to that string. Take, for example, the number 5.3e-24
which you may want to round to 24 places precision (which it of course already
is). After rounding, the float value is 1.0 and that has to be divided by
1e24. But 1e24 is too large to be exactly represented, so instead a string
1.0e-24 is generated and passed through strtod(). strtod() on the other hand
will make sure that the nearest double representation for that number is
chosen. This of course has a performance penalty, but anybody wanting to round
such small numbers will probaby be willing to pay for it.

This change will make sure that very small (< 1e-22) or large (> 1e22) numbers
will be rounded correctly to the given precision.

Pre-rounding to the value's precision if possible

The previous measures only concern the problems with the division but not the
problem with the multiplication. Here, another measure is taken:

If the requested number of places to round the number is smaller than the
precision of the number, then the number will be first rounded to its own
precision and then rounded to the requested number of places.

Example: Round 1.255 to 2 places precision, expected value is 1.26. First step:
Calculate 10^places = 10^2 = 100. Second step: Calculate
14 - floor(log10(value)) = 14 - 0 = 14 which indicates the number of places
after the decimal point which are guaranteed to be exact by IEEE 754. Now,
2 < 14, so the condition applies. So, calculate 10^14 and multiply the number
by that: 1.255 * 1e14 = 125499999999999.984375… Now, round that number to
integer, i.e. 125500000000000. Now, divide that number by 10^(14 - 2) = 10^12
(the difference) and get 125.5 (exact). NOW round that number to decimal which
yields 126 and divide it by 10^2 = 100 which gives 1.26 which is the expected
result for that rounding operation.

Of course, one may argue that pre-rounding is not necessary and that this is
simply the problem with FP arithmetics. This is true on the one hand, but the
introduction of the places parameter made it clear that round() is to operate
as if the numbers were stored as decimals. We can't revert that and this seems
to me to be the best solutions for FP numbers one can get.

Additional parameter for the round() function

The round function now has an additional optional parameter for the selected
rounding mode. The default mode is arithmetic rounding but other rounding
modes may be selected.

Some optimizations

The 2004 proposal introduced some optimizations that this proposal has chosen
to use, just in a slightly shorter form. In order to calculate floor(log10(v))
a quick binary search lookup is used for small enough powers of 10. The same
goes for calculating 10^places for small enough values: These are looked up
in a table if they are small enough.

Comparison with the 2004 proposal

But there are four main differences between the 2004 proposal and this one:

Patching printf

The 2004 proposal also patches printf() in order to make rounding consistent.
However, as I already explained, I don't think that changing printf()s
behaviour is such a good idea since printf behaves a certain way in every other
programming language that supports that function.

A note in the manual that the precision specifier in printf() is not suitable
for rounding the values should be sufficient.

No pre-rounding to precision

The 2004 proposal does not pre-round after the multiplication and thus rounding
1.255 to 2 places with the 2004 code will not work correctly either.

Additional ini settings

The 2004 proposal also adds additional ini settings for rounding mode etc. In
my eyes this is superfluous since the rounding mode can always be set as an
additional parameter to the round() function.

Altering floating point arithmetics in PHP core

The 2004 proposal goes way beyond traditional rounding by altering FP
arithmetics in PHP in such a way that after any operation (add, subtract,
multiply, divide) the result is rounded to the 15 digits precision guaranteed
by IEEE 754. This destroys traditional floating point semantics but allows
simple decimal calculations to work as expected by users not familiar with
floating point values. It is essentially the same thing some spreadsheet
applications (e.g. Microsoft Excel) do.

I'm opposed to this kind of change of general semantics since PHP never said
it implemented a decimal type and if people are using it wrong, it's their
problem, we shouldn't break other legitimate applications for this.

The round() function is a slightly different case though, since the round()
function itself claims to be able to round to decimal precision. For this
reason, changing round()s behaviour in order to accomodate decimal semantics
is OK, since users will expect it to work that way.

Nevertheless, what could be discussed separately is the introduction of a new
type that automatically uses an arbitrary precision library internally, since
writing $a * $b is much more natural than e.g. bcmul($a, $b). This, however,
goes far beyond the scope of this proposal.