18
Tiny Floating Point Example
8-bit Floating Point Representation
  - The sign bit is in the most significant bit.
  - The next four bits are the exponent (exp), with a bias of 7.
  - The last three bits are the fraction (frac).
Same general form as IEEE format
  - Normalized and denormalized values
  - Representation of 0, NaN, and infinity
Bit layout: s (1 bit) | exp (4 bits) | frac (3 bits)
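
The 8-bit layout above can be decoded with a few shifts and masks. The following is a minimal sketch, not a reference implementation: the function name tiny_decode is hypothetical, and it assumes the usual IEEE conventions (implied leading 1 for normalized values, exponent 1 - bias for denormalized values, all-ones exponent for infinity and NaN).

```c
#include <math.h>    /* INFINITY and NAN macros */
#include <stdint.h>

/* Decode the 8-bit format: 1 sign bit, 4 exponent bits (bias 7),
 * 3 fraction bits. tiny_decode is a hypothetical helper name. */
double tiny_decode(uint8_t bits)
{
    const int bias = 7;
    int sign = (bits >> 7) & 0x1;
    int exp  = (bits >> 3) & 0xF;
    int frac = bits & 0x7;
    double value;

    if (exp == 0xF) {
        /* All-ones exponent: infinity if frac is zero, otherwise NaN. */
        if (frac != 0)
            return NAN;
        value = INFINITY;
    } else {
        double m;   /* significand M */
        int e;      /* unbiased exponent E */

        if (exp == 0) {
            /* Denormalized: no implied leading 1, E = 1 - bias. */
            m = frac / 8.0;
            e = 1 - bias;
        } else {
            /* Normalized: implied leading 1, E = exp - bias. */
            m = 1.0 + frac / 8.0;
            e = exp - bias;
        }

        /* Scale M by 2^E using exact multiplications or divisions by 2. */
        value = m;
        for (; e > 0; e--) value *= 2.0;
        for (; e < 0; e++) value /= 2.0;
    }
    return sign ? -value : value;
}
```

For example, tiny_decode(0x38) (bits 0 0111 000) yields 1.0, tiny_decode(0x77) (bits 0 1110 111) yields 240.0, the largest finite value, and tiny_decode(0x01) yields 1/512, the smallest positive denormalized value.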

26
Rounding
Rounding Modes (illustrate with $ rounding)

                            $1.40   $1.60   $1.50   $2.50   -$1.50
  Towards zero              $1      $1      $1      $2      -$1
  Round down (-∞)           $1      $1      $1      $2      -$2
  Round up (+∞)             $2      $2      $2      $3      -$1
  Nearest even (default)    $1      $2      $2      $2      -$2

Round down: the rounded result is the closest value that is no greater than the true result.
Round up: the rounded result is the closest value that is no less than the true result.
What are the advantages of the modes?

27
Closer Look at Round-To-Even
Default Rounding Mode
  - Hard to get any other kind without dropping into assembly (C99 and later also expose the mode through fesetround in <fenv.h>)
  - All others are statistically biased
      - The sum of a set of positive numbers will consistently be over- or underestimated
Applying to Other Decimal Places / Bit Positions
  - When exactly halfway between two possible values
      - Round so that the least significant digit is even
  - E.g., round to nearest hundredth

      1.2349999   ->   1.23   (less than halfway)
      1.2350001   ->   1.24   (greater than halfway)
      1.2350000   ->   1.24   (halfway: round up)
      1.2450000   ->   1.24   (halfway: round down)
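
The hundredths example can be reproduced exactly with integer arithmetic, which sidesteps the fact that values such as 1.235 have no exact binary representation. The sketch below is illustrative only: div_round_half_even is a hypothetical helper, and the inputs are assumed to be non-negative values scaled to an integer count of 10^-7 units.

```c
#include <stdint.h>

/* Divide n by d (d > 0) and round the quotient to nearest,
 * breaking exact ties toward the even quotient. */
uint64_t div_round_half_even(uint64_t n, uint64_t d)
{
    uint64_t q = n / d;   /* quotient truncated toward zero */
    uint64_t r = n % d;   /* remainder, 0 <= r < d */

    /* Compare r with d - r instead of 2*r with d to avoid overflow.
     * r > d - r   : more than halfway, round up.
     * r == d - r  : exactly halfway, round up only if q is odd. */
    if (r > d - r || (r == d - r && (q & 1)))
        q++;
    return q;
}
```

Dividing by 100000 converts 10^-7 units to hundredths, so div_round_half_even(12350000, 100000) returns 124 (that is, 1.24) and div_round_half_even(12450000, 100000) also returns 124, as in the table above.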

28
Rounding in C
http://www.gnu.org/s/libc/manual/html_node/Rounding-Functions.html
  - double rint (double x), float rintf (float x), long double rintl (long double x)
    These functions round x to an integer value according to the current rounding mode. See Floating Point Parameters for information about the various rounding modes. The default rounding mode is to round to the nearest integer; some machines support other modes, but round-to-nearest is always used unless you explicitly select another.
    If x was not initially an integer, these functions raise the inexact exception.
Floating Point Parameters
http://www.gnu.org/s/libc/manual/html_node/Floating-Point-Parameters.html#Floating-Point-Parameters
  - FLT_ROUNDS characterizes the rounding mode for floating point addition. Standard rounding modes:
      -1: The mode is indeterminable.
       0: Rounding is towards zero.
       1: Rounding is to the nearest number.
       2: Rounding is towards positive infinity.
       3: Rounding is towards negative infinity.

39
IEEE Floating Point has clear mathematical properties
Represents numbers of the form M × 2^E
One can reason about operations independent of implementation
  - As if computed with perfect precision and then rounded
Not the same as real arithmetic
  - Violates associativity/distributivity
  - Makes life difficult for compilers & serious numerical applications programmers
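
The loss of associativity is easy to reproduce with doubles. The following is a minimal sketch; the function names are hypothetical, and the specific values (1e20, -1e20, 3.14) are chosen only because 3.14 is far smaller than the spacing between representable doubles near 1e20.

```c
/* Evaluate (a + b) + c: the left pair is added first. */
double sum_left_first(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluate a + (b + c): the right pair is added first. */
double sum_right_first(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20 and c = 3.14, sum_left_first returns 3.14 because the large terms cancel exactly before c is added, while sum_right_first returns 0.0 because -1e20 + 3.14 rounds back to -1e20. A compiler therefore cannot freely reorder such sums without changing results.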