This is machine translation

Mouseover text to see original. Click the button below to return to the English verison of the page.

Note: This page has been translated by MathWorks. Please click here
To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

Translate This Page

MathWorks Machine Translation

The automated translation of this page is provided by a general purpose third party translator tool.

MathWorks does not warrant, and disclaims all liability for, the accuracy, suitability, or fitness for purpose of the translation.

Floating-Point Numbers

MATLAB® represents floating-point numbers in either double-precision
or single-precision format. The default is double precision, but you
can make any number single precision with a simple conversion function.

Double-Precision Floating Point

MATLAB constructs
the double-precision (or double) data type according
to IEEE® Standard 754 for double precision. Any value stored as
a double requires 64 bits, formatted as shown in
the table below:

Bits

Usage

63

Sign (0 = positive, 1 =
negative)

62 to 52

Exponent, biased by 1023

51 to 0

Fraction f of the number 1.f

Single-Precision Floating Point

MATLAB constructs
the single-precision (or single) data type according
to IEEE Standard 754 for single precision. Any value stored as
a single requires 32 bits, formatted as shown in
the table below:

Bits

Usage

31

Sign (0 = positive, 1 =
negative)

30 to 23

Exponent, biased by 127

22 to 0

Fraction f of the number 1.f

Because MATLAB stores numbers of type single using
32 bits, they require less memory than numbers of type double,
which use 64 bits. However, because they are stored with fewer bits,
numbers of type single are represented to less
precision than numbers of type double.

Creating Floating-Point Data

Use double-precision to store values greater than approximately
3.4 x 1038 or less than approximately -3.4
x 1038. For numbers that lie between these
two limits, you can use either double- or single-precision, but single
requires less memory.

Creating Double-Precision Data

Because the default numeric type
for MATLAB is double, you can create a double with
a simple assignment statement:

x = 25.783;

The whos function shows
that MATLAB has created a 1-by-1 array of type double for
the value you just stored in x:

whos x
Name Size Bytes Class
x 1x1 8 double

Use isfloat if you just
want to verify that x is a floating-point number.
This function returns logical 1 (true) if the input
is a floating-point number, and logical 0 (false)
otherwise:

isfloat(x)
ans =
logical
1

You can convert other
numeric data, characters or strings, and logical data to double precision
using the MATLAB function, double.
This example converts a signed integer to double-precision floating
point:

Creating Single-Precision Data

Because MATLAB stores
numeric data as a double by default, you need to
use the single conversion function
to create a single-precision number:

x = single(25.783);

The whos function returns
the attributes of variable x in a structure. The bytes field
of this structure shows that when x is stored as
a single, it requires just 4 bytes compared with the 8 bytes to store
it as a double:

xAttrib = whos('x');
xAttrib.bytes
ans =
4

You can convert other
numeric data, characters or strings, and logical data to single precision
using the single function.
This example converts a signed integer to single-precision floating
point:

Arithmetic Operations on Floating-Point Numbers

This section describes which classes you can use in arithmetic
operations with floating-point numbers.

Double-Precision Operations

You can perform basic arithmetic operations with double and
any of the following other classes. When one or more operands is an
integer (scalar or array), the double operand must
be a scalar. The result is of type double, except
where noted otherwise:

single — The result is of
type single

double

int* or uint* —
The result has the same data type as the integer operand

char

logical

This example performs arithmetic on data of types char and double.
The result is of type double:

c = 'uppercase' - 32;
class(c)
ans =
double
char(c)
ans =
UPPERCASE

Single-Precision Operations

You can perform basic arithmetic operations with single and
any of the following other classes. The result is always single:

single

double

char

logical

In this example, 7.5 defaults to type double,
and the result is of type single:

x = single([1.32 3.47 5.28]) .* 7.5;
class(x)
ans =
single

Largest and Smallest Values for Floating-Point Classes

For the double and single classes,
there is a largest and smallest number that you can represent with
that type.

Largest and Smallest Double-Precision Values

The MATLAB functions realmax and realmin return
the maximum and minimum values that you can represent with the double data
type:

Accuracy of Floating-Point Data

If the result of a floating-point arithmetic computation is
not as precise as you had expected, it is likely caused by the limitations
of your computer's hardware. Probably, your result was a little less
exact because the hardware had insufficient bits to represent the
result with perfect accuracy; therefore, it truncated the resulting
value.

Double-Precision Accuracy

Because there are only a finite number of double-precision numbers,
you cannot represent all numbers in double-precision storage. On any
computer, there is a small gap between each double-precision number
and the next larger double-precision number. You can determine the
size of this gap, which limits the precision of your results, using
the eps function. For example,
to find the distance between 5 and the next larger
double-precision number, enter

format long
eps(5)
ans =
8.881784197001252e-16

This tells you that there are no double-precision numbers between
5 and 5 + eps(5).
If a double-precision computation returns the answer 5, the result
is only accurate to within eps(5).

The value of eps(x) depends on x.
This example shows that, as x gets larger, so does eps(x):

eps(50)
ans =
7.105427357601002e-15

If you enter eps with no input argument, MATLAB returns
the value of eps(1), the distance from 1 to
the next larger double-precision number.

Single-Precision Accuracy

Similarly, there are gaps between any two single-precision numbers.
If x has type single, eps(x) returns
the distance between x and the next larger single-precision
number. For example,

x = single(5);
eps(x)

returns

ans =
single
4.7684e-07

Note that this result is larger than eps(5).
Because there are fewer single-precision numbers than double-precision
numbers, the gaps between the single-precision numbers are larger
than the gaps between double-precision numbers. This means that results
in single-precision arithmetic are less precise than in double-precision
arithmetic.

For a number x of type double, eps(single(x)) gives
you an upper bound for the amount that x is rounded
when you convert it from double to single.
For example, when you convert the double-precision number 3.14 to single,
it is rounded by

double(single(3.14) - 3.14)
ans =
1.0490e-07

The amount that 3.14 is rounded is less than

eps(single(3.14))
ans =
single
2.3842e-07

Avoiding Common Problems with Floating-Point Arithmetic

Almost all operations in MATLAB are performed in double-precision
arithmetic conforming to the IEEE standard 754. Because computers
only represent numbers to a finite precision (double precision calls
for 52 mantissa bits), computations sometimes yield mathematically
nonintuitive results. It is important to note that these results are
not bugs in MATLAB.

Use the following examples to help you identify these cases:

Example 1 — Round-Off or What You Get Is Not What You Expect

The decimal number 4/3 is not exactly representable
as a binary fraction. For this reason, the following calculation does
not give zero, but rather reveals the quantity eps.

e = 1 - 3*(4/3 - 1)
e =
2.2204e-16

Similarly, 0.1 is not exactly representable
as a binary number. Thus, you get the following nonintuitive behavior:

a = 0.0;
for i = 1:10
a = a + 0.1;
end
a == 1
ans =
logical
0

Note that the order of operations can matter in the computation:

b = 1e-16 + 1 - 1e-16;
c = 1e-16 - 1e-16 + 1;
b == c
ans =
logical
0

There are gaps between floating-point numbers. As the numbers
get larger, so do the gaps, as evidenced by:

(2^53 + 1) - 2^53
ans =
0

Since pi is not really π, it is not
surprising that sin(pi) is not exactly zero:

sin(pi)
ans =
1.224646799147353e-16

Example 2 — Catastrophic Cancellation

When subtractions are performed with nearly equal operands,
sometimes cancellation can occur unexpectedly. The following is an
example of a cancellation caused by swamping (loss of precision that
makes the addition insignificant).

sqrt(1e-16 + 1) - 1
ans =
0

Some functions in MATLAB, such as expm1 and log1p,
may be used to compensate for the effects of catastrophic cancellation.

Example 3 — Floating-Point Operations and Linear Algebra

Round-off, cancellation, and other traits of floating-point
arithmetic combine to produce startling computations when solving
the problems of linear algebra. MATLAB warns that the following
matrix A is ill-conditioned, and therefore the
system Ax = b may be sensitive to small perturbations:

These are only a few of the examples showing how IEEE floating-point
arithmetic affects computations in MATLAB. Note that all computations
performed in IEEE 754 arithmetic are affected, this includes
applications written in C or FORTRAN, as well as MATLAB.

Floating-Point Functions

References

The following references provide more information about floating-point
arithmetic.

References

[1] Moler, Cleve, "Floating Points," MATLAB News
and Notes, Fall, 1996. A PDF version is available on the
MathWorks Web site at http://www.mathworks.com/company/newsletters/news_notes/pdf/Fall96Cleve.pdf

[2] Moler, Cleve, Numerical Computing
with MATLAB, S.I.A.M. A PDF version is available
on the MathWorks Web site at http://www.mathworks.com/moler/.

Was this topic helpful?

Select Your Country

Choose your country to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .