DECIMAL Data Type (CDH 5.1 or higher only)

A numeric data type with fixed scale and precision, used in CREATE TABLE and ALTER TABLE statements. Suitable for financial
and other arithmetic calculations where the imprecise representation and rounding behavior of FLOAT and DOUBLE make those types
impractical.

Syntax:

In the column definition of a CREATE TABLE statement:

column_name DECIMAL[(precision[,scale])]

DECIMAL with no precision or scale values is equivalent to DECIMAL(9,0).

Precision and Scale:

precision represents the total number of digits that can be represented by the column, regardless of the location of the decimal point. This value must
be between 1 and 38. For example, representing integer values up to 9999, and floating-point values up to 99.99, both require a precision of 4. You can also represent corresponding negative values,
without any change in the precision. For example, the range -9999 to 9999 still only requires a precision of 4.

scale represents the number of fractional digits. This value must be less than or equal to precision. A scale of 0
produces integral values, with no fractional part. If precision and scale are equal, all the digits come after the decimal point, making all the values between 0 and 0.999... or 0 and -0.999...

When precision and scale are omitted, a DECIMAL value is treated as DECIMAL(9,0), that is, an integer value ranging from -999,999,999 to 999,999,999. This is the largest DECIMAL value that can still be represented in 4 bytes. If precision is specified but scale is omitted, Impala uses a value of zero for the scale.

Both precision and scale must be specified as integer literals, not any other kind of constant expressions.

To check the precision or scale for arbitrary values, you can call the precision() and
scale() built-in functions. For example, you might use these values to figure out how many characters are required for various fields in a report, or to understand
the rounding characteristics of a formula as applied to a particular DECIMAL column.

Range:

The maximum precision value is 38. Thus, the largest integral value is represented by DECIMAL(38,0) (999... with 9 repeated 38 times). The most precise
fractional value (between 0 and 1, or 0 and -1) is represented by DECIMAL(38,38), with 38 digits to the right of the decimal point. The value closest to 0 would be
.0000...1 (37 zeros and the final 1). The value closest to 1 would be .999... (9 repeated 38 times).

For a given precision and scale, the range of DECIMAL values is the same in the positive and negative directions. For example, DECIMAL(4,2) can represent from -99.99 to 99.99. This is different from other integral numeric types where the positive and negative bounds differ slightly.

When you use DECIMAL values in arithmetic expressions, the precision and scale of the result value are determined as follows:

For addition and subtraction, the precision and scale are based on the maximum possible result, that is, if all the digits of the input values were 9s and the absolute values were added
together.

For multiplication, the precision is the sum of the precisions of the input values. The scale is the sum of the scales of the input values.

For division, Impala sets the precision and scale to values large enough to represent the whole and fractional parts of the result.

For UNION, the scale is the larger of the scales of the input values, and the precision is increased if necessary to accommodate any additional fractional
digits. If the same input value has the largest precision and the largest scale, the result value has the same precision and scale. If one value has a larger precision but smaller scale, the scale of
the result value is increased. For example, DECIMAL(20,2) UNION DECIMAL(8,6) produces a result of type DECIMAL(24,6). The extra 4
fractional digits of scale (6-2) are accommodated by extending the precision by the same amount (20+4).

To doublecheck, you can always call the PRECISION() and SCALE() functions on the results of an arithmetic expression to see
the relevant values, or use a CREATE TABLE AS SELECT statement to define a column based on the return type of the expression.

Compatibility:

Using the DECIMAL type is only supported under CDH 5.1.0 and higher.

Use the DECIMAL data type in Impala for applications where you used the NUMBER data type in Oracle. The Impala
DECIMAL type does not support the Oracle idioms of * for scale or negative values for precision.

Conversions and casting:

Casting an integer or floating-point value N to TIMESTAMP produces a value that is N seconds past the start of the epoch date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone. If the setting --use_local_tz_for_unix_timestamp_conversions=true is in effect, the resulting TIMESTAMP represents a date and time in the local time
zone.

Impala automatically converts between DECIMAL and other numeric types where possible. A DECIMAL with zero scale is converted
to or from the smallest appropriate integral type. A DECIMAL with a fractional part is automatically converted to or from the smallest appropriate floating-point type.
If the destination type does not have sufficient precision or scale to hold all possible values of the source type, Impala raises an error and does not convert the value.

For example, these statements show how expressions of DECIMAL and other types are reconciled to the same type in the context of UNION queries and INSERT statements:

To avoid potential conversion errors, you can use CAST() to convert DECIMAL values to FLOAT,
TINYINT, SMALLINT, INT, BIGINT, STRING,
TIMESTAMP, or BOOLEAN. You can use exponential notation in DECIMAL literals or when casting from
STRING, for example 1.0e6 to represent one million.

If you cast a value with more fractional digits than the scale of the destination type, any extra fractional digits are truncated (not rounded). Casting a value to a target type with not
enough precision produces a result of NULL and displays a runtime warning.

When you specify integer literals, for example in INSERT ... VALUES statements or arithmetic expressions, those numbers are interpreted as the smallest
applicable integer type. You must use CAST() calls for some combinations of integer literals and DECIMAL precision. For example,
INT has a maximum value that is 10 digits long, TINYINT has a maximum value that is 3 digits long, and so on. If you specify a value such
as 123456 to go into a DECIMAL column, Impala checks if the column has enough precision to represent the largest value of that integer type, and raises an error if not.
Therefore, use an expression like CAST(123456 TO DECIMAL(9,0)) for DECIMAL columns with precision 9 or less, CAST(50 TO DECIMAL(2,0)) for DECIMAL columns with precision 2 or less, and so on. For DECIMAL columns with precision
10 or greater, Impala automatically interprets the value as the correct DECIMAL type; however, because DECIMAL(10) requires 8 bytes of
storage while DECIMAL(9) requires only 4 bytes, only use precision of 10 or higher when actually needed.

Be aware that in memory and for binary file formats such as Parquet or Avro, DECIMAL(10) or higher consumes 8 bytes while DECIMAL(9) (the default for DECIMAL) or lower consumes 4 bytes. Therefore, to conserve space in large tables, use the smallest-precision DECIMAL type that is appropriate and CAST() literal values where necessary, rather than declaring DECIMAL columns with
high precision for convenience.

To represent a very large or precise DECIMAL value as a literal, for example one that contains more digits than can be represented by a BIGINT literal, use a quoted string or a floating-point value for the number, and CAST() to the desired DECIMAL
type:

The result of the SUM() aggregate function on DECIMAL values is promoted to a precision of 38, with the same precision as
the underlying column. Thus, the result can represent the largest possible value at that particular precision.

STRING columns, literals, or expressions can be converted to DECIMAL as long as the overall number of digits and digits to
the right of the decimal point fit within the specified precision and scale for the declared DECIMAL type. By default, a DECIMAL value
with no specified scale or precision can hold a maximum of 9 digits of an integer value. If there are more digits in the string value than are allowed by the DECIMAL
scale and precision, the result is NULL.

The following examples demonstrate how STRING values with integer and fractional parts are represented when converted to DECIMAL. If the scale is 0, the number is treated as an integer value with a maximum of precision digits. If the precision is greater than 0, the
scale must be increased to account for the digits both to the left and right of the decimal point. As the precision increases, output values are printed with additional trailing zeros after the
decimal point if needed. Any trailing zeros after the decimal point in the STRING value must fit within the number of digits specified by the precision.

Most built-in arithmetic functions such as SIN() and COS() continue to accept only DOUBLE
values because they are so commonly used in scientific context for calculations of IEEE 754-compliant values. The built-in functions that accept and return DECIMAL are:

BIGINT, INT, SMALLINT, and TINYINT values can all be cast to
DECIMAL. The number of digits to the left of the decimal point in the DECIMAL type must be sufficient to hold the largest value of the
corresponding integer type. Note that integer literals are treated as the smallest appropriate integer type, meaning there is sometimes a range of values that require one more digit of DECIMAL scale than you might expect. For integer values, the precision of the DECIMAL type can be zero; if the precision is greater than zero,
remember to increase the scale value by an equivalent amount to hold the required number of digits to the left of the decimal point.

The following examples show how different integer types are converted to DECIMAL.

You cannot directly cast TIMESTAMP or BOOLEAN values to or from DECIMAL values. You can turn
a DECIMAL value into a time-related representation using a two-step process, by converting it to an integer value and then using that result in a call to a date and
time function such as from_unixtime().

Because values in INSERT statements are checked rigorously for type compatibility, be prepared to use CAST() function calls
around literals, column references, or other expressions that you are inserting into a DECIMAL column.

NULL considerations: Casting any non-numeric value to this type produces a NULL value.

DECIMAL differences from integer and floating-point types:

With the DECIMAL type, you are concerned with the number of overall digits of a number rather than powers of 2 (as in TINYINT, SMALLINT, and so on). Therefore, the limits with integral values of DECIMAL types fall around 99, 999, 9999,
and so on rather than 32767, 65535, 2 32 -1, and so on. For fractional values, you do not need to account for imprecise representation of the fractional part according to
the IEEE-954 standard (as in FLOAT and DOUBLE). Therefore, when you insert a fractional value into a DECIMAL column, you can compare, sum, query, GROUP BY, and so on that column and get back the original values rather than some "close
but not identical" value.

FLOAT and DOUBLE can cause problems or unexpected behavior due to inability to precisely represent certain fractional
values, for example dollar and cents values for currency. You might find output values slightly different than you inserted, equality tests that do not match precisely, or unexpected values for
GROUP BY columns. DECIMAL can help reduce unexpected behavior and rounding errors, at the expense of some performance overhead for
assignments and comparisons.

Literals and expressions:

When you use an integer literal such as 1 or 999 in a SQL statement, depending on the context, Impala will treat it as
either the smallest appropriate DECIMAL type, or the smallest integer type (TINYINT, SMALLINT,
INT, or BIGINT). To minimize memory usage, Impala prefers to treat the literal as the smallest appropriate integer type.

When you use a floating-point literal such as 1.1 or 999.44 in a SQL statement, depending on the context, Impala will treat
it as either the smallest appropriate DECIMAL type, or the smallest floating-point type (FLOAT or DOUBLE).
To avoid loss of accuracy, Impala prefers to treat the literal as a DECIMAL.

Storage considerations:

Only the precision determines the storage size for DECIMAL values; the scale setting has no effect on the storage size.

Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these text-based file formats, leading zeros are not stored, but trailing zeros are stored. In these tables, each
DECIMAL value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is present and an extra byte for negative values. Once
the values are loaded into memory, they are represented in 4, 8, or 16 bytes as described in the following list items. The on-disk representation varies depending on the file format of the
table.

Parquet and Avro tables use binary formats, In these tables, Impala stores each value in as few bytes as possible depending on the precision specified for the DECIMAL column.

In memory, DECIMAL values with precision of 9 or less are stored in 4 bytes.

In memory, DECIMAL values with precision of 10 through 18 are stored in 8 bytes.

In memory, DECIMAL values with precision greater than 18 are stored in 16 bytes.

File format considerations:

The DECIMAL data type can be stored in any of the file formats supported by Impala, as described in How Impala Works with Hadoop File Formats. Impala only writes to tables that use the Parquet and text formats, so those formats are the focus for file
format compatibility.

Impala can query Avro, RCFile, or SequenceFile tables containing DECIMAL columns, created by other Hadoop components, on CDH 5 only.

You can use DECIMAL columns in Impala tables that are mapped to HBase tables. Impala can query and insert into such tables.

Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these tables, each DECIMAL value takes up as many bytes as there are digits in the
value, plus an extra byte if the decimal point is present. The binary format of Parquet or Avro files offers more compact storage for DECIMAL columns.

Parquet and Avro tables use binary formats, In these tables, Impala stores each value in 4, 8, or 16 bytes depending on the precision specified for the DECIMAL column.

UDF considerations: When writing a C++ UDF, use the DecimalVal data type defined in /usr/include/impala_udf/udf.h.

Partitioning:

You can use a DECIMAL column as a partition key. Doing so provides a better match between the partition key values and the HDFS directory names than using
a DOUBLE or FLOAT partitioning column:

Schema evolution considerations:

For text-based formats (text, RCFile, and SequenceFile tables), you can issue an ALTER TABLE ... REPLACE COLUMNS statement to change the precision and
scale of an existing DECIMAL column. As long as the values in the column fit within the new precision and scale, they are returned correctly by a query. Any values that
do not fit within the new precision and scale are returned as NULL, and Impala reports the conversion error. Leading zeros do not count against the precision value, but
trailing zeros after the decimal point do.

For binary formats (Parquet and Avro tables), although an ALTER TABLE ... REPLACE COLUMNS statement that changes the precision or scale of a DECIMAL column succeeds, any subsequent attempt to query the changed column results in a fatal error. (The other columns can still be queried successfully.) This is because the
metadata about the columns is stored in the data files themselves, and ALTER TABLE does not actually make any updates to the data files. If the metadata in the data
files disagrees with the metadata in the metastore database, Impala cancels the query.

Currently, the COMPUTE STATS statement under CDH 4 does not store any statistics for DECIMAL columns. When
Impala runs under CDH 5, which has better support for DECIMAL in the metastore database, COMPUTE STATS does collect statistics for
DECIMAL columns and Impala uses the statistics to optimize query performance.

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.