Measuring Numbers – Is this a Valid Comparison?

4062010

June 4, 2010

I encountered an interesting test case in the “Oracle SQL Recipes” book, but I fear that my degree in mathematics is causing me to fail to fully comprehend the test case. I developed a parallel test case that possibly answers the questions that are left unanswered. Here is my test case:

With the value one-third stored in each column, we can use the VSIZE function to show it was much more complicated to store this [the value of 1/3] with decimal precision [using the NUMBER datatype], taking nearly three times the space [when compared to the BINARY_DOUBLE datatype].

Here is the output from my script for the row containing the value one-third:

As the book states, the column with the NUMBER datatype requires 21 bytes, while the column with the BINARY_DOUBLE datatype requires just 8 bytes to store the value one-third in the table. What, if anything, is wrong with the comparison?

Hint: To conserve space, the column format for the NUMBER_VALUE and BIN_DBL_VALUE columns in the above output was changed from:

Related

Actions

Information

4 responses

4062010

Centinul(21:01:55) :

“As the book states, the column with the NUMBER datatype requires 21 bytes, while the column with the BINARY_DOUBLE datatype requires just 8 bytes to store the value one-third in the table. What, if anything, is wrong with the comparison?”

I would say that there are possibly two things wrong with this comparison.

1. The book has chosen a single value to compare sizes instead of looking at the range of values. Oracle’s NUMBER datatype is of variable length. Based on the test case presented the NUMBER_VALUE column in some cases will use as little as 2 bytes while the BINARY_DOUBLE remains fixed at 8 bytes. Depending on your needs the NUMBER datatype could be more efficient then a BINARY_DOUBLE.

2. I don’t think it’s fair to compare a NUMBER to a BINARY_DOUBLE. The default NUMBER has much more precision then anything a BINARY_DOUBLE can provide. The cost of that extra precision is more storage as shown in the test case. A 64-bit double has approximately 16 digits of precision as shown when using an unmodified column format:

Depending on the requirements, and precision needed, one could say that NUMBER_VALUE does not equal BIN_DBL_VALUE so what value is there in comparing the storage taken up by each?

I think a more appropriate comparison in this case since the maximum value is 1000 would be a BINARY_DOUBLE to a NUMBER(20,16). This will give a very similar precision to that of a BINARY_DOUBLE. When you compare the size of a NUMBER(20,16) to a BINARY_DOUBLE in this case there is only a one byte difference. As shown:

I think that you just demonstrated one of the advantages of this type of blog article.

The hint about changing the column format for the NUMBER_VALUE and BIN_DBL_VALUE columns to save space was true, but you pinpointed the reason why I selected to to strip exactly 4 decimal positions from the column format – there are supposed to be an infinite number of “3” digits after the decimal point, and as your comment shows the next digit is a “1” digit for the BINARY_DOUBLE column. As you stated, we reached the limit of the 8 byte (64 bit) storage capacity (precision/scale) for the column. The column with the NUMBER datatype stored even more “3” digits:

Great idea to change the NUMBER_VALUE2 column to a NUMBER(20,16) datatype – I did not consider that approach in part because I did not think that the storage requirement would drop below roughly 21 bytes (I was wrong, obviously).

Curious now, how was Oracle able to store that many digits in the NUMBER(20,16) column in just 9 bytes?

Now the next question is: What would be a valid comparison of the NUMBER and BINARY_DOUBLE datatypes? Would we need to consider the possibility that the BINARY_DOUBLE datatype column might be joined to or compared to a NUMBER column in another table? Would we need to consider what would happen when the BINARY_DOUBLE datatype column appears in the WHERE clause and a bind variable with a NUMBER datatype was on the other side of the equal sign? What if the BINARY_DOUBLE datatype column was indexed?

“Would we need to consider the possibility that the BINARY_DOUBLE datatype column might be joined to or compared to a NUMBER column in another table?”

Yes, see below.

“Would we need to consider what would happen when the BINARY_DOUBLE datatype column appears in the WHERE clause and a bind variable with a NUMBER datatype was on the other side of the equal sign?”

Yes, see below.

What if the BINARY_DOUBLE datatype column was indexed?

Generally speaking, NUMBER to BINARY_DOUBLE results in the same implicit conversion issues that characters to numbers do as mentioned in your previous blog article “True or False – Why Isn’t My Index Getting Used?”

In this case it looks like Oracle has applied transitive closure (line 3 of Predicate Information) and with the side affect of implicit conversion a full table scan was done instead of an index range scan.

Very well written. I think that you have just demonstrated the potential unintended side-effects that are caused by selecting the numeric datatype for data storage purposes without first giving consideration to how the data will be used/accessed.

The link that I provided for the “Troubleshooting Oracle Performance” book states that there are two advantages of the BINARY_DOUBLE (and BINARY_FLOAT) datatype compared to the NUMBER datatype:
1. The BINARY_DOUBLE (and BINARY_FLOAT) datatype implements the IEEE 754 standard so that a CPU can directly process the numbers without first being processed from Oracle’s internal libraries as is needed by the NUMBER datatype.
2. The BINARY_DOUBLE (and BINARY_FLOAT) datatype has a fixed length (always requires the same number of bytes).

Accurately describing how much more processing efficient the BINARY_DOUBLE (and BINARY_FLOAT) datatype is than the NUMBER datatype might not be possible – there are probably just too many variables (and too many number combinations). Additionally, whether the CPU uses big-endian or little-endian (http://en.wikipedia.org/wiki/Endianness) could also be important.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: