Background to Indexes.

When you store data in an indexed table, certain columns of data are copied
into the index alongside the rowid of the corresponding row in
the table. The data in the table is stored 'randomly', or at least, not
necessarily in the order you inserted it.

The index entries, on the other hand, must be stored in order,
otherwise the index would be useless. If you could walk through the
entries in an index, you would see that they are in
order, usually ascending, although since Oracle 8i descending indexes are also
supported.

Entries are stored in order of their internal representation, which is not
necessarily the same as what you see on screen when you SELECT the columns from
a table.

If the indexed column(s) contain character data (CHAR, NCHAR, VARCHAR2 or
NVARCHAR2) then the data will appear on screen exactly as it does in the index.
For example, if the column contains 'ORACLE' the index entry will also be
'ORACLE'.

We can use the DUMP function to show us the internal representation of any
data type. This function takes four parameters. The first is the data you wish
to dump, the second is the base you wish to dump it
in. The default is 10, which means that DUMP will
display the character codes in decimal, or base 10. The other allowed values are 8
(octal), 16 (hexadecimal) or 17 (single characters).

The third parameter is the start position in the data you wish to dump from
and the final parameter is the amount of data you wish to dump. All but the
first parameter have sensible defaults.

Using DUMP, we can see the individual character codes for our 'ORACLE' data :

SQL> select dump('ORACLE',10) from dual;

DUMP('ORACLE',10)
-------------------------------
Typ=96 Len=6: 79,82,65,67,76,69

We can prove that this is correct by converting back from decimal character
codes to actual characters :
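As a quick illustration (sketched here in Python rather than SQL*Plus, with `chr` playing the part of Oracle's CHR function), mapping the decimal codes back does indeed give the original string:

```python
# Decimal character codes reported by DUMP for the string 'ORACLE'.
codes = [79, 82, 65, 67, 76, 69]

# chr() maps a decimal code back to its character, just as Oracle's
# CHR function does in a query such as: select chr(79) from dual;
word = ''.join(chr(c) for c in codes)
print(word)  # ORACLE
```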

Numeric columns are very different. The internal format of a number
differs from what appears on screen after a SELECT, because the internal
format is converted to character format so that it can be displayed. We can see
this in the following, first in character format :

SQL> select '1234' as "1234",
  2  dump('1234', 17)
  3  from dual;

1234 DUMP('1234',17)
---- ---------------------
1234 Typ=96 Len=4: 1,2,3,4

Then in internal format :

SQL> select 1234 as "a number",
  2  dump(1234, 17)
  3  from dual;

  a number DUMP(1234,17)
---------- --------------------
      1234 Typ=2 Len=3: c2,^M,#

The first columns in both examples look identical, but this is only because SQL*Plus has converted the internal format of the number
1,234 into character format so that the display device (the monitor screen)
is able to show it. Binary characters have a nasty tendency to disrupt
character devices like computer monitors when running in text mode.

Take a look at the second column in the above examples and notice the
difference. In the first example we see the individual characters '1', '2', '3'
and '4' while the second example shows only three bytes, the internal format
of the number 1,234. Let's change the DUMP calls slightly, and do the whole lot
in one command :

SQL> select '1234' as "1234",
  2  dump('1234', 10),
  3  1234 as "a number",
  4  dump(1234, 10)
  5  from dual;

1234 DUMP('1234',10)            a number DUMP(1234,10)
---- ------------------------- ---------- ----------------------
1234 Typ=96 Len=4: 49,50,51,52       1234 Typ=2 Len=3: 194,13,35

This time, we see the actual character codes used internally. Once again
columns 2 and 4 differ. Column 4 is showing three bytes and these three bytes
are the internal binary representation of the number 1,234.

It is this binary representation that is used in the index entry when a
number column is indexed.
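For positive integers the encoding is straightforward: an exponent byte of 192 plus the number of base-100 digits, followed by the base-100 mantissa digits, each stored with 1 added and with trailing zero digits dropped. A minimal Python sketch of that layout (an illustration of the published format, not Oracle's actual code) reproduces the bytes DUMP showed us:

```python
def oracle_number_bytes(n):
    # Encode a positive integer the way Oracle's NUMBER type stores it:
    # one exponent byte, then the base-100 mantissa digits, each +1.
    assert n > 0 and int(n) == n
    digits = []
    while n > 0:
        digits.append(n % 100)
        n //= 100
    digits.reverse()                  # most significant base-100 digit first
    exponent = len(digits)            # n = 0.d1 d2 ... * 100**exponent
    while digits and digits[-1] == 0:
        digits.pop()                  # trailing zero digits are not stored
    return [192 + exponent] + [d + 1 for d in digits]

print(oracle_number_bytes(1234))      # [194, 13, 35] -- matches the DUMP output
```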

Take a few minutes and experiment with dumping a few other numbers - stick to integers for now as those are what sequences
generate.

SQL> create table test (a number, b varchar2(30));

Table created.

SQL> begin
  2  for x in 1 .. 1e6
  3  loop
  4  insert into test values (x, substr(dump(x,10), 14));
  5  end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

If we have a look at the 'b' column of the table, we can see that each entry
is ascending in a similar manner to the 'a' column. Here are the first 20 rows :

SQL> col b format a20 wrap
SQL> select a,b from test where a < 21;

         A B
---------- --------------------
         1 193,2
         2 193,3
         3 193,4
         4 193,5
         5 193,6
         6 193,7
         7 193,8
         8 193,9
         9 193,10
        10 193,11
        11 193,12
        12 193,13
        13 193,14
        14 193,15
        15 193,16
        16 193,17
        17 193,18
        18 193,19
        19 193,20
        20 193,21

The entries are very similar and all have the same leading byte.

How sequences affect indexes.

As mentioned above, index entries have to be stored in order; the
table data, however, need not be. If your indexed column is fed by a sequence, the data
will be similar to the 20 rows shown above.

Similar entries group together in the index, so new entries will all hit the
same block until it fills up and splits, whereupon they all start hitting one
of the new blocks.

If you have one person running the application, this isn't too much of a
problem. If the application is multi-user then it means that every user will
tend to write into the same index block and buffer busy waits will be the
outcome as transactions 'queue' to write data to the hottest index block
around.

Back in our small test, if you select more data from the test table, you
will find that in the 1 million rows, there are only 4 different values for the
leading byte on the internal numeric format and even worse, most of the entries
in the index have the same leading byte value :

SQL> select substr(b,1,3), count(*)
  2  from test
  3  group by substr(b,1,3);

SUB   COUNT(*)
--- ----------
193         99
194       9900
195     990000
196          1

I cheated and discovered beforehand that there was a comma in position 4 of every row
in the table; that's how I knew to use a three-character length in my SUBSTR.

What the above shows is that in an index of 1 million sequential entries,
the vast majority have the same leading byte and so
will all be trying to get into the same block in the index.
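The same skew can be reproduced outside the database. The leading (exponent) byte for a positive integer is 192 plus its number of base-100 digits, so a quick Python sketch (an illustration of the byte layout, not Oracle code) over 1 to 1,000,000 gives exactly the four counts above:

```python
from collections import Counter

def leading_byte(n):
    # Oracle's exponent byte for a positive integer is 192 plus the
    # number of base-100 digits: 1-99 -> 193, 100-9999 -> 194, and so on.
    e = 0
    while n > 0:
        n //= 100
        e += 1
    return 192 + e

counts = Counter(leading_byte(n) for n in range(1, 1_000_001))
print(sorted(counts.items()))  # [(193, 99), (194, 9900), (195, 990000), (196, 1)]
```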

How reverse indexes cure the problem.

A reverse key index stores the bytes of the indexed column(s) in reverse
order, so the data 'ORACLE' is actually stored in the index as 'ELCARO'. Using
a reverse index on a column fed by a sequence spreads the location of
sequential numbers across a wider range of leaf blocks and the problem of a
single hot block is removed because the index entries are stored in reverse
order.
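The effect of the reversal can be sketched in Python too (again an illustration of the byte layout, not Oracle code): once an entry's bytes are reversed, its leading byte is the last stored mantissa digit plus one, which spreads sequential values across 99 different leading bytes instead of 4:

```python
from collections import Counter

def reversed_leading_byte(n):
    # In a reverse key index the stored bytes are reversed, so the
    # leading byte becomes the LAST mantissa byte Oracle stored:
    # the final non-zero base-100 digit of n, plus one.
    while n % 100 == 0:
        n //= 100          # trailing zero digits are not stored
    return n % 100 + 1

spread = Counter(reversed_leading_byte(n) for n in range(1, 1_000_001))
print(len(spread))         # 99 distinct leading bytes instead of 4
```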

SQL> alter table test add (c varchar2(30));

Table altered.

SQL> update test set c = substr(dump(reverse(a),10),14);

1000000 rows updated.

SQL> select substr(c,1,instr(c,',')-1), count(*)
  2  from test
  3  group by substr(c,1,instr(c,',')-1)
  4  order by to_number(substr(c,1,instr(c,',')-1));

SUB   COUNT(*)
--- ----------
2        10102
3        10101
4        10101
5        10101

All other values between 6 and 95 inclusive have 10,101 entries each.

96       10101
97       10101
98       10101
99       10101
100      10101

99 rows selected.

This time, our 1 million row index has its
leading byte value spread across 99 (100 if you include a value for zero)
different values, rather than just 4. In addition, the actual reversed bytes are
fairly randomly scattered across each of the different values too.

As more entries are added to the index, blocks will be split to accommodate the new entries in their proper locations. As the
data is arriving almost 'randomly', by means of the reversal of the actual data
bytes for the index, the index itself will be extended to accommodate
these new values. However, rather than always landing in the same single
'hot' index block, new entries will be spread across a number of existing
blocks (assuming the index has been around for a while), thus reducing
contention. Of course, block splits will still occur on these blocks as new
values fill up the existing blocks, but this happens all over the index, not
just in one place.

This is the reason why reversing the index when its leading column is fed by
a sequence reduces buffer contention, removes the hot block problem and by
doing so, reduces the potential for buffer busy waits on a multi-user system.

Drawbacks to Reverse Key Indexes.

Of course, there are drawbacks as well. By setting up a reverse key index
you are increasing the clustering factor of the index. The clustering
factor (from USER_INDEXES) is used by the cost based optimiser
(CBO) to determine how best to access data in an INDEX RANGE SCAN. If the
clustering factor is roughly equal to BLOCKS minus EMPTY_BLOCKS from USER_TABLES
then the chances are that a range scan will read one index block and locate
all (or nearly all) of the data rows it needs in one or more adjacent blocks in
the table.

On the other hand, if the clustering factor is close to NUM_ROWS in
USER_TABLES then the chances are that the entries stored together in one index
block are likely to be scattered throughout a wide range of table blocks - so
the index range scan may not be chosen as a good method of access.
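That rule of thumb can be expressed as a tiny sketch. The helper below is hypothetical (nothing Oracle supplies): it simply normalises the clustering factor between its best case (roughly the number of used table blocks) and its worst case (roughly the number of rows):

```python
def clustering_quality(clustering_factor, used_blocks, num_rows):
    # Hypothetical rule-of-thumb helper, not an Oracle API.
    # 0.0 means the clustering factor is near its minimum (about the
    # number of used table blocks -- good for range scans); 1.0 means
    # it is near its maximum (about the number of rows -- bad).
    return (clustering_factor - used_blocks) / (num_rows - used_blocks)

# A well-clustered index: clustering factor close to the block count,
# so the result is close to 0.0.
print(clustering_quality(1_100, 1_000, 100_000))
```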

Obviously, the above applies only to an analysed table
and index.

In a quick test on a table with 100,000 rows loaded using a sequence, a
normal index was used for most queries returning up to 30 rows, as was the
reverse key index. However, when the number of rows went up to 1,000,000 the
reverse key index was never used and a full table scan was chosen every time.

Addendum (David Aldridge March 2005)

Although the clustering_factor is usually
increased by rebuilding the index as a reverse-key index, there are two balancing
points to consider:

ii) When key values are generated from a sequence it is extremely rare that
the rows identified by lexically adjacent keys have any real-world connection,
so the index range scans ought never to be required by the application.

Of course, neither of these statements applies when the index is a
multi-column index, and the first column contains repeating values.