In this short series on compression in Oracle, we will be looking at the various types of compression used by traditional Oracle systems: (i) basic table compression, (ii) OLTP table compression, and (iii) index compression. I won’t be discussing the hybrid columnar compression (HCC) that appeared with Exadata (though I may write up a mini-series on that topic some time in the future).

Of the three mechanisms, index compression and basic table compression are part of the core product, while OLTP compression is part of the separately licensed “Advanced Compression Option (ACO)”. In this first article we will restrict ourselves to creating data using basic table compression, leaving the question of updating and deleting compressed data to the second article, which will lead us into “OLTP” compression in the third article. We’ll leave index compression to stand alone in the fourth and fifth articles.

Our aim in this first article is to answer a few of the more common questions that people ask about compression by examining the mechanisms that Oracle uses for basic table compression.

When does (basic) compression work?

The usual questions people ask about compression are: “how do I create compressed data?”, “how does Oracle decompress the blocks?”, “what impact does compression have on performance?” and the question you should ask before using any feature: “are there any side effects I need to worry about?”

The easiest way to answer the first question is through a demonstration showing some of the possibilities. Here are 5 pieces of SQL that will create and populate a table with a copy of the first 50,000 rows of the view all_objects on a database running 11.2.0.3; after running each one I collected stats on the table and ran a query to report the number of blocks in the table, and a few other details about the table.
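The original code listings aren’t reproduced here, so the five statements below are only a sketch of what such tests might look like rather than the exact SQL used – in particular the table name t1 and the rownum predicate are my own assumptions. (In 11g the keyword compress on its own is equivalent to compress basic.) After each test the statistics were gathered and user_tables queried, along the lines of the query at the end of the listing.

rem  Test 1: simple CTAS - no compression
create table t1
as
select * from all_objects where rownum <= 50000;

rem  Test 2: CTAS with (basic) compression
create table t1 compress
as
select * from all_objects where rownum <= 50000;

rem  Test 3: pre-created compressed table, conventional (array) insert
create table t1 compress
as
select * from all_objects where rownum = 0;

insert into t1 select * from all_objects where rownum <= 50000;
commit;

rem  Test 4: pre-created compressed table, direct path insert
create table t1 compress
as
select * from all_objects where rownum = 0;

insert /*+ append */ into t1 select * from all_objects where rownum <= 50000;
commit;

rem  Test 5: uncompressed CTAS, then alter (5a) and move (5b)
create table t1
as
select * from all_objects where rownum <= 50000;

alter table t1 compress;        -- 5a: changes the definition only
alter table t1 move;            -- 5b: rewrites (and compresses) the data

rem  After each step:
execute dbms_stats.gather_table_stats(user,'T1')

select blocks, pct_free, compression, compress_for
from   user_tables
where  table_name = 'T1';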

There are other possibilities, of course; we could define a tablespace so that any table created in that tablespace was, by default, compressed; we can arrange for a single partition or subpartition of a partitioned table to be compressed; we can even change the default compression on partitioned tables so that all new partitions or subpartitions are created with compression.
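Again the code for these variations isn’t shown in the article; as a rough sketch of the syntax involved (the tablespace, table and partition names here are all invented for illustration):

rem  A tablespace whose tables are compressed by default
create tablespace comp_ts
    datafile size 100m          -- assumes Oracle-managed files
    default compress;

rem  Compressing a single partition of a range-partitioned table
create table sales (
    sale_date   date,
    amount      number
)
partition by range (sale_date) (
    partition p_old values less than (date '2012-01-01') compress,
    partition p_new values less than (maxvalue)
);

rem  Making all future partitions default to compression
alter table sales modify default attributes compress;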

Sticking with our five samples, though, I’ve summarised the results of these code fragments in the following table – although you’ll notice that I’ve reported two results from test 5, one from before the move, one after:

Test                  BLOCKS   PCT_FREE   COMPRESSION   COMPRESS_FOR
1 (CTAS)                 714         10   DISABLED
2 (CTAS compress)        189          0   ENABLED       BASIC
3 Insert                 644          0   ENABLED       BASIC
4 Insert append          189          0   ENABLED       BASIC
5a Compress              714         10   ENABLED       BASIC
5b Move                  189          0   ENABLED       BASIC

The first test is just a baseline to give you an idea of how much space the data needs by default – as you can see I let the table use the default value of pctfree (reported in view user_tables as pct_free) and it needed 714 blocks to hold the data.

When I included the compress option with the CTAS, Oracle automatically set the pctfree to zero – and in this case managed to squeeze the data down to just 189 blocks. The zero pctfree is a hint that Oracle thinks this table is going to be read-only; but it is possible to set a non-zero pctfree and, as we shall see in the next article in the series, there may be cases where you want to take advantage of that option. Notice that in this, and all subsequent tests, the two columns reporting compression show that basic compression is enabled.
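If you do want a non-zero pctfree with compression you have to say so explicitly; a hypothetical example (not one of the original tests):

create table t1 pctfree 10 compress
as
select * from all_objects where rownum <= 50000;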

In the third and fourth tests I created an empty table with the compress option (which is why the pct_free is 0), and then inserted the data. As you can see, the data is only compressed when the insert is a direct path insert (you may recall that “compress basic” was originally “compress for direct_load operations”); the normal insert, even though it’s an array insert, doesn’t compress the data. (The drop from 714 blocks to 644 blocks in test 3 is just the side effect of pctfree dropping from 10 to zero – allowing for small variations, 90% of 714 is very close to 644.)

The final test then warns us that simply changing a table from uncompressed to compressed does nothing to the data. If you want to convert uncompressed data to compressed data you first need to change the table definition, and then you need to move the table to compress its contents. If you do this, of course, you’d also have to rebuild any indexes on the table immediately afterwards.
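As a sketch (the index name here is invented), the conversion might look like this:

alter table t1 compress;        -- change the table definition
alter table t1 move;            -- rewrite, and compress, the existing rows

alter index t1_pk rebuild;      -- the move leaves indexes unusable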

It’s compression, Jim, but not as we know it.

The rest of the questions I’ve asked above can best be addressed by seeing how Oracle does compression, and the answer is that (for basic and OLTP compression) Oracle doesn’t really do compression. What it does is “de-duplication” at the block level. Imagine you had three rows in a block containing the following data:
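(The original illustration isn’t reproduced here, so the layouts that follow use purely hypothetical values of my own, chosen only to match the description in the text.)

('XXXX', 'Smith', 'abcdef', 'London', 'CLOSED')
('XXXX', 'Jones', 'pqrstu', 'London', 'CLOSED')
('YYYY', 'Brown', 'abcdef', 'London', 'CLOSED')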

Oracle could notice that the value ‘XXXX’ appears twice, that the value ‘abcdef’ appears twice, and that the value ‘CLOSED’ appears three times. So it can create a table of repeated values in the block, and insert tokens into the rows to make them shorter, so our block might start to look like this:
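(Continuing the hypothetical sketch, with a token table and the shortened rows:)

Token table:   T1 = 'London'   T2 = 'XXXX'   T3 = 'CLOSED'   T4 = 'abcdef'

(T2, 'Smith', T4, T1, T3)
(T2, 'Jones', 'pqrstu', T1, T3)
('YYYY', 'Brown', T4, T1, T3)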

In fact, Oracle can get smarter than that, because it can rearrange the column order for each individual block to maximize the possibility of multiple columns turning into a single token. Notice, in my example, that token T1 and token T3 both appear in all three rows. Oracle can rearrange the order that the columns are stored in this block to put those tokens side by side, and create a new token that represents the combination of the two individual tokens, so our block becomes:
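(Still using the hypothetical values: the T1/T3 pair, which appears in every row, has been made adjacent and replaced by a single two-column token T5:)

Token table:   T1 = 'London'   T2 = 'XXXX'   T3 = 'CLOSED'   T4 = 'abcdef'   T5 = (T1, T3)

(T2, 'Smith', T4, T5)
(T2, 'Jones', 'pqrstu', T5)
('YYYY', 'Brown', T4, T5)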

Let’s take a closer look at this by examining a few lines from the symbolic block dump of a block. Here’s the first fragment I want to highlight – it appears only for compressed table blocks:

perm_9ir2[4]={ 2 0 1 3 }

This is a table with 4 columns, but for this block Oracle has rearranged (permuted) the order in which the columns are stored: the thing stored at position 2 in this block is column 0 of the table definition, position 0 is really column 1, position 1 is column 2, and position 3 is (still) column 3.
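In other words, reading the array as described, the mapping for this block is:

table column 0  ->  stored at position 2
table column 1  ->  stored at position 0
table column 2  ->  stored at position 1
table column 3  ->  stored at position 3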

0x24:pti[0] nrow=65 offs=0
0x28:pti[1] nrow=400 offs=65

There are two “tables” in this block, the first is the token table which holds the 65 tokens which will be used in the block, starting at offset zero in the block’s row directory, the second holds 400 “proper” rows, starting at offset 65 in the block’s row directory. (This means, by the way, that the block’s row directory currently has 465 entries – there are some interesting side effects that can appear with large row directories.)

If we hunt through the block dump to find the first row in table 1 (i.e. the first “real” row) we find the following, which looks pretty much like an ordinary row dump from a typical heap table block dump for a row with 4 columns. But there are a few special points to notice:

According to the column lengths (the number in square brackets) the row length should be 26 bytes, plus the 4 bytes for the 4 column lengths, plus one byte each for the flag byte (fb:), lock byte (lb:) and column count (cc:) – but the total length (tl:) is only 5 bytes. And the last line of this extract shows us the actual values (bind map [ed: see note below from Flado, “binary dump” is a much better interpretation of bindmp]) of those five bytes. These five bytes are the flag byte (0x2c = ‘–H-FL’), the lock byte, the “stored” column count – i.e. the number of columns stored at this location, which is just one – and the next two bytes tell us that that one “column” is a token representing 4 consecutive columns, and that we need to look at token 0x31 of the token table (interestingly, different versions of Oracle managed to order the token table differently even though the environment seemed to be the same – my test results are all from 11.2.0.3). Let’s look at row 49 (0x31) in table 0:

The token looks almost identical to the row – but the total length of the token is 19 bytes. So let’s look at the binary dump for the token. The first two bytes of the binary dump tell us that this token is used 8 times in the block. The next byte tells us that there are 4 columns in the token and, through some cunning encoding, the next two bytes tell us that the first two columns of this token are actually tokens 0x36 (decimal 54) and 0x40 (decimal 64). You can then see the actual values preceded by “200 + column length” for the last two columns of the token.

So let’s look at tokens 54 and 64 – which allow us to see that they are, indeed, single column tokens, with values that match the values we saw in token 0x31 and in the actual row. (You might note that token 54 is used in a total of 10 places in this block – the 0x0a in the second place in its binary dump – and token 64 is used in 5 places.)

So, by tracking our way from row directory to row, to token (by way of row directory), to two more tokens (by way of the row directory) we see that we can expand an entry that started out as 5 bytes into a full row of 4 columns with 26 bytes of data.

There are several lessons to be learned from the work we did tracking through the block dump. The first is that Oracle doesn’t decompress the table blocks, it simply re-constructs the row you need by hopping back and forth between the row directory and the row pieces (the code may even avoid visiting some tokens – the single column ones – if the column values aren’t needed for the SQL statement). The second is that it’s possible to spend a significant amount of extra CPU time bouncing around a block to reconstruct a row – this could have a noticeable impact on CPU usage if you are doing lots of tablescans. As a side effect, because Oracle will have to hold (pin) the block for some time to reconstruct rows, you may find that your code will do fewer “consistent gets – examination” which means more activity on the “cache buffers chains” latch. Of course, we hope that the extra CPU time will be offset by the smaller number of physical reads we may have to do because we’ve packed our rows into a smaller number of blocks, which may allow us to keep more data cached for longer.
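If you want to get a feel for these effects on your own system, one simple approach (a sketch using the standard dynamic performance views; the figures you see will obviously vary) is to snapshot the session statistics and latch activity before and after a tablescan of the compressed table:

select sn.name, ms.value
from   v$mystat ms, v$statname sn
where  sn.statistic# = ms.statistic#
and    sn.name in ('consistent gets', 'consistent gets - examination');

select gets, misses
from   v$latch
where  name = 'cache buffers chains';

rem  run the tablescan, then repeat both queries and compare the deltas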

Summary

There’s still a lot to say about the side effects of compression, in particular what happens when you delete or update rows, and this will lead us on to the implementation of compression for OLTP (or ‘for all operations’ as it used to be) – but those are topics for future articles.

What we have seen from this first article is that basic compression is used only with direct path inserts, not with ordinary DML, and Oracle will by default set pctfree to zero on the table, which is a good indication that you are not supposed to modify the data once you’ve created it – basic compression is really only appropriate for data that is virtually read-only.

We have also seen that basic compression is just de-duplication of repeated values – but Oracle can get very clever about minimizing the amount of space used. In particular the stored column order can change from block to block, allowing Oracle to maximize the options for creating tokens that represent multiple adjacent columns. This deduplication mechanism means that Oracle doesn’t have to decompress blocks, it keeps the blocks in the buffer cache just like any other block, but reconstructs rows (in the PGA) by following pointers to tokens – and chasing pointers is a CPU intensive process; the better your compression the more CPU intensive your queries (particularly your tablescans) are likely to be.

There is a catalogue of all five items in this series (and a few others) at this URL

Jonathan Lewis is a well-known figure in the Oracle world with more than 26 years’ experience using the software. He has published three books about Oracle, and contributed to three others. His most recent book, Oracle Core (Apress, ISBN 978-1430239543), describes the central workings of the Oracle database engine.
He runs a couple of websites and contributes fairly regularly to newsgroups, forums, and User Group magazines and events around the world.

24 Comments

Hello Jonathan
Could I ask a couple of questions?
1) "code will do fewer “consistent gets – examination” which means more activity on the “cache buffers chains” latch"
Is “fewer” a typo?

2) "Oracle will have to hold (pin) the block for some time to reconstruct rows" and "it keeps the blocks in the buffer cache just like any other block, but reconstructs rows (in the PGA)". So, where does Oracle reconstruct rows?

1) No typo - but could perhaps have been worded more clearly. Of the "consistent gets", fewer will also be recorded as "consistent gets - examination", so more of the consistent gets will require two latch acquisitions rather than one.

2) A row will be reconstructed in the PGA, but the session will have to pin the block for longer while jumping all over the block to find and copy the pieces that allow it to construct the row.

There's nothing available to supply the detail needed. You could use dbms_rowid() to analyze the file, block and row number of rows; you could analyze the table to list the chained rows; you could use dbms_space_admin to find out (roughly) how much free space there was below the table high-water mark. But to see why rows migrated, and why two very similar sets of data can occupy very different amounts of space, etc. you have to start by looking at the block dumps to see how compression works.
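For reference, a sketch of the sort of dbms_rowid query mentioned above (t1 is just a placeholder table name); it reports how many rows each block currently holds:

select  dbms_rowid.rowid_relative_fno(rowid)   file_no,
        dbms_rowid.rowid_block_number(rowid)   block_no,
        count(*)                               rows_in_block
from    t1
group by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
order by
        file_no, block_no;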

Very interesting compression insights with a clear explanation!!
Thanks Jonathan for your time spent!
Looking forward to the OLTP compression article…
Just got to know that one customer compressed 18 huge SAP DBs (completely)
using OLTP compression. This scares me a bit. I wonder what side effects that can bring.
-- Kirill Loifman

There's a good chance that unless they've set a large (e.g. 30 - 50%) value for pctfree on their transactional tables they may see a significant increase in CPU, lots of migrated rows, and lots of "buffer busy waits".

Just something to keep in mind with Basic (and OLTP) Table Compression:

It is limited to tables with at most 254 columns.
Tables whose rows consist of multiple row pieces are not supported, and therefore tables with more than 254 columns are effectively not compressed - although no error message appears when you try to compress them.

The data volume simply doesn't decrease and the compression doesn't take place, even though the COMPRESSION attribute in the dictionary suggests that it has, which can be misleading.

But compression might also sometimes decrease CPU consumption: if you don't need to follow a "factored out" token, the rest of the row can fit more easily on cache layers (say, the L1 cache), thus avoiding stalls.
This makes the CPU overhead less severe "on average"...

Alberto,
That sounds like a reasonable argument - although the restrictions on the data pattern might be so severe that the case rarely appears in real life.

I did try to test the hypothesis, but the tests always showed more CPU being used for compressed data. (In fact the results suggested that my hypothesis about Oracle skipping single column tokens is wrong.)

Jonathan,
interesting ... maybe Oracle uncompresses the row on access, say into the PGA, and then runs the "legacy" code on the uncompressed row. That looks like the most straightforward (and least risky) implementation to me.
If that's the case ... maybe it might be optimized in the future

I like your suggestion far more than my own idea (so I've gone back and edited it into the article). I arrived at "bind map" because my first thought about the string of hex was that it was mapping the tokens. "Binary Dump" seems much more obvious with hindsight (even if it is a Hexadecimal dump ;) )

Hi,
I have started compression (OLTP table & LOB) on pre-prod and would like to know what Oracle is doing internally (technical details like disk writes/reads etc.). I have also come to know that there is a performance impact on DML operations. Also, if a major data load happens on a table while it is being compressed, or on a table that already has compression enabled, what performance issues might we encounter in that scenario?
Before performing the compression activity, what prerequisites and proactive steps should be taken care of?
Please guide me on the above.
Thanks & Regards,
Siva...

It depends what you've done with the data since the first attempt at compression (and also what type of compression, but I'm going to assume that you're asking about basic compression here); if you've done something that means you've introduced non-compressed data into the table then the (necessary) "alter table move" will reduce the current size of the table; if you haven't done anything to introduce non-compressed data such a move won't increase the level of compression; however (unlike some of the available file compression mechanisms) it won't make the table grow in size.

I suppose there is a little caveat that goes with the above - if you've been using "insert /*+ append */" to add compressed data, or if you've been deleting data, then there may be some small "holes" in the data segment that will be eliminated when you move the table; so you might see some reduction in total space required. You could use the dbms_space package, though, to check the state of blocks below the high-water mark to see if this is the case.
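For anyone who wants to make that check, here's a sketch of a call to dbms_space.space_usage - it assumes an ASSM tablespace and a table called T1, and simply prints a couple of the figures returned:

declare
        l_unformatted_blocks    number;         l_unformatted_bytes     number;
        l_fs1_blocks            number;         l_fs1_bytes             number;
        l_fs2_blocks            number;         l_fs2_bytes             number;
        l_fs3_blocks            number;         l_fs3_bytes             number;
        l_fs4_blocks            number;         l_fs4_bytes             number;
        l_full_blocks           number;         l_full_bytes            number;
begin
        dbms_space.space_usage(
                segment_owner           => user,
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => l_unformatted_blocks,
                unformatted_bytes       => l_unformatted_bytes,
                fs1_blocks              => l_fs1_blocks,  fs1_bytes => l_fs1_bytes,
                fs2_blocks              => l_fs2_blocks,  fs2_bytes => l_fs2_bytes,
                fs3_blocks              => l_fs3_blocks,  fs3_bytes => l_fs3_bytes,
                fs4_blocks              => l_fs4_blocks,  fs4_bytes => l_fs4_bytes,
                full_blocks             => l_full_blocks, full_bytes => l_full_bytes
        );
        dbms_output.put_line('Full blocks below HWM     : ' || l_full_blocks);
        dbms_output.put_line('Blocks 75-100% free (FS4) : ' || l_fs4_blocks);
end;
/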

Examples 5(a) and (b) above show you that if you alter a table to compressed you have to MOVE it before the data is actually compressed. That means you will also have to rebuild all the indexes on the table as well.

