July 20, 2012

Compression Units

If you’re starting to work with Exadata you need to work out what how much impact you can have by doing the right (or wrong, or unlucky) sorts of things with the technology. Fortunately there are only a few special features that really need investigation: Hybrid Columnar Compression (HCC), Offloading (and related Smart Scan stuff) and Storage Indexes. These are the features that will have the biggest impact on the space taken up by your data, the volume of data that will move through your system as you load and query it, and the amount of CPU it takes to do something with your data.

There are other features that are important, of course, such as the features for affecting parallel execution, and the options for resource management, but the three I’ve listed above are the core of what’s (currently) unique to Exadata. In this note I’m just going to make a few comments about how Oracle implements HCC, and what side-effects this may have.

The Compression Unit

Under HCC Oracle uses a structure called a Compression Unit. If you look at how Oracle stores a compression unit it is, in a very real sense, stored as a single row. The number of columns in that one row is the same as the number of (real) columns declared for the table, but the content of each individual column of the row is created by concatenting the equivalent column from hundreds, if not thousands, of the original rows from the table then using a compression algorithm such as bzip2on the result.

Imagine we have 1,000 rows in a table of four columns – and we issue a command like: alter table t1 move compress for archive high; that might pack the contents of the entire table into a single compression unit looking something like the following picture:

The compression unit has a small header section – holding, among other things, the offsets into the compression unit of four “compressed columns”, then there’s a bitmap (one bit for each row in the original set of 1,000) that is used to identify rows that have been deleted (or migrated to a different location because of an update) and finally there are the four “compressed columns”. Each “column” in the picture above consists of the list of values for that column extracted from the 1,000 rows in order, and then compressed using an algorithm dependent on the level of compression (query low, query high, archive low, or archive high).

Once Oracle has constructed the compression unit, the “mega-row” is stored as a single chained row with a single column, using Oracle’s standard approach to chained rows. This allows Oracle to introduce a split anywhere in the row (including in mid-column) and continue the row in another block. Assuming my compression unit is large enough to fill four blocks, the multi-level picture would look like this:

Each block (the blue outline) has a block header (the grey area) and a row directory (the dark red strip). In this case every row directory has a single entry in it that points to the single row-piece in the block. Each row-piece has a row-header (the orange strip) describing the row but, importantly, since the row has chained across all four blocks the row header for the first three row-pieces includes a “next rowid” pointer to the next row-piece.

Here’s a data dump showing the row header information produced by a single compression unit spanning five blocks in a table. I cut these from a standard block range dump. The compression unit is made up of the first row piece in 5 consecutive blocks. Each row piece points to the rowid of the next row piece (nrid).

If you’ve looked at the dump details for chained or migrated rows before, you’ll be familiar with the values in the “flag byte” (fb) – the F in the first rowpiece is for “First”, the L in the last rowpiece is for “Last”. The N’s and P’s show that there is a Next row and/or Previous rowpiece, and the H in the first row piece tells us that this row is a row header section. (You could have an H without an F to show that a row had been migrated). As you can see, each row piece consists of a single column (cc: 1 / col 0:) – but remember, this one column has nothing to do with the columns of the original columns of the original table, the entire compression unit is a single length column. In this example you can calculate the total length of the compression unit by summing the column lengths from the five row-pieces: 8004 + 8021 + 8021 +8021 + 548 = 32,615 bytes.

You might have realised that the last block in the set of four has only used up (including the block overhead) about 750 bytes from the 8KB available. This won’t (necessarily) be wasted space; I have some examples of Oracle putting the first piece of a second compression unit into a block when the first compression unit ends with a short row-piece.

Once you realise that a Compression Unit is stored as a chained row of one column you might then start asking questions about how Oracle gets usable data from a compression unit, and how efficiently it can do it, and how big a compression unit can be.

The answer to the easy question – how big can a compression unit be – seems to be dicated by the lesser of:

32, 759 rows from the original data (even when compressed for query low)

250KB (approximately – though this might have been a side effect of my system configuration)

I’ll be looking at the harder, and more interesting, questions in the next HCC posting. In the meantime, here’s a link to all my previous pieces on Exadata topics (including a catalogue or articles by other writers).

Related

Jonathan,
thank you for this great summary!
I had no real access to HCC right now, but based on infos available I would say there are some additional limitations (at least on Exadata – don’t kow if they also apply on NFS-based HCC?)
Unlike ordinary chained rows, HCCs blocks are always in a direct row of blocks,
HCCs are always within the same extent and
HCCs are always within the same ASM AllocationUnit (I don’t know if there is a similar limit on NFS-based HCC?)
Do you agree with these 3 statements?
Another interesting point to check for HCC rows might be their ROWID? There are multiple rows packed within one block (like in good old CLUSTERs) but at the same time spread across several blocks. I also see no way to directly link DBMS_ROWID.ROWID_ROW_NUMBER to any number in a block dump?
I need to put more pressure in my simulator so I can answer some of these questions.
Martin

I am going to assume that when you say “HCC blocks” you mean the blocks that make up a single CU.

> Unlike ordinary chained rows, HCCs blocks are always in a direct row of blocks,I would expect this to be true because you can only create CUs during direct path loads. I can’t, however, prove that it’s true.

> HCCs are always within the same extent
I see no reason why this definitely HAS to be true, given that CU doesn’t even have to start on a block boundary; but it may be true as a side effect of the alignment implicit in my next answer. Update: I’ve just done a quick test, and a reasonable statistical sample suggests that the assumption that CUs do not cross extent boundaries seems to be true – even to the extent that a CU might consist of just one block and a few hundred rows because it’s the last block in the extent when every other CU in the table is 10 to 12 blocks and several thousand rows.

> HCCs are always within the same ASM AllocationUnit (I don’t know if there is a similar limit on NFS-based HCC?)
I think a single CU has to be confined to a single cell, otherwise we run into some problems when doing a smart scan with offload. This means that there has to be some fairly intelligent co-ordination between CUs, Celldisks, and ASM allocation units. So I think your comment has to be right. (With NFS this may not be such a necessity because you can’t do offloading on NFS.)

> Another interesting point to check for HCC rows might be their ROWID? There are multiple rows packed within one block (like in good old CLUSTERs) but at the same time spread across several blocks. I also see no way to directly link DBMS_ROWID.ROWID_ROW_NUMBER to any number in a block dump?
The answer to that one is fairly straightforward, the rowid you get for rows in a CU uses the block id of the first block of the CU as its block id; rowids behave as if the CU is a single very large Oracle block. (This need for “synchronisation” may also explain why the file header section – that used to be 64KB or 128KB in earlier versions of Oracle is now set to 1MB; this may have some interesting ramifications if you try to move a 10g or 9i database to 11g and Exadata.)

[…] “unconference” timetable to give a ten-minute talk on something. So I decided to use Hybrid Columnar Compression to make a general point about choosing and testing features. For those of you who missed this […]

[…] up a suggestion from Kerry Osborne that I show how I arrived at the observation I made in an earlier posting about the size of a compression unit, here’s a short note to show you what I did. It really […]

[…] about a couple of times, and I’ve published several notes on the blog about it, including an image of a critical slide from one of my presentations, and I was expecting to find some notes somewhere about Oracle […]

Once you compressed the data, and you put more data on the table, is there a way to know how much space would the table ocuppy if uncompressed? I mean, is there a way to query the compression ratio after the compression has been applied?

I don’t think there’s a simple way to work out what the compression ratio was if you didn’t take some notes of the data size before you started; but if you gather table stats on a compressed table I believe the optimizer is going to generate the avg_col_len and avg_row_len that it would have generated if the table hadn’t been compressed (though I haven’t checked this, and don’t have a suitable bit of hardware to check on at the moment). If that’s the case then you can estimate it on the basis of:

This allows for an 8KB block size, pctfree = 10, a couple of hundred bytes of block and row overheads. It should put you in the right ballpark, but there are a couple of edge cases where it could be a fair way out.