Monday, April 11, 2011

11gR2 - Analyzing Compression Capabilities

With 11gR2, Oracle has come a long way with its compression capabilities. In almost all the benchmarks I have performed, compression is one of the most significant factors influencing query performance; it beats any other database optimization feature hands down.

While fast queries alone are not enough to meet business requirements, in 11gR2 compression does appear to perform adequately for all operations when compared to the uncompressed default. It has some caveats, but overall the results are quite good, and it may make sense to evaluate compression as the default for user data.

To check the viability of compression for a typical use case, I conducted five tests:

CTAS

Updates

Conventional Inserts

Deletes

Queries

The variables were

Time to complete each activity

Object Size after each activity

Query performance
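Elapsed time can be captured with the SQL*Plus timing facility, and object size checked from the data dictionary after each step. A minimal sketch (the table name TXN_FACT is a placeholder, not the actual test table):

```sql
-- Show elapsed time for every statement in the session
SET TIMING ON

-- Segment size after each activity (size in MB)
SELECT segment_name,
       ROUND(bytes / 1024 / 1024) AS size_mb
FROM   user_segments
WHERE  segment_name = 'TXN_FACT';
```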

I did not measure system load during each of these activities; I assume that CPU and memory are sufficient.

As always, I used a realistic example with real data (not generated). The tests were conducted on a single-instance 11gR2 database on Linux x86-64, running on a Quarter Rack Exadata with 3 storage cells; however, all cell offloading was disabled.

The table in question had 148 million rows and was 36 columns wide. The columns were a mix of VARCHAR2, NUMBER and DATE with NOT NULL constraints. The table was not partitioned.

Table Creation using CTAS
The first test was object creation using CTAS, done in parallel and with the default extent sizing (autoallocate).
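The CTAS statements looked roughly like the sketch below. The table names, source table and parallel degree are placeholders, but the compression clauses are standard 11gR2 syntax; the HCC variants require Exadata (or other Oracle) storage:

```sql
-- Uncompressed baseline
CREATE TABLE txn_fact_nocomp PARALLEL 16 NOCOMPRESS
  AS SELECT /*+ PARALLEL(s 16) */ * FROM txn_source s;

-- OLTP (Advanced) compression
CREATE TABLE txn_fact_oltp PARALLEL 16 COMPRESS FOR OLTP
  AS SELECT /*+ PARALLEL(s 16) */ * FROM txn_source s;

-- Hybrid Columnar Compression variants
CREATE TABLE txn_fact_query PARALLEL 16 COMPRESS FOR QUERY HIGH
  AS SELECT /*+ PARALLEL(s 16) */ * FROM txn_source s;

CREATE TABLE txn_fact_archive PARALLEL 16 COMPRESS FOR ARCHIVE HIGH
  AS SELECT /*+ PARALLEL(s 16) */ * FROM txn_source s;
```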

I also generated a flat file for the table and compressed it using gzip and bzip2 to get an idea of how OS-level compression compares to database compression.

As you can see, database compression ranges from 2.5x (OLTP) to 9x (HCC Archive), which is more or less comparable to what is normally seen in the real world. If you have historical read-only data, storing it as external tables (compressed flat files) would probably be a better idea than storing it in the database as a regular uncompressed table. External tables have come a long way in 11gR2.
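In 11gR2, an external table can read a gzipped flat file directly by decompressing it on the fly with the PREPROCESSOR clause. A sketch, with hypothetical directory, file and column names (the real table had 36 columns):

```sql
-- Directories for the data file and the decompression utility (paths are placeholders)
CREATE OR REPLACE DIRECTORY data_dir AS '/u01/app/flatfiles';
CREATE OR REPLACE DIRECTORY exec_dir AS '/bin';

CREATE TABLE txn_fact_ext (
  txn_id    NUMBER,
  txn_date  DATE,
  txn_desc  VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR exec_dir:'zcat'   -- decompress the gzipped file on the fly
    FIELDS TERMINATED BY '|'
  )
  LOCATION ('txn_fact.dat.gz')
)
REJECT LIMIT UNLIMITED;
```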

Updating rows in the table

Compression has always received bad press due to poor update performance. But how bad is the hit, really? By performance I mean the time taken to update, the growth in the size of the object, and the subsequent query performance.

As a rule of thumb, if you plan to update more than 10% of a big table, it is usually better to rewrite the update as a CTAS. To simulate a worst-case scenario, I updated two columns in 11% of the table (15.5 million rows) to gauge the effect of the update.
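The update was of the following shape; column names and the predicate are placeholders for illustration, chosen so that roughly 11% of the rows qualify:

```sql
-- Bulk update of two columns in ~15.5 million rows (names/predicate hypothetical)
UPDATE txn_fact
SET    status_cd = 'PROCESSED',
       update_dt = SYSDATE
WHERE  txn_date < DATE '2010-01-01';

COMMIT;
```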

Deleting rows in the table

The same can be said of deletes. To gauge the impact of deletion, I deleted 6% (5 million rows) from the table.
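The delete itself is a plain bulk statement; the predicate below is a hypothetical one sized to remove the test rows:

```sql
-- Bulk delete of ~5 million rows (predicate is illustrative)
DELETE FROM txn_fact
WHERE  txn_date < DATE '2009-07-01';

COMMIT;
```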

Inserting rows into the table

Direct-path loads may not always be feasible, so I inserted 1 million rows into the table using conventional (buffered) inserts.
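For reference, the two insert styles differ only by a hint; the staging table name is a placeholder. Note that for HCC tables, only direct-path loads achieve the full columnar compression, while conventional inserts fall back to a lighter row-level format:

```sql
-- Conventional (buffered) insert, as tested
INSERT INTO txn_fact SELECT * FROM txn_stage;
COMMIT;

-- Direct-path equivalent, for comparison (loads above the high-water mark)
INSERT /*+ APPEND */ INTO txn_fact SELECT * FROM txn_stage;
COMMIT;
```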

Query performance

After each of the tests, I ran a query which required a full tablescan to see the impact on query performance.
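The query was a simple aggregate of the shape below, which the optimizer satisfies with a full scan; the column names are placeholders:

```sql
-- Aggregate forcing a full table scan (names are illustrative)
SELECT /*+ FULL(t) */ COUNT(*), SUM(txn_amt)
FROM   txn_fact t
WHERE  txn_date >= DATE '2010-06-01';
```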

Results

For CTAS and updates, the uncompressed version of the table outperformed the compressed versions by approximately 2x. For deletes and inserts, however, performance was the same or slightly better with the compressed versions.

And finally, the table size after each DML operation: the bulk update grew the table, though nowhere near the size of the uncompressed version, and the inserts reused space freed by the deletes.

Conclusions

With compression, space savings can be significant.

DML does grow the object, but it remains considerably smaller than the uncompressed version.

Bulk updates are still slower than on the uncompressed table.

Deletes and conventional inserts perform about the same as the uncompressed version.

Query performance - Compression improves performance significantly.

For historical or archived read-only data, external tables backed by compressed flat files may be a viable alternative to storing the data in the database.

1 comment:

In other words, the query performance gain is proportional to the compression ratio. On non-Exadata platforms, if the compression ratio is not high enough (at least > 30%), we may not see a substantial performance gain from licensing the additional OLTP compression option.