Data compression in InnoDB for text and blob fields

Have you wanted to compress only certain types of columns in a table while leaving other columns uncompressed? While working on a customer case this week I saw an interesting problem where a table had many heavily utilized TEXT fields with some read queries exceeding 500MB (!!), and stored in a 100GB table. In this case we were not allowed to make any query or application logic changes so we chose to implement the Barracuda file format and utilize compressed rows as this appealed to me for this mostly-read application. One quick way you can see if your rows will benefit from compression would be to read Peter Zaitsev’s blog post and execute:

Shell

1

SELECT AVG(LENGTH((`colTextField`))FROM`t1`WHERE`id`<1000

compare this to:

Shell

1

SELECT AVG(LENGTH(COMPRESS(`colTextField`)))FROM`t1`WHERE`id`<1000

In our case we saw about a 75% reduction when the TEXT field was compressed which we felt indicated there would be a benefit derived from table compression.

With the original InnoDB Antelope file format you have the choice of ROW_FORMAT=COMPACT and ROW_FORMAT=REDUNDANT where InnoDB stored the first 768 bytes of variable length columns (BLOB, VARCHAR, TEXT) in the index record, and the remainder stored in overflow pages. COMPACT became the default after MySQL 5.0.3 and has a more compact representation for nulls and variable-length fields than REDUNDANT.

Using InnoDB’s new Barracuda file format (available since InnoDB plugin 1.1 or MySQL 5.5) you can now leverage table compression by specifying ROW_FORMAT=COMPRESSED. In our case we only wanted MySQL to try to move the larger (greater than 16KB) TEXT fields off-page so we utilized the KEY_BLOCK_SIZE=16 directive. This means that each TEXT / BLOB field that exceeds 16KB it would be stored in it’s own page (less the 20 byte pointer stored in the index page). Based on our analysis 75% of the blobs stored in the table were over 8KB, which were responsible for 90% of space usage hence compressing only externally stored blobs provided substaintial advantages. Why did we choose a KEY_BLOCK_SIZE that is the same value of the InnoDB page size of 16KB? As the fine MySQL manual states:

This setting may still be useful for tables with many long BLOB, VARCHAR or TEXT columns, because such values often do compress well, and might therefore require fewer “overflow” pages.

I did not test with a smaller KEY_BLOCK_SIZE as we had minimal time to effect the table compression modification (given the long run-time of the ALTER TABLE), you may find your application benefits from a different KEY_BLOCK_SIZE value. Also note that you need to enable the dynamic variable innodb_file_format=BARRACUDA (don’t forget to set it in my.cnf!):

Shell

1

SET GLOBAL innodb_file_format=BARRACUDA;

One caveat: you must be running with innodb_file_per_table=1 as the InnoDB system tablespace cannot be compressed, see this page for further details on how to enable compression for a table.

To utilize Barracuda format tables you will need to create them new and migrate data, or affect existing tables with an ALTER TABLE statement. As table compression is table specific, ROW_FORMAT and KEY_BLOCK_SIZE directives are passed via CREATE TABLE or ALTER TABLE statements. In our case, we chose to re-build the table using ALTER TABLE via a null-operation like this:

Shell

1

ALTER TABLE`t1`ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;

In our case even though the customer had a 3GHz 24-core machine the ALTER TABLE was progressing slowly as it was bound to a single CPU while compressing the data. Just have patience. 🙂 Keep in mind too that if you started with a 100GB table and assuming you know your approximate compression rate, you will be left with a considerably smaller on-disk footprint so ideally you will be able to postpone that purchase of additional disk capacity.

So what was the real-world outcome of this exercise? We were able to show a 70% improvement in queries against this table when the TEXT fields were not part of the query request due to Barracuda not storing 768 bytes of the blob on field, and reduce the table down to 30GB. Happy customer 🙂

Related

Author

Michael joined Percona as a Consultant in 2012 after having worked with high volume stock photography websites and email service provider platforms. With a foundation in Systems Administration, Michael is now the Product Manager responsible for Percona Monitoring and Management (PMM), Percona XtraBackup, and Percona Toolkit. A Canadian, Michael currently lives in México with his family.

You gain a significant performance boost by utilizing compression on mostly-read workloads because there is less disk IO required to retrieve InnoDB pages, which translates (usually) into faster query response time as the database server is blocked less on disk IO.

Your mileage may vary on write-heavy workloads if you do not have many cores or fast enough cores as you then may move the bottleneck to the CPU while waiting for compression to run on the relevant pages before committing to disk. For this reason it is less clear-cut to recommend compression — we recommend that you test your specific queries on your own dataset before implementing compression.

You gain a significant performance boost by utilizing compression on mostly-read workloads because there is less disk IO required to retrieve InnoDB pages, which translates (usually) into faster query response time as the database server is blocked less on disk IO.

Your mileage may vary on write-heavy workloads if you do not have many cores or fast enough cores as you then may move the bottleneck to the CPU while waiting for compression to run on the relevant pages before committing to disk. For this reason it is less clear-cut to recommend compression — we recommend that you test your specific queries on your own dataset before implementing compression.