The downside to compressing string stored in the database is that running ad-hoc SQL queries (such is mystring like '%foo%') is not possible.

Background

I was building an app that was downloading and storing lots HTML and for convenience I was storing the HTML in a SQL Server database. SQL Server has no good way to compress stored data so the database files grew very quickly. This bugged me because most of the content would compress well. I was using Entity Framework and started throwing around for ways to hook into how EF serializes data or for a way to create a custom string type that could handle the compression. Even with the EF6 pre-releases, I couldn’t find any such hooks.

The actual compression is done by LZ4Sharp which is a .NET implementation of the LZ4 compression algorithm. LZ4 is notable, not for compressing data a lot, but for compressing and uncompressing data extremely quickly. A single modern CPU core can LZ4-compress at up to 300 MB/s and uncompress much faster. This should minimize the overhead of compressing and uncompressing data as it enters and leaves the database.

Limitations

As mentioned in the introduction you can’t do ad-hoc SQL queries because compressed strings are stored in the database as binary blobs. Querying with NHibernate as also somewhat limited. Doing document.Text == "foo" actually works because NHibernate runs “Foo” through the compression. Queries that involve Contains() will (silently) not work, unfortunately. This is because NHibernate translates this to a like query, which won’t work with the compressed binary blob. I haven’t looked into hooking into the query engine to fix this.