Very interesting article, thank you. I wanted to clarify one point though. You talk about query performance improvement which, of course, is the holy grail with database applications, especially when performance starts slowing down with increased data volumes etc. However, you are also looking at table space optimisation and it appears to me that your article demonstrates the space saving more than performance optimisation.

I realise that you say query performance is improved due to faster joins with higher row density, but you don't show any evidence of that and your simple row count query took longer due to compression. So for optimum query performance would you recommend trying to achieve higher density without compression (by optimising datatypes etc.)?

Although I agree that saving space is "a good thing" (tm), I do wonder whether the full impact of this operation has been considered.

As far as changing an nvarchar to a varchar goes, where functionally possible: hurray!
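For what it's worth, a change like that is worth sanity-checking before you run it. A minimal sketch (the table and column names here are hypothetical, not from the article):

```sql
-- Hypothetical example: dbo.Orders.Notes is nvarchar(200) but is believed to
-- hold only single-byte text, so it could become varchar(200) (1 byte/char).
-- First, count rows that would lose information in the conversion
-- (true Unicode characters become '?' when cast to varchar):
SELECT COUNT(*) AS RowsWithUnicode
FROM dbo.Orders
WHERE Notes <> CONVERT(varchar(200), Notes);

-- If that count is 0, the change is functionally safe:
ALTER TABLE dbo.Orders ALTER COLUMN Notes varchar(200) NULL;
```

The round-trip comparison is a cheap way to prove "functionally possible" with data rather than with assumptions.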

As far as PAGE compression goes... hmm:
=> It's nice to see I/O come down due to the more compact storage, as I/O is often a bottleneck on servers (CPUs are cheap; memory and fast disks are, relatively speaking, not). But then again, the actual execution times go up!?
=> What happens to INSERT/UPDATE/DELETE operations? Won't they be affected more when using PAGE compression?
=> What happens to queries that try to fetch data from NON-indexed columns? I know doing so is "bad", but it happens all the time, either because some part of the WHERE clause references an additional (non-index-included) field, or because the WHERE defeats searchability by wrapping a function around the fields.
=> Adding more (compressed) indexes to cover all fields doesn't really sound like an option either, as it will make the database bigger again and probably cause lots of side effects when the data is modified.
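On the "should I even try it" question: SQL Server 2008+ ships a stored procedure that estimates the saving per object before you commit, and compression itself is a reversible per-table/per-index rebuild. A sketch (the object names are hypothetical):

```sql
-- Estimate what PAGE compression would save for a given table:
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'Orders',
    @index_id         = NULL,   -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE';

-- Apply it (and just as easily revert it) with a rebuild:
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);
-- ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = NONE);  -- to undo
```

That at least lets you measure the space side of the trade-off per object before taking on the CPU cost for your DML workload.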

I guess a lot of the actual gain depends on the way the data is being used:
* functionally (e.g. access only via some procedures with "restricted" (and hence known) execution plans, or more ad-hoc operations),
* technically (SELECT by day and UPDATE by night, or a mixed bag 24/7?),
* and practically (3 vs 300 concurrent users).

Simply having a slimmer database does not necessarily result in a fitter database IMHO, but I agree it might... I'm merely trying to be cautious here.

My interest was piqued when I read your article about size optimization, and I really like your approach.

As a first-time poster on this site, I am enthusiastic about sharing my case. This reply may be seen as a long one; if so, skip ahead to "In short..."

On one contract a few years ago, I wanted a quick look at what table space usage could be optimized, but in a really more basic way than your approach.

I was curious to see how much wasted space was present in "oversized" string-type fields for some big tables, but wanted to develop a simple query for that (yeah, one could say I was a little "naive" or "over-optimistic" for my experience level at that time...).

Anyway, given that I was working from someone else's design, I just wanted a quick look. I hoped to write only one query if possible, even if it was not Christmas yet... (a dream cannot hurt, right?)

As a discussion example for clarification here: one string field in a table could have been originally created with a 120-character length in mind at design time, when in reality long-term usage shows that the maximum length actually "consumed" across all rows in this field is only 45 characters. Here, "long term" could be a few years or a few million rows, whichever fits your particular case.

This kind of information would then be a starting point for me to discuss or evaluate whether it would be beneficial to reduce the length of this "longer-than-necessary" string field from a table size optimization point of view. In this example, one could decide to reduce the field length from 120 to 60 characters, for instance.
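For a single field, the check described above boils down to one aggregate. A minimal sketch (the table and column names are placeholders for the 120-character example, not real objects):

```sql
-- How many characters does this varchar(120) field actually use?
SELECT MAX(LEN(SomeStringField)) AS MaxUsedLength
FROM dbo.SomeBigTable;
-- If this returns 45, shrinking the declaration to e.g. varchar(60)
-- keeps some headroom while tightening the definition.
```

Note that LEN() ignores trailing spaces and returns NULL for an all-NULL column, which is usually fine for this kind of rough survey.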

In short, I was curious to find out the maximum field length actually used for every string-type field in at least one table, and I began to search for a tool or an article someone else had created for that particular task. In the time I allowed myself, I found none. (Yeah, one could say I could have invested more time, but I am probably a little "naive" after all...)

I was motivated to develop something basic but general-purpose, small but giving useful results. I did not want "Temp" tables or lots of steps; no, really, only one or a few very short queries was also among my design objectives (speaking of "naive"...).

My first tries at developing something were not immediately successful. However, with that pending desire in mind, I then stumbled upon an article written by Greg Larsen showing me how I could get all the required information for one table, and my motivation rose again.

If it does not hurt anyone here, I just want to say that this particular inspiring article appeared on www.sqlserverexamples.com. Do not worry, I am still a very enthusiastic fan of "SQL Server Central"... :)

Using this inspiration, I finally went with this approach at that time:

-----------------------------------------------------------------
-- Display the Maximum Usage Length for Each Column in a Table
-- (a simple tool helping to optimize ANY existing table definition)
--
-- WRITTEN BY: BRUNO ARNOLD
-- DATE: 04/06/2008
--
-- Originally inspired by reading GREG LARSEN on this:
-- http://www.sqlserverexamples.com/v2/Examples/DynamicTSQL/DisplayingSmallestRowinTable/tabid/207/Default.aspx
--
-- I mainly use this to quickly see (on big tables) whether a character field
-- is really using (or needing) its declared length, thus helping to spot
-- possible space optimization.
--
-- FOR EXAMPLE:
-- In this example using the [SalesOrderDetail] table from [AdventureWorks],
-- I "could securely" change the [CarrierTrackingNumber] field's design
-- definition to nvarchar(12) instead of nvarchar(25), but ONLY if the future
-- usage is not meant to change, of course.
-- In this case, the select shows 24 bytes for this field, but it is an
-- nvarchar type, so it means you could use 12 characters in the "nvarchar"
-- design definition.
-----------------------------------------------------------------
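The query body itself did not survive into this thread, so here is a sketch in the spirit of that header: build one dynamic SELECT that reports the maximum used length of every string-type column in a given table. The schema/table values are just the AdventureWorks example from the comment block; this is a reconstruction, not the original code.

```sql
-- Report declared vs. actually-used length for every string column of a table.
DECLARE @schema sysname = 'Sales',
        @table  sysname = 'SalesOrderDetail',
        @sql    nvarchar(max) = N'';

SELECT @sql = @sql
    + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
    + N'SELECT ''' + c.name + N''' AS ColumnName, '
    + N'''' + t.name + N''' AS DataType, '
    + CAST(c.max_length AS nvarchar(10)) + N' AS DeclaredBytes, '
    -- LEN() counts characters, so for nchar/nvarchar the declared byte
    -- count must be halved before comparing.
    + N'MAX(LEN(' + QUOTENAME(c.name) + N')) AS MaxUsedChars '
    + N'FROM ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table)
FROM sys.columns c
JOIN sys.types   t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(QUOTENAME(@schema) + N'.' + QUOTENAME(@table))
  AND t.name IN ('char', 'varchar', 'nchar', 'nvarchar');

EXEC sp_executesql @sql;
```

One UNION ALL branch per column keeps it to a single statement, matching the "no temp tables, one query" objective described above, at the cost of one table scan per string column on big tables.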

One could say that my "wish list" of objectives was not fully achieved, but at least I had fun doing that part.

After all, this code could serve as a basic starting point for getting all field size information for all tables, for all databases, for all servers, for all planets... (oops, dreaming again...)

Seriously, I made sure to credit Greg Larsen for the parts that inspired my own solution, because I think it is very important to do that whenever we can, thus sending positive encouragement for good work.

Everyone likes positive messages from time to time. And as with everything that goes on at SQL Server Central, "together" we can be better than alone.

We're just interested in large tables with low row density so this should suffice for your purposes.

As for the earlier comment about insert performance, there is definitely an overhead when loading new data into compressed tables. Bulk loading can typically be optimized by using uncompressed heaps and compressing after the fact, but this of course depends on your environment/scenario. I would suggest reading the MSDN technical article called "The Data Loading Performance Guide" (http://msdn.microsoft.com/en-us/library/dd425070%28SQL.100%29.aspx) for more details.
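One pattern from that guide is worth spelling out: load into an uncompressed heap first, then compress once. A rough sketch (table, columns, and file path are hypothetical):

```sql
-- 1. Plain uncompressed heap as the load target:
CREATE TABLE dbo.StageOrders
(
    OrderID int      NOT NULL,
    Amount  money    NOT NULL
);

-- 2. Bulk load with TABLOCK so the insert into the heap can be
--    minimally logged (subject to recovery model):
BULK INSERT dbo.StageOrders
FROM 'C:\load\orders.dat'
WITH (TABLOCK);

-- 3. Compress once, after the data is in place, instead of paying
--    compression CPU on every inserted row during the load:
ALTER TABLE dbo.StageOrders REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Whether the one-time rebuild beats loading directly into a compressed table depends on data volume, recovery model, and load frequency, which is exactly the "it depends" caveat above.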

I personally have nothing but good things to say about data compression, but as with everything SQL Server related, "it depends." If you typically observe high levels of CPU pressure then you should definitely tread lightly. However, for I/O bound environments the gains have always outweighed the losses in my experience, which is what I believe the author was trying to demonstrate (although his example didn't do enough to support this case).