I have encountered the following dilemma several times and would be interested to hear how others have addressed it, or whether there is a canonical solution.

In some domains, one is naturally led to consider very wide tables. Take, for instance, time series surveys that evolve over many years. Such surveys can have hundreds, if not thousands, of variables, but typically only a few thousand or tens of thousands of rows. It is natural to model such a result set as a table in which each variable corresponds to a column; in SQL Server, however, a table is limited to 1,024 (non-sparse) columns.

The obvious workarounds are to:

1. Distribute each record over multiple tables.

2. Stuff the data into a single table with columns such as ResponseId, VariableName, ResponseValue.

Number 2, I think, is very bad for a number of reasons (it is difficult to query, storage is suboptimal, etc.), so the first choice is really the only viable option I see. It could perhaps be improved by grouping columns that are likely to be queried together into the same table, but one can't really know this until the database is actually in use.
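For concreteness, option 1 might look like the following sketch, where the wide survey result set is split into two tables sharing a key (all table and column names here are hypothetical):

```sql
-- Option 1: distribute each record over multiple tables joined on a shared key.
CREATE TABLE dbo.Response_Demographics (
    ResponseId INT NOT NULL PRIMARY KEY,
    Age        INT NULL,
    Region     NVARCHAR(50) NULL
    -- ... up to the 1,024-column limit per table
);

CREATE TABLE dbo.Response_Income (
    ResponseId INT NOT NULL PRIMARY KEY
        REFERENCES dbo.Response_Demographics (ResponseId),
    HouseholdIncome DECIMAL(12, 2) NULL,
    IncomeSource    NVARCHAR(50) NULL
);
```

The one-to-one foreign key keeps the split tables in lockstep: every ResponseId appears at most once in each table, and a join on ResponseId reassembles the original record.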

So, my basic question is: Are there better ways to handle this situation?

3 Answers

You might want to put a view in front of the tables to make them appear as if they are a single table. The upside is that you can rearrange the storage later without queries needing to change. The downside is that only modifications to the base table can be done through the view. If necessary, you could mitigate this with stored procedures for frequently used modifications. Based on your use case of time series surveys, it sounds like inserts and selects are far more frequent than updates or deletes, so this might be a viable way to stay flexible without forcing your clients to update if you need to rearrange things later.
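Using the same hypothetical split tables as above, the view might look like this; clients query the view and never see the underlying split:

```sql
-- A view that presents the split tables as a single wide "table".
-- Storage can later be rearranged behind it without breaking clients.
CREATE VIEW dbo.vResponse
AS
SELECT d.ResponseId,
       d.Age,
       d.Region,
       i.HouseholdIncome,
       i.IncomeSource
FROM dbo.Response_Demographics AS d
JOIN dbo.Response_Income AS i
    ON i.ResponseId = d.ResponseId;
```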

But, aren't views also constrained to 1024 columns? Otherwise, I agree, this would be a good approach.
– 3Sphere, Jul 11 '12 at 0:45

Given that wide tables allow 30,000 columns, and I don't see an explicit limit on the number of columns in a view, I would say it is worth at least trying: msdn.microsoft.com/en-us/library/ms143432.aspx. Another possibility is to partition in a way that makes sense to the users and create multiple views; if you need to reorganize later, you can still have multiple tables sitting behind multiple views. It is less than ideal, but depending on how they are used, it may still be better than direct access to the tables.
– John Watts, Jul 11 '12 at 11:30

Hmmm, it really depends on what you do with it. If you want to keep the table as wide as it is (possibly this is for OLAP or a data warehouse), I would just use proper indexes. For the columns that are selected most often, I would also use covering indexes; for the rows that are searched most often, filtered indexes. If there are, say, billions of records in the table, you could partition the table as well.
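As a sketch of those two index types (table, column names, and the filter predicate are illustrative, not from the original post):

```sql
-- Covering index: the INCLUDE list stores frequently selected columns
-- at the leaf level, so those queries never touch the wide base table.
CREATE NONCLUSTERED INDEX IX_Response_Region
    ON dbo.SurveyResponse (Region)
    INCLUDE (Age, HouseholdIncome);

-- Filtered index: indexes only the rows that are searched most often,
-- keeping the index small despite a large underlying table.
CREATE NONCLUSTERED INDEX IX_Response_Recent
    ON dbo.SurveyResponse (SurveyYear)
    WHERE SurveyYear >= 2010;
```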

If you do want to store the data over multiple tables, definitely use proper normalization techniques, probably up to 3NF or 3.5NF, to divide the big table into smaller tables. I would use your first method, normalization, to store the data for the big table, just because it makes better sense to me that way.

This is an old topic but something we are currently working on resolving. Neither of the above answers really give as many benefits as the solution we feel we have found.

We previously believed that having wide tables wasn't really a problem. Having spent time analysing this, we have seen the light and realised that the costs of inserts and updates were indeed getting out of control.

As John states above, the solution is really to create a VIEW to provide your application with a consistent schema. One of the challenges in any redesign, as in our case, is that you may have thousands or millions of lines of code referencing the old wide table and want to provide backwards compatibility.

Views can also be used for UPDATEs and INSERTs, as John alludes to. However, a problem we found initially was this: take the example of myWideTable, which may have hundreds of columns, and split it into myWideTable_a with columns a, b and c, and myWideTable_b with columns x, y and z. An insert into the view that only sets column a will insert a record into myWideTable_a only.

This causes a problem when you later want to update the record and set myWideTable.z: the update will fail, because no corresponding row exists in myWideTable_b.

The solution we're adopting, and performance testing, is to put an INSTEAD OF INSERT trigger on the view that always inserts into all of the split tables, so that we can continue to update or read from the view with impunity.
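A minimal sketch of such a trigger, using the myWideTable_a/myWideTable_b split from above (the view name, key column, and exact column lists are assumptions for illustration):

```sql
-- INSTEAD OF INSERT trigger: an insert through the view always creates
-- a row in *both* split tables, even when only some columns are set,
-- so later UPDATEs through the view always find a row to modify.
CREATE TRIGGER dbo.tr_vMyWideTable_Insert
ON dbo.vMyWideTable
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.myWideTable_a (ResponseId, a, b, c)
    SELECT ResponseId, a, b, c
    FROM inserted;

    INSERT INTO dbo.myWideTable_b (ResponseId, x, y, z)
    SELECT ResponseId, x, y, z
    FROM inserted;
END;
```

Columns not supplied by the insert simply land as NULL in their split table, which is what makes the later UPDATE succeed.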

Whether this trigger adds more insert overhead than a single wide table is still an open question, but it clearly improves subsequent writes to columns in each split table.

Subsequent performance testing for us showed this approach to be SLOWER than keeping our "wide" table. It should be noted that our table 'only' has around 200 columns, not the 30,000 mentioned earlier. If we approached 1,000+ columns this might be worth revisiting.
– JLo, May 16 '14 at 13:57