May 28, 2013

“Just because you can, doesn’t mean you should.” SQL Server has a number of limits and restrictions (many of the limits are described here). Most of the time these restrictions don’t feel restrictive at all. I can’t think of a single scenario where I’d want more than 128 characters to name a column. And I don’t think I’ll ever need to select more than 4096 columns at a time. I get the feeling that if you need over 10% of what Microsoft restricts you to, then you’re probably doing it wrong.

So I turn that thought into Swart’s Ten Percent Rule:

If you’re using over 10% of what SQL Server restricts you to, you’re doing it wrong.

Always? No, not always. It depends, of course, on what restriction we’re talking about. Sometimes the rule applies more than other times. I’d like to evaluate a number of restrictions here and judge whether the rule applies or not. I’ll be assigning each 10% rule a “Risk that you’re doing it wrong”. The levels are Severe, High, Elevated, Guarded, or Low.

SQL Statement Length

Let’s take a step back. Why do you want to execute a statement that’s 25 megabytes long? The only scenario I can think of is that you’re sending data to the database to be parsed and stored. There are much better ways to send data to the database than this (see Erland Sommarskog’s definitive Arrays and Lists in SQL Server).

Or maybe you’re creating a one-off huge IN clause to pass to SQL. Better to join to a temp table or a table valued parameter.
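As a rough sketch of what I mean (the table, type, and procedure names here are invented for illustration), here’s the temp table approach and the table valued parameter approach side by side:

```sql
-- Instead of: SELECT ... WHERE OrderId IN (1, 2, 3, /* ...thousands more... */)
-- load the ids into a temp table and join to it:
CREATE TABLE #OrderIds (OrderId int NOT NULL PRIMARY KEY);

INSERT #OrderIds (OrderId) VALUES (1), (2), (3); -- bulk load in practice

SELECT o.*
FROM dbo.Orders o
JOIN #OrderIds ids ON ids.OrderId = o.OrderId;

-- Or define a table valued parameter type once...
CREATE TYPE dbo.IdList AS TABLE (Id int NOT NULL PRIMARY KEY);
GO
-- ...and pass the list of ids to a procedure:
CREATE PROCEDURE dbo.GetOrders @Ids dbo.IdList READONLY
AS
    SELECT o.*
    FROM dbo.Orders o
    JOIN @Ids ids ON ids.Id = o.OrderId;
```

Either way, the statement you send to the server stays tiny no matter how long the list of values gets.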

Identifier Length

Twelve characters is not a lot to work with when coming up with names for tables or columns. If you had a column or table name with more than 12 characters, I wouldn’t be at all concerned. For comparison, AdventureWorks’s average column name is 11 characters long. Its longest column name is “HomeAddressSpecialInstructions” at 30 characters. My friend, Karen Lopez, told me she once came across this beast of a column name: “RetailTransactionLineItemModifierEventReasonCode” at 48 characters. That’s getting extreme (despite intellisense), but 12 is fine.

Number of Columns in a Table

Max Size: 1024 Columns
10% of restriction: 102 columns
Risk of doing it wrong: High

Maybe you need a lot of columns because you’re treating a table like a spreadsheet. The worst offender I’ve ever seen had a whopping 96 columns. It was designed by someone who didn’t know any better. By the way, the most notable modeling smell coming from this table was the columns named [Int01] to [Int30]. In this case, the database designer was definitely doing it wrong.

Maybe you need a lot of columns to support growing functionality. Say you have a table whose column list keeps growing with every new feature.
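For illustration only (this table and its columns are invented for this post), picture something like:

```sql
CREATE TABLE dbo.Users
(
    UserId int NOT NULL PRIMARY KEY,
    UserName nvarchar(128) NOT NULL,
    WantsEmailAlerts bit NOT NULL,
    WantsSmsAlerts bit NOT NULL,
    PrefersDarkTheme bit NOT NULL,
    HasAcceptedTerms bit NOT NULL
    -- ...and a new bit column every time a feature ships
);
```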

Then the thing to ask yourself is: why is this list of columns changing so much? In this example, we see a growing list of boolean attributes (which will probably continue to grow as the application evolves). In this case it makes sense to redesign your schema.
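One possible redesign (again just a sketch with invented names) turns each boolean attribute into a row instead of a column:

```sql
CREATE TABLE dbo.UserAttributes
(
    UserId int NOT NULL,
    AttributeName varchar(100) NOT NULL,
    AttributeValue bit NOT NULL,
    CONSTRAINT PK_UserAttributes PRIMARY KEY (UserId, AttributeName)
);

-- A new feature now means new rows, not a schema change:
INSERT dbo.UserAttributes (UserId, AttributeName, AttributeValue)
VALUES (42, 'WantsEmailAlerts', 1);
```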

But maybe you really do need this many columns; if so, you’re probably storing semi-structured data. When I think of semi-structured data, I think of how one would store cascading style sheets or health records.

Semi-structured data is really difficult to model. As an alternative to these difficult-to-manage columns, consider storing the values in an xml column (accepting that you can still fetch the values, but probably no longer want to filter on them). If you need something more radical, maybe consider a schema-less JSON document database like CouchDB (I can’t believe I said that).
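A minimal sketch of the xml column alternative (table and element names invented): fetching a single value back out is easy enough, but filtering across rows gets awkward fast.

```sql
CREATE TABLE dbo.UserSettings
(
    UserId int NOT NULL PRIMARY KEY,
    Settings xml NOT NULL
);

INSERT dbo.UserSettings (UserId, Settings)
VALUES (42, '<settings><wantsEmailAlerts>1</wantsEmailAlerts></settings>');

-- Fetching one value back out with the xml value() method:
SELECT Settings.value('(/settings/wantsEmailAlerts)[1]', 'bit') AS WantsEmailAlerts
FROM dbo.UserSettings
WHERE UserId = 42;
```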

SQL Server has something called wide tables which use sparse columns but you’re on your own here. It’s not something I could recommend.

Simultaneous Server Connections

The number of connections your SQL Server instance can support can be found by this query

SELECT @@MAX_CONNECTIONS as [this many]

and the number of open connections on the server is basically this

SELECT count(1) as [this many] FROM sys.dm_exec_sessions

It’s not that unusual to have 1000 or 2000 connections open; it is much rarer to see that many active queries. Most of those connections are usually sleeping. It’s possible you have a large number of services and web servers connecting to a variety of databases on your server, and with ADO.NET connection pools, it’s common to have a bunch of idle connections.

The best advice I can give is to get a baseline. Watch the connections regularly in order to understand what is normal. Maybe start with
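Something like this query against sys.dm_exec_sessions is one possible starting point (group by whatever dimensions matter in your environment):

```sql
-- Who is connected, from where, and are they doing anything?
SELECT host_name,
       program_name,
       status,               -- running, sleeping, etc.
       count(*) AS session_count
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
GROUP BY host_name, program_name, status
ORDER BY session_count DESC;
```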

Then go from there. Once you know what’s normal, you can then watch for changes to the status quo.

Database Size

Max Size: 512 Petabytes
10% of restriction: 51 Petabytes

51 Petabytes is unwieldy in so many ways I’m not going to list them here. I’m not saying such big data sizes are impossible with a centralized RDBMS like SQL Server. But I am saying that if you’re not sure about whether you can scale SQL Server to that size and you’re reading my blog looking for advice then…

Risk of doing it wrong: Severe

Other metrics

I think Swart’s Ten Percent Rule applies to a lot of things. Try to apply it the next time you come across a SQL Server restriction and let me know if you find anything interesting.

One of our vendor databases has a core table with 209 columns, 192 of which have the word ‘custom’ somewhere in the name, along with a numbering scheme that seems to have changed over the years. Each column is of varying type and size. And yes, the vast majority of these columns are being used.

You can imagine how difficult it was integrating it with our other systems.

209 columns! That’s so crazy. Doesn’t a little mental alarm bell go off at around column #60? How do you get to #209? I guess the reasoning is “In for a penny, in for a pound. Screw it, let’s add some more columns!”

I have to admit that I am guilty of breaking the 10% rule for table names. A database I designed had an average of 18 characters per table name, with two tables coming in first with 53 characters each, plus numerous others in the 40s. I blame the German in me – growing up with words like Donaudampfschifffahrtskapitaensmuetze makes one more accepting of these long names.

Hey Karl,
“Donaudampfschifffahrtskapitaensmuetze” That’s a whopping 37 letters.
I’ve heard of this German habit of running words together. Mark Twain has a go at the German language and mentions this example he once came across: “Generalstaatsverordnetenversammlungen” at 37 letters.

I’ve learned recently about a bizarre abbreviation practice once promoted by IBM: wouldn’t it be easier to spell Donaudampfschifffahrtskapitaensmuetze and Generalstaatsverordnetenversammlungen as D35e and G35n?

But take heart, there’s no guilt involved. I only assessed the 10% table name length restriction as a low risk that you’re doing it wrong.

On May 29th, @AdamMachanic tweeted: “Hm. Number of worker threads? Total memory? Number of CPUs? It’s a cool idea but not universal.”

No, it’s not universal, but I thought I’d assess these limits anyway.

Total memory: That’s not a limit enforced by SQL Server is it?
Number of CPUs: Also no limit enforced by SQL Server (depending on edition).

These are more limited by either your wallet or the operating system, but not SQL Server.

Maximum worker threads is limited to 65,535. 10% of that is 6,554 worker threads, which is not in the range recommended for this setting. It’s not an area I’m familiar with, so risk of doing it wrong: Elevated, maybe?
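For what it’s worth, you can compare the configured maximum against what the server is actually using with a couple of documented DMVs (a sketch; results depend entirely on your server):

```sql
-- The configured maximum number of worker threads:
SELECT max_workers_count FROM sys.dm_os_sys_info;

-- Workers currently allocated and active, per scheduler:
SELECT scheduler_id, current_workers_count, active_workers_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
```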

“By the way, the most notable modeling smell coming from this table was the columns named [Int01] to [Int30].”

I came across something like this one in a version control system SaaS app. The database had a table with a row for each document with various metadata, along with BLOB columns named Version1, Version2, …, Version19 storing versions of the document. (I guess no one needed more than 19 versions.) Not surprisingly this table contained nearly all of the data in the database and was the customer’s main bottleneck.

About worker threads, I’ve never seen anyone need 6,554 worker threads, but I’ve seen cases where the default number of threads wasn’t sufficient (query notifications with Service Broker and 500+ databases).

I find it dangerous to apply blanket statements like this to a very generic set of criteria. I mean, if all of the table names in your database are <= 12 characters, I don't want to be the one who is stuck trying to figure out what the names mean, as they will be a mess. Of course, if you do get close to 102 (10% of 1024) columns, I would suggest you are making work for one of my consultant friends. (And the limit has been increased to 30,000 for a "wide" table using sparse columns; even then, the same goes if you get close to 3,000 columns in a wide table.)

Even things like index key length as a rule might cause people to not use a Unique constraint in a place where it is necessary.

For most things, it is definitely a threshold to consider, since many limits are super high to allow for a very wide range of uses that haven't been thought of yet. But we do have to be mindful that some readers aren't as smart as you, and a few will try to apply the 12 character name thing as a corporate standard 🙂

Hi Louis! Thanks for coming by.
For the 12 character limit, I do mention that 12 is fine and the risk of doing it wrong is “low”. I characterized the risk of doing it wrong as low mostly because I didn’t define a risk category called “non-existent”; maybe I should have.

If someone looks at my post and makes corporate guidelines based solely on this article’s title and H3 headers, then the risk that they’re doing it wrong is “high”. 🙂

I thought I was clear enough that this rule is *not* to be applied as a blanket criterion.

Maybe (like Jim above mentions) it’s an OLAP/star schema design. But even if that’s the case, those numbers seem a bit big. Not knowing anything else about the application, I can’t guarantee what they’re doing is unwise, but I can say that what they’re doing is at least very unusual.

Sometimes there are more important design rules to follow – $$$$. We have two mainframe datacom applications that were converted to run on SQL Server and MicroFocus COBOL. The tools to do the conversion are automated and don’t include redesign. Somehow it still works, despite COBOL cursors and attention events.

Sounds like you’ve found a pretty decent solution for an application which is pretty static and you’re comparing a port versus a redesign.
And of course design rules, like some of the ones I mention above, are best considered when comparing this design versus that design.

Identifier Length – 12 is not enough and too low even as guidance. But I do recommend thirty (30), since that is the maximum size of an Oracle identifier, and we want our tables and views to be available in both SQL Server and Oracle.