Ideas for SET STRICT_CHECKS ON

In this article I discuss various checks that SQL Server could perform when
you create an SQL module (that is, a stored procedure, trigger etc) that would help you to
find stupid errors early. By finding the errors earlier you would be more
productive – and your testers will never have to deal with your silly goofs.

Since many of these checks would break existing code if they were introduced
right off, the idea is that these checks are only active if a certain setting is in
force; tentatively the command would be SET STRICT_CHECKS ON. In March 2007, I submitted a suggestion for
this feature on Connect as
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=260762. The
purpose of this article is to develop this idea further with the hope that it
will serve as inspiration for the SQL Server team, as they start working on the
next version of SQL Server. (Update in May 2012: two versions of SQL Server have been released since I submitted my Connect item, without the request being honoured, but the optimist in me never gives up!)

SET STRICT_CHECKS ON would be a compile-time setting. When a batch is entered
and parsed, a number of things that go unnoticed today would yield a
compilation error or possibly a warning. This applies to CREATE
PROCEDURE, CREATE FUNCTION etc as well as ad-hoc batches.

The first section is a discussion on general principles, but the
main body of this article is devoted to the possible checks that could be
performed when SET STRICT_CHECKS ON is in force. Here is an introductory list of
the checks that I discuss:

Rationale

At first glance, the whole idea with optional checks may seem corny, but there are precursors. Two
other languages that I work with have similar features. Visual Basic has
Option Explicit and Perl has use strict. And people with
experience of C from older days may recall the lint program.

The purpose of these checks is to help the programmer to find silly typos
and goofs early, so that he doesn't have to spend his time proof-reading the code for
mistakes that the machine can easily detect. Or avoid egg on his face when
his typo halts testing or, even worse, production.

Commands and Terminology

In this document I assume that the command to enable this feature would be
SET STRICT_CHECKS ON, and I refer to it as "strict checks in force" etc. The SQL
Server team may prefer something else, like SET STRONG_CHECKING ON and I trust
their good judgement in that regard. I have gone for strict checks
consistently through this document, as I also used it in the Connect item.

Microsoft may prefer something different than a SET option. Two alternatives that come to mind are:

Database setting (so that it applies to all connections to that database, not overridable – or a default which you can override with SET.)

A connection-string attribute. (Once connected, you can't change it.)

In this document, I assume that it is a SET option, but that is only to keep the discussion simple. In fact, when I discussed this with a contact at Microsoft he said that if they were to
do this, he preferred it to be a database setting rather than a SET option.

This document suggests a wide range of checks, of which some are quite aggressive. This raises the question whether there should be a single setting or a couple so that you can opt out of some checks. This is not without a precursor: in Perl, use
strict has three options, so that you can say use strict qw(vars
subs) if you only want those two checks. Thus, in SQL Server we could
have SET STRICT_CHECKS OBJECTS ON, SET STRICT_CHECKS IMPLICIT_CONVERSION ON etc,
and of course also a SET STRICT_CHECKS ALL ON.

However, this would increase the testing matrix for Microsoft. At the same time, it could contribute to making the feature more difficult to use: surely, best practice would mandate SET STRICT_CHECKS ALL ON, so if only some checks are in effect, that would be confusing. Thus, my preference and suggestion is that there should be a single all-or-nothing knob for strict checks, and this is what I assume in the rest of this document.

Step by Step or All in One Go?

My list of possible checks is tentative, and I more or less expect the SQL
Server team to discard some of them. There is one important thing to note
though: Often when you introduce a new feature, some whistles and bells are
deferred to later versions. Usually, this is a good idea, but for this feature
this could be problematic. Many of the new checks will break existing code,
and for the first version that is alright, because you need to say SET
STRICT_CHECKS ON for it to happen. But once the setting has shipped, Microsoft cannot add new checks in the future versions of SQL Server without breaking backwards compatibility.

So realistically, the first implementation of this feature will also have to be the complete feature. In later versions, there can only be new differences between strict and lax checks for new features. I can see minor changes being covered by compatibility levels, but not strict checks for an entirely new area.

Errors or Warnings?

Should the issues that strict checking reveals be raised as errors or as
warnings?

The advantage of letting them be only warnings is quite obvious: there would
not be any real compatibility issues. Taken to the extreme, there would not have
to be any SET command at all, but the checks could always be in force.

However, there are also a number of disadvantages. One is how errors and
warnings work with the tools today. With errors, SQL Server reports a line
number, and this line number is displayed in the query tool, and the programmer
can double-click on the line to quickly find the error.

For warnings on the other hand, SQL Server does not seem to report a correct line
number, and the query tools do not display them. This means that while the
programmer is informed of the problem, it will take him longer to find
where the problem is.

This particular issue could be addressed though: SQL Server could report
all strict-check issues as level 9, and make sure to report the correct line
number for the issues. The tools would then be changed to display procedure and
line number for level-9 messages. Alternatively, the messages could be reported as level-0
messages as warnings are today, but all strict-check messages would start with "Line nnn:". The
tools would need to be adapted so that you can double-click on such a message
to find where it originates.

But there is more to it. I am quite sure that once these checks are in place
more than one DBA would say "I don't accept any strict-check messages in my
databases", and he will want to have them all reported as errors to prevent the
objects from being created. This could be met by having SET STRICT_CHECKS WARNINGS,
SET STRICT_CHECKS ERRORS and SET STRICT_CHECKS NONE. But again, my preference is
for a simple on/off switch.

What is a little more realistic is that only some of the issues found by strict
checks are reported as errors, whereas others are reported as warnings. And that
is alright, as long as line numbers are reported for these warnings.

Nevertheless, to simplify this text, I assume that all issues found by strict
checks are reported as errors and I don't discuss the possibility of reporting
them as mere warnings any further.

One more thing: all errors detected by strict checks should make it clear
that they come from strict checks and possibly inform the user how to turn off
these checks.

Scope of the Command

The purpose of SET STRICT_CHECKS ON is to control compile-time behaviour.
Therefore, it raises the question of what it would mean if SET STRICT_CHECKS ON (or
OFF) were to appear in the middle of a stored procedure. Consider:

IF @x >= 0
   SET STRICT_CHECKS OFF
ELSE
   SET STRICT_CHECKS ON

Since SET STRICT_CHECKS is a compile directive, what would this mean?

To avoid all such ugly problems, my suggestion is that the command is only
permitted in the top-level scope, and it must either be in a batch of its own,
or be the first command in a batch (in the latter case, it would serve as a
compile-time directive). And when I say the first command in a batch, I mean it.
That is:

CREATE PROCEDURE some_sp @var int = 99 AS
SET STRICT_CHECKS ON

will not fly. Some of the checks that I propose affect the procedure
header, so the above raises the question: is the check in force for the header
or not?

Loophole?

Strict checks are there to help the programmer to catch typos and goofs. But depending on how the checks are implemented, there may be situations where the checks get in his way, because he actually wants to do
what he types. In a few places in this document, I have identified a couple of situations where this could occur.

Since turning off strict checks for an entire stored procedure would be to throw out that proverbial baby with the equally proverbial bathwater, I thought about alternatives. One alternative would be to have BEGIN NOSTRICT and END NOSTRICT, and within this block strict checks would be turned off. However, this would invite bad practice, where inexperienced programmers would enclose all their procedures in this block, because they don't like the error messages.

In this document I have instead settled for a setting that works on line level. If the programmer adds the comment /* NOSTRICT */ on a line, SQL Server will not report any strict-check errors on that line. This is akin to how the old lint program worked.

Admittedly, it would be best to be without a general loophole, to keep the language clean. But that would also require that there are ways out when you have legit reasons to work against the spirit of the rules. Sometimes such ways out are easy to identify. E.g., I suggest that it would be illegal to assign a varchar(10) variable to a varchar(5) column, since this could cause data loss. The way to work around this is to use cast or convert. But there may be situations where Microsoft would have to introduce new language constructs to avoid unwanted roadblocks. Since I did not want to wander into such territory, I have opted for a general loophole with /* NOSTRICT */ in this document.

What Happens at Run-Time?

Strict checks are intended to help the developer, but SQL Server does not know
if there is a developer or an application on the other end. If an application
issues SET STRICT_CHECKS ON, and then runs ad-hoc batches, they would be
checked. I don't see any major problems with this. Stupid errors like JOIN
tbl a ON a.col = a.col would result in an error, which is probably better
than returning an incorrect result.
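Spelled out as a full query, the goof would look like this (invented names; the programmer meant to compare the two tables):

SELECT a.OrderID, a.OrderDate
FROM   Orders a
JOIN   OrderArchive b ON a.OrderID = a.OrderID   -- meant: a.OrderID = b.OrderID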

A more intriguing situation is when SQL Server compiles an existing stored procedure to
build a query plan. Would strict checks apply in this case? I can see some
advantages with this. Today, if a procedure refers to a non-existing table, it
bombs when you reach that statement, in which case the procedure is terminated,
even if there is a local catch handler. This can cause some mess if the
procedure has started a transaction. With strict checks in force, the compilation
would fail directly and the behaviour would be cleaner. But I don't see this as
mandatory. If Microsoft finds it easier to compile code already stored in SQL Server in unstrict mode, I think this would be alright.

Saving the Setting?

Since SET STRICT_CHECKS ON is a compile-time setting, should it be saved with the SQL module, and in such case what would that mean?

Today there are two settings which are saved with the module: ANSI_NULLS and QUOTED_IDENTIFIER, and when you run the procedure, the saved settings apply. This has caused quite some confusion over the years, and it is definitely not desirable. So if the setting is saved with the procedure, it would be informational only: to make it possible for the DBA to review whether there are any procedures in the database that were entered with strict checks off.

However, observe that even if a procedure was entered with strict checks in effect, that is no guarantee that it will compile or execute successfully at run-time, since a dropped table or column may have invalidated the procedure. Thus, saving the setting in metadata is at best a nice-to-have feature.

Impact on Tools

Since strict checks are an engine feature, the impact on the tools is small.
There are a few points, though:

In SSMS, the screen Tools->Options->Query
Execution->SQL Server->Advanced,
there should be a check box for SET STRICT_CHECKS ON, so that you can always get
this setting when you connect. For the first release of this feature, this checkbox should probably not be
checked by default, but if the feature is well received this could be changed
for the following version.

As I discussed above, if strict checks only result in warnings, not errors, SSMS must report the
line number, and permit you to double-click on the message to get to the line
from where
the strict-check message originates.

What about SSDT?

When I said above that nothing happened in SQL Server 2012, that was not 100 % correct. With SQL Server 2012, Microsoft released SQL Server Data Tools (SSDT) and SSDT performs some of the checks I suggest, although far from all. The checks are performed outside SQL Server, but they use a language service which, as I understand, uses the same parser and binder as the engine.

SSDT is definitely a commendable effort. I've played with it a bit, and my overall impression is positive. At the same time, SSDT is a very complex tool and introduces a way of working for development and deployment that your organisation may not be ready for. That is, SSDT is not for everyone. And therefore SSDT is not a solution for the proposals in this article. It's a decent workaround for some, but in the long run, this should be in the engine.

In the following text, I point out a few checks that I have noticed are available in SSDT. There could be others that I have not noticed; I have not played that extensively with SSDT.

In earlier versions, you would get an error about the missing table
already when you tried to create the procedure. This was the state of affairs up to SQL 6.5. With SQL 7, Microsoft
introduced what they call Deferred Name Resolution. You no longer get an
error about the missing table, but the procedure is created just fine. When you run it, you get this output:
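Something on this pattern, that is (the procedure and table names here are invented, and the exact wording varies between versions):

Msg 208, Level 16, State 1, Procedure typo_demo, Line 2
Invalid object name 'no_such_tabel'.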

That is, it is not
until the SELECT statement actually executes that you get the error message. When I
first heard about this, I could not just believe it! I was using SQL Server to
develop a large enterprise system, and Microsoft changes the behaviour as if SQL
Server was only for toys. Imagine an important function grinding to a
standstill just because of a small typo that SQL Server could have caught up
front! You may object that such typos should be caught in testing, and in most
cases they are, but: 1) the typo may be in an odd code path that was not covered
by the testers, 2) if the typo stops the tests, the testers will have to wait
for the next build, and the company loses time and money.

There are also some other nasty side
effects. Let's say that you have an outer stored procedure that calls an inner
stored procedure that first starts a transaction, and then later runs into an
error with a missing table. The inner procedure is aborted, but execution continues in the outer procedure – with the transaction still active! How many
programmers are prepared for that? In SQL 2005, TRY-CATCH makes this a little
easier to handle, but if the inner procedure has a TRY-CATCH, that CATCH will not
fire, only the TRY-CATCH in the outer procedure.

Another problem with deferred name resolution is that the missing table can
mask other errors. Consider this procedure:
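The procedure would be on this pattern (the procedure name is invented; note that the alias o is missing on the FROM line):

CREATE PROCEDURE missing_alias AS
   SELECT o.OrderID, o.OrderDate
   INTO   #temp
   FROM   Orders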

SQL 7 and SQL 2000 will gladly agree to create this procedure, as long as
#temp does not exist when you submit your CREATE PROCEDURE. When #temp does not
exist, they defer the compilation of the entire query until later. And then at
run-time, the statement blows up because you had forgotten to add the column alias. (It
should read FROM Orders o). Thankfully, this particular situation was addressed
in SQL 2005, and you now get a compile-time error. But consider this statement:
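A statement on this pattern, that is (again the alias o has been left out, but this time the query reads from the temp table):

SELECT o.OrderID, o.OrderDate
FROM   #temp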

Here you will not get an error about the missing alias until
run-time even with SQL 2012. In this particular query, there is a theoretical possibility that
#temp at run-time could have a CLR column o with type methods named OrderID and
OrderDate, in which case the query could be legal. So is SQL Server
giving you the benefit of the doubt? Not really. Look at this query:
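A query on this pattern, that is (a sketch; here o could only be a forgotten alias for Orders, yet the mere reference to #temp defers compilation):

SELECT o.OrderID, o.OrderDate
FROM   Orders
WHERE  OrderID IN (SELECT OrderID FROM #temp)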

Here it is painfully obvious that o.OrderID and o.OrderDate cannot be anything but errors. Nevertheless, SQL 2012 will not tell you until run-time, because it defers the compilation
until #temp actually exists. So while SQL 2005 brought some improvement, it did
not go the full way.

In these examples, I used a left-out alias definition as an example, just to
emphasise how bizarre this is. You get exactly the same result with the alias in
place, but for instance misspell OrderDate as OrderDte. That is, common sloppy
errors you want the computer to find for you go unnoticed.

SET STRICT_CHECKS ON

When SET STRICT_CHECKS ON is in effect, a statement must only refer to known
objects and to known columns in those objects. Plain and simple.

The reader may object that strict checks is no full guarantee that code will
not blow up at run-time because of missing tables, since a table could be dropped after the
procedure has been created. This is true, but the intention of strict checks is
not to make SQL Server fool-proof; it is to help the programmer to catch silly
errors early. In my experience it is not that common that tables
are accidently dropped. Silly typos are far more common. At least if I am the
one who does the typing.

What about dynamic SQL? Of course, if your stored procedure creates dynamic
SQL, strict checks are not going to help you to catch those errors before
run-time. But in an enterprise system, most of the code should be in stored
procedures with static SQL. At least in my not-so-humble opinion.

Note: a possible alternative here is the clause WITH SCHEMABINDING which exists already for functions and views. This clause in the procedure header would not only require that all objects exist, but it would also disallow them to be dropped. In my opinion, this is too rigid for what I'm aiming for here. Say that you have a table that is referred to by 200 stored procedures, and that you need to make a change to that table that requires dropping and recreating it. That would be very painful.

Temp Tables Defined in the Procedure

In the examples above, the deferred name resolution was due to the temp table
created within the procedure. Most certainly, the reason Microsoft added this
misfeature was due to temp tables. And it cannot be denied that there
is a certain logic here. CREATE TABLE is a run-time statement, so the table does
not exist when the procedure is created, and the smallest warning about it
would be irritating noise. But consider this procedure:
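A procedure on this pattern, that is (a sketch with invented names):

CREATE PROCEDURE temp_demo AS
   CREATE TABLE #temp (OrderID   int      NOT NULL,
                       OrderDate datetime NOT NULL)
   SELECT OrderID, OrderDate FROM #temp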

The basic idea is simple: when strict checks are in effect, SQL Server would
find all temp table definitions in the procedure, and use them when validating the stored procedure. (SQL Server compiles a procedure in two steps: in the first step, it parses the procedure for syntax and variable declarations, and in the second pass it performs all checks that require metadata, like checking for existing columns. SQL Server could extract the table definitions while parsing, and then use them in the second pass.)

Now, since CREATE TABLE is a run-time statement, this raises the question about
what should happen if the CREATE TABLE statement is within an IF statement or
you have something as stupid as:
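For instance (a sketch):

CREATE PROCEDURE never_creates AS
   IF 1 = 0
      CREATE TABLE #temp (a int NOT NULL)
   SELECT a FROM #temp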

While it's relatively simple to find this particular error, flow analysis gets hairy when you add control-of-flow statements into the mix. So I find it perfectly acceptable if SQL Server would let this pass without an error. In fact, this procedure is accepted by SQL Server 6.5.

You may ask: what if someone wants to have different definitions of his temp table, like this:
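That is, a temp table already exists when the procedure is created, but with a different definition from the one in the procedure (a sketch):

CREATE TABLE #temp (a int NOT NULL)
go
CREATE PROCEDURE redefine_temp AS
   CREATE TABLE #temp (b varchar(23) NOT NULL)
   SELECT b FROM #temp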

In SQL 6.5 you get an error when you try to create this procedure, but I
don't think this is the right behaviour. The CREATE TABLE command in the
procedure should take precedence. The temp table that exists now is not likely to exist at run-time.

I have here confined the discussion to temp tables, since this issue is about temp tables 99 % of the time. But if the procedure creates a static
table, it should be handled in the same way. That is, SQL Server should
extract the definition, and use the definition when checking the queries, with one
difference to temp tables: if the table already exists, this should be considered an
error.

I should add that a workaround to the problems with
deferred name resolution due to temp tables is to use table variables instead. These are declared
entities, and SQL Server will consider these when checking the queries.
Unfortunately, this is not always a feasible solution. Table variables do not
have statistics, which can lead to less optimal query plans, and this is why it is highly
desirable that queries with temp tables are fully checked at create time.

Temp Tables Defined outside the Procedure

That is, in a stored procedure you want to access a temp table created by the
caller. This could be because you pass output to the caller as in the example
above, but it could also be that the temp table holds input data to the procedure. Or it could be both;
the inner procedure could update some columns in the temp table.

This scenario certainly presents a greater challenge for strict checks, and I will discuss a
couple of alternatives.

Do Nothing Special

This is how SQL 6.5 handled the situation. The file that held the definition
for inner_sp simply had to read:
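That is, something on this pattern, where the file first creates a copy of the temp table, so that the CREATE PROCEDURE statement compiles cleanly (a sketch with invented columns):

CREATE TABLE #tmp (OrderID   int      NOT NULL,
                   OrderDate datetime NOT NULL)
go
CREATE PROCEDURE inner_sp AS
   INSERT #tmp (OrderID, OrderDate)
   SELECT OrderID, OrderDate
   FROM   Orders
go
DROP TABLE #tmp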

Obviously, this means you need to have the definition of #tmp in two places
which is not satisfactory, particularly if the temp table has many columns. If
you have access to a preprocessor, you can easily deal with the situation, but most
people use SQL Server without a preprocessor.

The alternative would be to use /* NOSTRICT */ on all lines where the temp table
appears:
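Along these lines, that is (a sketch; the column names are invented):

CREATE PROCEDURE inner_sp AS
   INSERT #tmp (OrderID, OrderDate)            /* NOSTRICT */
   SELECT OrderID, OrderDate
   FROM   Orders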

This could be deemed acceptable, since this type of procedure is not that common.
Then again, they are by no means rare, either.

If table-valued parameters are made read-write (which I argue
for in a separate article), the need for using
temp tables to pass data would be less. But since table parameters have no
statistics beyond the cardinality, there will still be a need for sharing temp
tables. So it would still be interesting to look for a better solution.

EXTERNAL TABLE #tmp

The idea is here that you somewhere state which temp tables you expect to
exist when your stored procedure is called. Strict checking would then not flag
the temp tables you have declared this way as missing. But the column definition
would not be known, so any misspellings of columns in the temp table would not
be reported until
run-time.

There are two places where this declaration could appear: 1) in the procedure
header. 2) a separate declaration in the body. That is, one of:
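That is, something like one of these (sketches of hypothetical syntax):

CREATE PROCEDURE inner_sp WITH EXTERNAL TABLE #tmp AS
   ...

CREATE PROCEDURE inner_sp AS
   EXTERNAL TABLE #tmp
   ...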

As for the syntax, WITH is the obvious choice for the procedure header, but it seems funny in
the body. So I looked at the list of reserved keywords in Books Online, and
picked the first match I found, EXTERNAL. Do not take that as a final proposal,
but a working name.

What would happen at run-time if inner_sp is invoked and no table by the name of #tmp
is to be found? The compilation would fail, and the procedure would not start
executing. (And since EXTERNAL TABLE would be a new feature, there is no reason
to let this depend on whether strict checks are in force or not.)

It would not be permitted to have a local CREATE TABLE for a temp table
declared as external.

A good question is: would EXTERNAL TABLE be restricted to temp tables only?
There can be some advantages to use it for "real" tables as well. Maybe the
outer procedure creates a staging table. If the table is in a different database that
may be missing, you get the error directly when the procedure is invoked. While
interesting, I don't see this as compelling enough. If it's easier to implement
EXTERNAL TABLE to be for temp tables only, go for that.

CREATE TABLE #tmp FROM tabletype

SQL 2008 added table types, and in SQL 2008 you can only use table types with
table variables and table parameters. If the capability was extended so that
you could use table types with CREATE TABLE, the problem with the multiple table
declarations could be reduced to:
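On this pattern, that is (a sketch of hypothetical syntax; neither CREATE TABLE FROM a table type nor EXTERNAL TABLE exists today):

CREATE TYPE some_table_type AS TABLE (OrderID   int      NOT NULL,
                                      OrderDate datetime NOT NULL)
go
CREATE PROCEDURE outer_sp AS
   CREATE TABLE #tmp FROM some_table_type
   EXEC inner_sp
go
CREATE PROCEDURE inner_sp WITH EXTERNAL TABLE #tmp FROM some_table_type AS
   SELECT OrderID, OrderDate FROM #tmp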

In this solution, you would get full checks so that misspellings of columns
in #tmp would be caught when you try to create the procedure. (And again, since
this syntax would be a new feature, this would happen even if strict checks are
off.)

There would be a run-time check when inner_sp is invoked whether #tmp exists,
and whether it conforms to some_table_type.

Obviously, this is the best solution, but also the one that requires most
development effort.

Index Hints

I will now leave the area about temp tables and cover some other situations where deferred name resolution raises its ugly head.

Say that a procedure has a query that includes an index hint for a non-existing
index:
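For instance (names invented):

CREATE PROCEDURE hint_demo AS
   SELECT OrderID, OrderDate
   FROM   Orders WITH (INDEX = no_such_index)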

Today, SQL Server creates the procedure without any mention of the missing index and then the query blows up at run-time. This behaviour is clearly not acceptable. This seems like an obvious case for strict checks: if an index hint refers to a non-existing index, this is a compile-time error.

However, in this particular case, there is an alternative. There is a
suggestion on Connect that there should not be any run-time error when the
index in a hint is absent, something I entirely agree with. There is some chance that the hint itself is obsolete, and the query executes just fine anyway, or at least slowly enough that the system can still progress. Whereas now you get a run-time error, which is more likely to cause an outage.

If the Connect item is honoured, I think that there should be no error at compile-time for a missing hint. That also makes sense from the perspective that since indexes are physical items, your development database may not have the same indexes as your production database. And your production database may have different indexes from the reporting database.

As far as I can tell, SQL Server will give you an error if you refer to a
non-existing type method, be that a CLR UDT column or an xml column. Just
keep on doing that. Strict checks or not.

Stored Procedures

Stored procedures are a special case, since you get a
warning if you call a non-existing stored procedure:

CREATE PROCEDURE bad_call AS
EXEC no_such_sp

The message for this condition has varied over the versions, but it has been
there since SQL Server 4.x at least. SQL 2012 says

The module 'bad_call' depends on the missing object 'no_such_sp'.
The module will still be created; however, it cannot run successfully
until the object exists.

Originally the purpose of the warning was to inform the user that SQL Server
was not able to add any rows to sysdepends, later sys.sql_dependencies. That
particular piece of information is not that interesting, but what is interesting
is of course that the procedure does not exist. SQL 2008 added a new structure
for dependencies where the dependencies are stored by name, so technically there
is no longer any reason for the message. And indeed in some CTP of SQL 2008, the
message was gone. As you may imagine, that made me very angry. Microsoft saw
reason and the message is still there.

With strict checks in force the warning should be promoted to an error
(because as I discussed above this makes it easier to find where this bad call is). Should the
warning still be there when strict checks are off? I suggest that it should, but
I promise not to make a fuss if Microsoft removes it.

Occasionally, you may have a cross-dependency: stored procedure A calls B,
and B in its turn includes a call to A. In this case, you should probably use
/* NOSTRICT */ to get around it. (And in this case, you probably don't
want the warning at all.) An alternative is to first create A as a dummy, and then create B and alter A to have its actual contents.

You can use variables with EXEC for indirection:

EXEC @procname

In this case, there should of course not be any message at compile-time.

Service Broker Objects

Service Broker objects are also subject to deferred name resolution. For
instance, this
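A batch on this pattern, that is (all the Service Broker names are invented, and none of the objects exist):

CREATE PROCEDURE broker_demo AS
   DECLARE @handle uniqueidentifier
   BEGIN DIALOG CONVERSATION @handle
      FROM SERVICE no_such_service
      TO SERVICE 'no_such_target_service'
      ON CONTRACT no_such_contract;
   SEND ON CONVERSATION @handle
      MESSAGE TYPE no_such_message_type ('<test/>');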

is accepted in SQL Server today. With strict checks, there would be errors
all over the place. The only thing to note is that the TO SERVICE clause should not be
checked. This is a string literal, and it can refer to a service in a remote
database on a remote server, so it is not possible to validate.

Bulk-Copy Files

Would you believe it, if you say

BULK INSERT tbl FROM 'C:\temp\myfile.bcp'

the file name is actually subject to deferred name resolution. Or else, how can
you explain this?
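To wit, this repro (a sketch; the table exists, the file does not):

CREATE PROCEDURE bulk_demo AS
BEGIN TRY
   BULK INSERT tbl FROM 'C:\temp\nosuchfile.bcp'
END TRY
BEGIN CATCH
   PRINT 'This line is never reached'
END CATCH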

That is, if the file does not exist, the CATCH handler is not invoked,
because the procedure is terminated on the spot (an outer CATCH handler can
catch the error). Normally, this happens only with compilation errors, so I will
have to assume that when SQL Server originally creates the procedure, it checks
to see whether the data file is there, and if it's not it defers compilation
until later, and if the file is still missing at run-time, this is the same as a
missing table.

Now, why it would look at the data file at all when creating the procedure is beyond me. In any case, I
don't think it would be a good idea if you would get an error message for a
missing file even when strict checks are in force. A missing file should be
handled as a pure run-time error, be that a data file or a format file, so that
a local CATCH handler can work. That is what most programmers would expect anyway. (If Microsoft makes this a pure run-time check, there is also no reason that the file
could not be specified through a variable, but that's another story.)

There is also OPENROWSET(BULK...), which permits you
to select columns and filter rows from the data file. The column names must come from somewhere, and they can be specified in one of these two ways:

FROM OPENROWSET(BULK...) AS table_alias(column_alias,...n)
FROM OPENROWSET(BULK...) AS table_alias

The first one leaves no mystery, as the column names are in the query. But in the second case the column names are taken from the format file, which is mandatory with OPENROWSET(BULK...) (unless you specify one of the SINGLE_LOB options). What would happen here when strict checks are in force? Should you get an error if you use the second form and the format file is not there? That appears impractical. It does not seem a good
practice to me to rely on a source outside the database to provide column names anyway, so I suggest that the second form would be illegal when strict checks are in force.

Synonyms

What about:

CREATE SYNONYM mybadsyno FOR nosuchobject

should this be permitted when strict checks are enabled? I see no harm if so.
As long as the synonym is not used, there is no problem. Of course, if you have

CREATE PROCEDURE bad_synonym AS
SELECT col FROM mybadsyno

attempting to create that stored procedure when strict checks are in force would yield an error message.

Linked Server Tables
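
Consider a procedure that reads from a table on a linked server (the names here are taken from the error messages quoted below):

CREATE PROCEDURE linkaccess AS
   SELECT OrderID, OrderDate
   FROM   SERVER1.Northwind.dbo.Orderss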

What do you think will happen if you try to create this procedure in SQL Server today? Depending on
whether you have a linked server SERVER1 set up or not, you get one of these
messages:

Msg 7202, Level 11, State 2, Procedure linkaccess, Line 2
Could not find server 'SERVER1' in sys.servers. Verify that the correct
server name was specified. If necessary, execute the stored procedure
sp_addlinkedserver to add the server to sys.servers.

Msg 7314, Level 16, State 1, Procedure linkaccess, Line 2
The OLE DB provider "SQLNCLI10" for linked server "SERVER1" does not
contain the table ""Northwind"."dbo"."Orderss"". The table either does
not exist or the current user does not have permissions on that table.

So when a stored procedure accesses a remote object, there is suddenly no
longer any deferred name resolution! I find this ironic. Just like bulk-copy
objects, this is a situation where I may prefer not to be alarmed about something missing, or at least not about missing servers. For instance, assume that as a DBA you have to apply
a change script with a couple of stored procedures to your production database
during a maintenance window. But you find that you cannot do that, because the stored
procedures refer to a linked server which is also down for maintenance.

Thus, an idea here would be to keep the current behaviour when strict checks are
on, but to change the behaviour when strict checks are off, so that a missing object
on the linked server is not reported until the procedure is actually executed, with messages like these:

OLE DB provider "SQLNCLI10" for linked server "SERVER1" returned message
"Deferred prepare could not be completed.".
Msg 8180, Level 16, State 1, Line 1
Statement(s) could not be prepared.
Msg 208, Level 16, State 1, Line 1
Invalid object name 'Northwind.dbo.Orderss'.

Other Objects

There may be other types of objects or language constructs that I have
overlooked and for which deferred name resolution could be an issue in one
direction or another. I trust Microsoft to have good judgement to sort this out.
Wait, what did I say? After the exposé above, how could I trust them with
anything in this regard? :-) Anyway, if Microsoft wants my opinion for some
feature I've overlooked, they are welcome to contact me.

A Note on SSDT

Before I move on, I would like to give credit to SQL Server Data Tools, SSDT. SSDT will alert you of many of the problems I have discussed in this section. SSDT is able to extract temp-table definitions and will give you a warning if you refer to a temp table that it does not know about, or if you misspell a column in a temp table. Unfortunately, though, it has no capability to deal with the situation where you create a temp table in one procedure and use it in another.

Implicit Conversions

If this looks academic to you, I can tell you that this is from a real-world case where a colleague for some reason had
declared a parameter as varchar(5) when it should have been int. It worked fine,
until one day when the procedure was called with a six-digit number. That did
not fit into varchar(5), and it went downhill from there.

Time for some nostalgia. Here is what SQL Server 6.5 has to say about the
procedure above:

And again, SQL 6.5 has it right. It's not that SQL 6.5 does not permit
implicit conversions at all, but it does not have implicit conversion between
numbers and strings. And I will have to confess that I just don't see the point
with that sort of implicit conversion. Sure, it permits me to say PRINT 1
rather than PRINT '1'. But it also opens the door for unpleasant surprises.
Assume this table:
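
The original definition is not reproduced here, but judging from the discussion that follows, it would have been a table with an indexed varchar key queried with a numeric value, along these lines (the names are assumptions):

CREATE TABLE tbl (datakey varchar(10) NOT NULL PRIMARY KEY,
                  value   int         NOT NULL)

SELECT value FROM tbl WHERE datakey = 123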

The type conversion rules established in SQL 2000 say that when two types
meet, the type with lowest precedence is converted to the other, if an
implicit conversion exists. This has two nasty consequences in this
case. As we have seen, as soon as there is a non-numeric value in datakey, the query fails. But even
if all data is numeric, all is not good. Since the varchar value is
converted to int, the index on datakey is useless, and SQL Server must scan the
table or the index. Try the query and look at the query plan.

By now, the reader understands what all this leads to: with strict checking
on, there will be less implicit conversion permitted. It would be
impractical to outlaw everything, since not all implicit conversions are
harmful. For instance, converting from smallint to int is
unproblematic in all contexts. I will have to admit that I have
not considered each and every case; rather, I will discuss four generic cases below.

Before I go on, I like to make a distinction between the two situations where implicit conversions can occur: assignment and expressions. Assignment covers all cases where data is written to something: a column, a variable or a procedure parameter. If the source is of a different data type than the target, the source is converted to the type of the target, if an implicit conversion is available. An expression is about anything else, for instance col1 >= col2 or col2 + 91. As I mentioned above, SQL Server will in this situation convert the type with lower precedence to the other, again if an implicit conversion is available.

Types of Different Classes Meet

SQL Server 2012 has 33 basic types listed in sys.types. These can be divided
into eight classes:

Numeric.

String.

Binary.

Date/time.

Uniqueidentifier.

XML.

sql_variant.

CLR types.

The basic idea is that when strict checking is in force, implicit conversion
is not permitted from one class to another. However, the following implicit
conversions would still be permitted:

String → Date/time.

String → Uniqueidentifier.

String → XML.

String → CLR types.

String → sql_variant.

Numeric → sql_variant.

Binary → sql_variant.

Uniqueidentifier → sql_variant.

(Date/time → String.)

(Uniqueidentifier → String.)

Binary → CLR types.

That is, it is permitted to implicitly convert from String to several other data
types. The rationale here is that you need to be able to work with literals, and
there is for instance no date literal in T-SQL. Well, SQL 2012 added datefromparts(), datetimefromparts() etc, so you could argue that there is no longer any need for implicit conversion from String to Date/time. But these functions are a little verbose. And, maybe more importantly, there is probably a lot of code out there that relies on this implicit conversion. (Yes, there is also a lot of code that relies on implicit conversion from Strings to Numeric. I see that a lot on the Transact-SQL forums. People mix data types and then they get problems at run-time they don't understand, because SQL Server did not stop them earlier.)

It is not equally compelling to have implicit conversion from Date/time to String or
from Uniqueidentifier to String, but neither is there any major harm, which is why
I have put these in parentheses. (XML, sql_variant and CLR types to String are not
listed, since there are no such implicit conversions today.)

It is also permitted to implicitly convert to sql_variant; that's kind
of the essence of that data type. (XML and CLR types are not included, since
they cannot be stored in sql_variant.)

This means that quite a few implicit conversions permitted today would be disallowed
with strict checks on; even some that were permitted in SQL 6.5, for instance binary
to numeric. The most controversial ban may be integer to date/time (this was not
permitted in 6.5 either). And, in fact, Microsoft took that step in SQL 2008 with the
new date/time data types: you can say SELECT @date + 1 if @date is datetime or
smalldatetime, but not if it is datetime2 or date. I know some people think it's
useful, but I only find it corny. And what do you think about SELECT @date = 2008-05-30? That is
not an uncommon mistake. (The right-hand side is integer subtraction that evaluates
to 1973, which is then implicitly converted to a date in 1905.)

When there is a mix of types from different classes, there is no difference between assignment and expressions. The same rules for which implicit conversions that are permitted apply for both.

Types Within the Same Class Meet

When I originally wrote this article, I said:

This is where it gets more difficult, and I will be very loose here. One
possibility would be that any conversion that could incur loss of information
would require explicit conversion with strict checks: from nvarchar to
varchar, from float to int, from varchar(23)
to varchar(8). But that could be a bit too strict even for
me in some situations. So I pass for now.

However, some time back I was approached by Nicholas Chammas, and he convinced me that there is no reason why this should not be covered by strict checks. That is, if an implicit conversion could lead to loss of information, this should yield an error when strict checks are in effect. In practice, this only concerns assignment, since in an expression the shorter type is always converted to the longer type.
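
The example that the next paragraph refers to is missing above; it must have been along these lines:

DECLARE @a varchar(5)
SELECT @a = 'Too long'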

There is no error, but @a will be assigned the value 'Too l'. Under strict checks, this implicit conversion would not be permitted. If you wanted to accept the data loss, you would need to use cast, convert or substring to state your intention explicitly. (If you wonder why SQL Server is not consistent here, the background is that up to SQL Server 6.0, truncation was always silent. To be ANSI-compliant, Microsoft added an option for raising an error when storing values in a column, and this option is today the norm. I guess they did not make the same change for variables for compatibility reasons.)
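
The next example deals with character substitution rather than truncation; the original snippet is not reproduced here, but it must have looked something like this:

DECLARE @v varchar(20)
SELECT @v = N'Lech Wałęsa'
PRINT @v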

Unless you have a collation based on a code page that supports Polish, the output is

Lech Walesa

Note that the two middle characters in the last name have been altered. Since this is loss of information, this should yield an error when strict checks are on. You would need to explicitly use cast or convert.

What about numbers? The same principles should apply, although there is some fine print to sort out. Here are some clear-cut cases, which all should yield an error with strict checks on:
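
The original examples are not included above; clear-cut cases of this kind would be (the variable names are assumptions):

DECLARE @i int, @s smallint, @f float = 1.5, @d decimal(12,4) = 12.3456
SELECT @i = @f      -- float to int: the fraction is lost
SELECT @s = @i      -- int to smallint: possible overflow
SELECT @i = @d      -- decimal to int: the fraction is lost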

I leave it to Microsoft to make the ultimate decision here, as I do with the various date/time data types, which invite similar considerations.

Special Considerations for Comparison Operators in Queries

There is one situation where the conversion due to data-type precedence can cause performance problems: if the expression is a comparison operator that appears in a WHERE or ON clause, and a column is implicitly converted in a way that precludes efficient use of any index on the column. With strict checks in effect, such implicit conversions would not be permitted.
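
The queries under discussion are not reproduced here; they would be of this kind, where varcharcol is an indexed varchar column (names assumed):

DECLARE @n nvarchar(20) = N'ABC'
SELECT ... FROM tbl WHERE varcharcol = @n
SELECT ... FROM tbl WHERE varcharcol = N'ABC'
SELECT ... FROM tbl t JOIN othertbl o ON t.varcharcol = o.nvarcharcol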

In all these queries, the varchar column gets converted to nvarchar.
If the column has a Windows collation, the index can still be used for a seek,
although in a less efficient way. With an SQL collation, the index is
useless. The rule should not depend on the collation; rather, the worst case
should be assumed. Thus, all these queries would yield an error message when strict checks are in effect. Note that there should be an error regardless of whether the column is indexed or not, since an index may be added in the future.

According to the conversion rules, tinyint should be converted to
int, but this is not how it looks in the query plan. The same is true for
the second query. (From SQL 2005 and on. In SQL 2000, the second query causes an
implicit conversion of intcol, and any index is dead. But that's history.)

This query is less clear-cut:

SELECT a.col FROM tbl1 a, tbl2 b WHERE a.tinyintcol = b.floatcol

When I fooled around with a query like this, I got an implicit conversion on
tinyintcol if tbl1 was the table that was scanned, in which case the implicit
conversion was harmless. But the rules should of course be independent of the
actual query plan. This case needs further investigation.

It may be worth pointing out that the error message in this case should not
say Implicit conversion ... is not allowed. Use the CONVERT function to run
this query, but rather encourage the programmer to avoid the type clash
altogether.
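
The next topic is comparisons that mix user-defined types. The example behind the SQL 6.5 message below is not preserved above; a reconstruction, with the type and column names taken from the message:

EXEC sp_addtype thistype, 'int'
EXEC sp_addtype thattype, 'int'
CREATE TABLE t1 (a thistype)
CREATE TABLE t2 (b thattype)
SELECT * FROM t1, t2 WHERE t1.a = t2.b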

In all versions from SQL 7 and on, this passes. But SQL 6.5 objects:

Server: Msg 305, Level 16, State 1, Line 1
The column 'a' (user type:thistype) is joined with 'b' (user type:thattype).
The user types are not compatible: user types must be identical in order to join.

If you have used a very strongly typed language like Ada, this is perfectly
logical. Now, SQL 6.5 was quite inconsistent. This restriction applied to
joins only. It was OK to mix user-defined types in assignments, when comparing
variables or even in foreign-key constraints. So I can understand why Microsoft
dropped this rule in SQL 7.

I am not going to call for a change with regards to user-defined types. This
feature is probably not used widely enough to warrant that. But hopefully
one day (already in the next version of SQL Server???), Microsoft will add real
domains from ANSI SQL. I don't know what conversion rules ANSI mandates,
but my suggestion is that with strict checks on, there would be no implicit
conversion whatsoever between values from different domains. A customer id and an
order id may both be integer, but if you are joining them you are doing something wrong.
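
The next section concerns declarations that leave out length, precision and scale. The example referred to below is not reproduced above, but it must have been of this form:

DECLARE @str varchar,
        @dec decimal
SELECT @str = 'My string', @dec = 12.34
SELECT @str, @dec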

What values do @str and @dec have now? That's right, M and 12. But it is not
uncommon to see questions on the SQL Server forums from people who had
different expectations. And one can hardly blame them. A default of 1 for a
variable-length string is just plain silly. And while maybe not silly, the
defaults of 18 and 0 for decimal are not obvious. More importantly, these people may
be accustomed to other environments, such as .Net, where you don't specify
precision and scale for decimal at all.

So with strict checks in force, there would be no default length for char, nchar, varchar, nvarchar, binary and varbinary, but you must always specify it explicitly. Nor would there be any default precision or scale for decimal and numeric. And for consistency, scale should also be mandatory for time, datetime2 and datetimeoffset.

However we need one exception. Consider:

SELECT cast(intcol AS varchar)

In this marvellously consistent language known as Transact-SQL, the default
here is varchar(30), not varchar(1). So in this specific example, there is no risk for truncation, and it is likely that there is a lot of code out there which casts numbers or datetime values to string in this way. There is no reason to raise an unnecessary hurdle for the adoption of strict checks, so it should be alright to leave out the length on cast and convert – as long as there is no risk for truncation. Here are three examples that all would yield an error:
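
The three original examples are lost here; casts where the result may not fit in the default length of 30 would be of this kind (assumed reconstructions):

DECLARE @str nvarchar(100), @v varchar(64), @dec decimal(38,0)
SELECT cast(@str AS varchar)     -- the source can be longer than 30 characters
SELECT convert(nvarchar, @v)     -- same problem in the other direction
SELECT cast(@dec AS varchar)     -- up to 38 digits do not fit in varchar(30)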

If your stored procedure calls other procedures, the sole
check at compile time is that the procedures exist, and as discussed above, you
only get a warning if a procedure is missing, not an error. If you have left out any mandatory parameters, or
specified a non-existing parameter, you will not be told until run-time. The
same goes if you specify OUTPUT for a parameter that is not an output parameter.
And if you leave out OUTPUT in the EXEC command for an output parameter, you
don't even get an error at run-time!

In contrast, if your stored procedure calls a user-defined function, you get
errors for missing or superfluous parameters already at compile-time. With one
exception: if you call a scalar UDF through EXEC, the behaviour is the same as
when you call a stored procedure.

Obviously there is room for improvements. With strict checks in force the
following would apply to the EXEC command when you submit an SQL batch:

Specifying a non-existing procedure is an error (which I already covered
in the section on deferred name resolution).

Leaving out a mandatory parameter is an error.

Specifying a superfluous parameter is an error.

Specifying a formal parameter that does not exist when you use named
parameters is an error.

Specifying OUTPUT for a parameter which is not an output parameter is an
error.

Leaving out OUTPUT for a parameter declared as OUTPUT is an error, if
the actual parameter is a variable. If the actual is a constant, it is not an error.
(Since you may not care for that output value and pass NULL.)

Specifying DEFAULT for a parameter that does not have a default value is
an error.

If there is no implicit conversion from the type of the actual
parameter to the type of the formal, this is an error.
If the stricter rules for implicit conversion that I discussed above are
implemented, they should of course apply when checking parameters as well.

If the parameter is an output parameter, it is an error if there is no
implicit conversion from the type of formal parameter to type of the actual. (This check is
not to be performed if the actual parameter is a constant.) In practice this means that for output parameters, the type of the actual must match the type of the formal exactly.

If there is no implicit conversion from the type of the return value of
a stored procedure or a UDF to the type of the variable that receives the return value,
this is an error. (SQL Server checks this particular item already today.)

In all these cases, when I say "error", I do of course mean "compile-time
error".

These checks can obviously not be performed when the procedure name is
specified through a variable, for instance EXEC @spname. Special
procedures like sp_executesql will require some extra consideration.

Should these checks be performed for calls to procedures in linked servers? I
think so. At least, it should be consistent with how references to tables in linked
servers are handled. And, as we have seen, they are checked even today.

I should add that SSDT performs some of these checks, for instance it warns you for superfluous parameters.
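
The example that the next paragraph refers to is not reproduced above; it would have been a call where the programmer dropped an @, something like this (the procedure name is an assumption):

DECLARE @this int, @that int
EXEC some_sp @this, that     -- "that" is accepted as the literal 'that'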

The reason for this is that in EXEC statements there is no requirement to put a string literal in quotes, as long as
the literal conforms to the rules for regular identifiers. This is the rule that permits us to write:

sp_helpdb somedb

without putting the database name in quotes. And while this may be handy in an ad-hoc session, it is only a source
of error in a programming environment. In the example above, the intention was presumably to pass the variable @that
to the stored procedure.

Thus, with strict checks in force, it would be an error to have a string literal without quotes in an EXEC statement inside a stored procedure. (It could still be permitted in the top-level scope, to facilitate the use of system procedures.)

Require ORDER BY with TOP

If you say

SELECT TOP 20 col1, col2
FROM tbl

SQL Server is free to return any row in the table. Sometimes this is what you want – you only want 20 rows and you don't care which rows. But it could also be that you inadvertently left out the ORDER BY clause. Or you are in the false belief that it is not needed but you will always get the 20 "first" rows, whatever your idea of "first" may be.

Therefore it seems that it would be a good idea to make ORDER BY compulsory with TOP in strict-checks mode. If you really don't care about the order, you need to specify this explicitly:

SELECT TOP 20 col1, col2
FROM tbl
ORDER BY (SELECT NULL)

The same rule applies already today to the row_number() function: you must have an ORDER BY, but you can use a constant subquery if you don't care about the order.
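
The INSERT statement discussed next is not included above; it would have followed this pattern (the table and column names are assumptions):

INSERT orders (OrderID, CustomerID, OrderDate, ShipDate)
   SELECT OrderID, CustomerID, ShipDate, OrderDate
   FROM   orders_import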

As you see, most likely there is an error in this INSERT statement: the two date columns have been swapped. This may seem like an error that is simple to catch with a visual inspection. However, imagine that the INSERT statement involves 50 columns and the swapped columns are in the middle.
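
The second example, discussed next, is also missing above; a sketch with assumed names:

INSERT persons (first_name, last_name)
   SELECT David Eric, Anna     -- a comma is missing after David, and Anna is extraneous
   FROM   newcomers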

At first glance, you may think that this will not compile, but rather fail due to a mismatch in the number of columns. However, there are two errors in the SELECT statement. Not only is there an extraneous column at the end, but there is also a comma missing after David. You may think that David Eric, i.e. two subsequent identifiers, is a syntax error, but it isn't: it is equivalent to David AS Eric. Thus, there is some chance that the INSERT statement will run successfully, and cause incorrect data to be inserted. Again, imagine an INSERT statement with 50 columns, where you may not notice the extraneous column, let alone the missing comma.

It seems to me that the INSERT statement was not really designed for industrial-strength programming, but it is the way it is and it will not change. But maybe we could find something within the realm of strict checks to increase our confidence in our long INSERT statements?

A very strict, but also largely unrealistic, rule would be that the column names in the INSERT and SELECT lists must match exactly. That is, the following would be illegal:
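
An assumed illustration of a statement that would fall foul of this strict rule:

INSERT tbl (a, b)
   SELECT x, y     -- x and y do not match a and b
   FROM   src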

I would suspect that only a minority of the INSERT-SELECT statements out there comply with this rule. A more realistic rule is: if any source column has an alias, all names in the SELECT list must match the INSERT list. If a column is aliased, it is the alias that must match. That is, the two INSERT statements above would both be legal, but this would be illegal:
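
An assumed illustration of a statement that is illegal under this more realistic rule:

INSERT tbl (a, b, c)
   SELECT a, b1 AS b, x     -- one column is aliased, but x does not match c
   FROM   src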

One could argue that only explicitly aliased columns should be checked. But once this rule goes into effect and gains popularity, it would surely be considered best practice to alias all columns that have a different name from the target column, so inadvertently leaving out an alias is something you want to be alerted of. Moreover, once you see that one column has an alias, you can inspect the SELECT query alone, and do not really have to go back to the INSERT list to look for matching errors.

What if a column in the SELECT list is a constant or an expression? Does it need to have an alias? That is, is this legal or not:

INSERT tbl (a, b, c, d)
SELECT a, x AS b, 1, coalesce(d, 0)
FROM src

I have not acquired a firm opinion, but I lean towards that it should be required for expressions but maybe not for constants (since NULL AS col could be perceived as bulky). Thus, the above would be illegal, but the below would comply:
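
A compliant counterpart, under that reading of the rule, might read:

INSERT tbl (a, b, c, d)
   SELECT a, x AS b, 1, coalesce(d, 0) AS d
   FROM   src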

Here I have only covered the INSERT statement, but these pairs of lists that you need to keep aligned appear in several places in SQL:

Definitions of CTEs and views, e.g. CREATE VIEW v (a, b, c) AS ... You could argue that in this case the column list is optional, so if the programmer leaves it out, there is no risk for error. Then again, the programmer may like to add it for clarity. Better then to check it, so that the list does not introduce an error.

Row constructors. This is a feature in ANSI SQL that T‑SQL does not support, but which is high on many wish-lists. With row constructors you would be able to write:
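
The original snippet is not included here; a commonly wished-for form of this syntax would be an UPDATE like this one (an assumed sketch):

UPDATE tbl
SET    (col1, col2, col3) = (SELECT a, b, c FROM src WHERE src.keycol = tbl.keycol)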

If row constructors are added to SQL Server, the same checks should apply as to the INSERT statement, including variable assignment:

SET (@a, @b, @c) = (SELECT alfa, beta, cesar AS c FROM tbl)

Since one column is aliased, all columns must match the variable names, save the leading @.

The set operators UNION, INTERSECT and EXCEPT. With these, the first SELECT determines the column names of the statement, and any aliases in the subsequent SELECTs are ignored. The SELECT lists for such queries certainly have potential for alignment errors, not least if the queries have complex FROM and WHERE clauses, so that the SELECT lists are far apart. Thus, this would be legal with strict checks:
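
An assumed example of such a legal query, where every alias in the second SELECT matches the names established by the first:

SELECT a, b, c FROM tbl1
UNION ALL
SELECT a2 AS a, b2 AS b, c2 AS c FROM tbl2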

Which is perfectly legal, but of course wasn't what he intended. The @ was a slip on his part, and something that SQL Server could have alerted him about. Generally, while UPDATE permits you to assign variables, it only makes sense
if you assign an expression that includes at least one column in the target table of the UPDATE. (Else you could
just as well use a SELECT.) And even then it's a bit fishy; there are a few cases when you roll your own id columns where
it comes in handy, but they are not that common. So I could even go as far as arguing that variable assignment in UPDATE should not be permitted at all in strict mode. The few cases where it's useful have to be weighed against the many more
cases where it's a programming error.

Mandatory Column Prefixes

There are people who argue that all column references in a query should be prefixed, period. I can sympathise with the idea, but I will have to admit that I much prefer the version to the left in the queries below:
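
Whatever style one prefers, the next mishap shows why prefixes in subqueries matter; a sketch with assumed names:

SELECT O.OrderID
FROM   Orders O
WHERE  O.CustomerID IN (SELECT O.CustomerID FROM Customers C)
-- the intention was C.CustomerID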

The error here is that the subquery has a column from the outer table in the
SELECT list. Perfectly legal, but not very meaningful. More than one programmer has been bitten by this error and posted to the SQL forums, thinking they had hit a bug in SQL Server. They haven't, but
with strict checks we could help them to detect their mistake earlier. I have two
suggestions:

Columns from outer tables must always be prefixed, by alias or table
name.

A column in the SELECT list of a correlated subquery must be an
expression that includes at least one column from one of the tables in the
subquery.

Both protect against the mishap above on their own, but you could be
coding half asleep and use the wrong alias, in which case the second rule saves
you. The first rule, on the other hand, picks up possibly unintended use of columns
from the outer tables elsewhere in the subquery.
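
The query under discussion is not reproduced above; following the description below, it would have been something in this style (the names are partly assumptions):

SELECT Turnover, ProductName, Whizbang, CategoryName
FROM   #turnover t
JOIN   Products p ON t.ProductID = p.ProductID
JOIN   Categories c ON p.CategoryID = c.CategoryID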

Here the programmer has computed the turnover for a product in a temp table, and returns the data together with some data about the product and the product category in a query. Beside the two Id columns, all column names are unique, and the programmer, being of a lazy nature, thinks that he does not need to use more column prefixes than absolutely required.

We can start with the observation that queries like this one are difficult to read for an outsider who doesn't know the tables. Since the temp table is declared in the same procedure, we can tell where Turnover comes from. But Whizbang? Is that in Products or Categories? While irritating, this alone is not sufficient to warrant any compile-time checks in my opinion. There is however a second problem: this code could fail at some point in the future. One day the DBA decides to add a Turnover column to the Products table. The code above will now fail to compile with

Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'turnover'.

The DBA might be prepared for code to break if he drops a column, but if he adds a column he is likely to have his guard down. The above example is apparently from an outright sloppy and indifferent programmer, but even a good programmer who knows to prefix his columns may forget it from time to time.

Therefore it would be a good idea if strict checks would trap column references that could become ambiguous in the future. My suggestion for a firm rule is this one: if more than one table source is visible in a certain place in a query, all columns must be prefixed with an alias or the table name. This rule also covers the situation in the previous section, where there is no risk for ambiguity, but certainly for confusion. To illustrate the rule, here is an example query where the column references that would be errors under strict checks are marked with comments:

; WITH CTE AS (
   SELECT a, b, c, rowno = row_number() OVER(ORDER BY a)
   FROM tbl
)
SELECT e, f, a                      -- error: e, f and a must be prefixed
FROM CTE
JOIN (SELECT d, COUNT(*) AS cnt
      FROM someothertbl
      GROUP BY d) AS aggr ON aggr.d = CTE.c
WHERE rowno = 1                     -- error: rowno must be prefixed
  AND NOT EXISTS (SELECT *
                  FROM tbl3
                  WHERE g <> b)     -- error: g and b must be prefixed

The CTE and the derived table are OK, because there is only one table source visible in each of them. (Inside a derived table, the tables in the outer query are not visible.)

We now enter the area of somewhat more speculative suggestions, starting with
cursors and compile-time checks for something that is by definition dynamic.
To wit, despite the fact that the statement reads DECLARE CURSOR, it is an executable
statement, and as a consequence there is no compile-time check
whatsoever of cursors. If you misspell the cursor name in the OPEN, FETCH, CLOSE
or DEALLOCATE statements, it will not bite you until run-time. The same is true
if your FETCH statement does not match the cursor declaration.

And of course, in the general case, such checks are impossible. It's
perfectly legal to declare a cursor in one scope and access it from another. But
how common is that? In my experience, a cursor is almost always created and used
in one single stored procedure. The most common exception is probably when you set up the cursor
in dynamic SQL, because your conditions for the cursor are dynamic. Thus, it
seems to be a good trade-off to add checks for cursors in strict mode, and pay
the price of having to add /* NOSTRICT */ in the odd case.

With strict checks in force, SQL Server would extract cursor declarations and
compare these with other cursor statements:

If there are two or more DECLARE CURSOR for the same cursor name, the
SELECT statements must return the same number of columns and the data types
of the columns must match exactly.

The cursor name that appears in the OPEN, FETCH, CLOSE and DEALLOCATE
statements must refer to a cursor declared in the same procedure.

In the FETCH INTO statement, the variable list must have the same number
of columns as the SELECT
list of the cursor declaration, and there must be an implicit
conversion from the type of the column in the SELECT list to the type of the
variable. Of course, if the stricter checks for data-type conversions
suggested above are implemented, they would apply here as well.

One step further is to apply the same name-alignment rules for FETCH that I suggested for INSERT in the previous section.

If you use cursor variables, you get the second rule for free, but not the others. The remaining checks could be implemented for cursor variables as well. However, since cursor variables may be passed as parameters to stored procedures, an idea is to leave them unchecked, as a loophole for when you want to create a cursor in dynamic SQL.

One more small thing with cursors, although unrelated to the above: it could
be worth considering whether it should be permitted to leave the cursor type
unspecified in strict mode. That is, in strict mode, you must specify the cursor as
INSENSITIVE, STATIC, KEYSET, FAST_FORWARD or DYNAMIC. The default cursor type
today is apparently DYNAMIC, which is a very poor choice. Most of the time,
people probably think in the mind-set of a static cursor. For the same reason, LOCAL would be required. (And GLOBAL an error?)

That is, you could accidentally have an extra table in your FROM clause that
formed a Cartesian product with the rest of the query. With the newer syntax
with JOIN and ON, the risk for this debacle is removed, since you have to
explicitly specify CROSS JOIN if you want it. Unfortunately, you can still make
this mistake:

SELECT a.col1, b.col2
FROM a
JOIN b ON a.keycol = a.keycol

I don't know about you, but I do it often enough to want SQL Server to tell
me about it.

The basic idea is that in strict mode, the ON clause must refer to the table
source that the ON clause is attached to, and at least one of the preceding
table sources. But we need to consider two complications: nested joins and
multi-column joins before we can make a firm rule out of this.

Nested Joins

It's legal to nest joins. Most often this is done with outer joins. Here
is an example:
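
A sketch of such a nested join (the original example, which showed several variations of the outermost ON clause, is not reproduced here):

SELECT a.col1, b.col2, c.col3
FROM a
LEFT JOIN (b JOIN c ON b.col = c.col) ON a.col = b.col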

The first two both make perfect sense, and should not result in any error.
The third, on the other hand, looks spooky: somehow the table a is left
free-floating. The fourth is a breach of the basic idea, as only one table
is included in the condition. The fifth is an error in SQL Server already today: in a
nested JOIN condition, table sources outside the nested join are not visible,
similar to a derived table.

Thus, for the innermost join in a set of nested joins we don't need any
modification of the basic rule. But for the joins further out, "the table source
the ON clause is attached to" refers to the nested join, that is b JOIN c in the example above.

Multi-column Conditions

The next problem to consider is multi-column conditions. This is likely to be
a goof:
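
A reconstruction of the sort of goof meant here, in the style of the surrounding examples (names assumed):

SELECT a.col1, b.col2
FROM a
JOIN b ON a.keycol1 = b.keycol1
      AND a.keycol2 = a.keycol2    -- goof: b.keycol2 was intended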

It passes the basic rule, so we need to extend it a bit. ON clauses typically
consist of a number of AND factors. So could the rule be extended to cover all AND factors? Not so fast! This is perfectly OK:

SELECT a.col1, b.col2
FROM a
LEFT JOIN b ON a.keycol = b.keycol
AND b.startdate = @startdate

So the rule needs to be modified to: each AND factor must include a column
from the table source the ON clause is attached to, and either a preceding table
source or a variable/constant. And at least one AND factor must refer to a preceding
table source.

This still renders some legit queries as incorrect in strict mode, for
instance:

SELECT a.col1, b.col2
FROM a
LEFT JOIN b ON a.keycol = b.keycol
AND b.startdate = b.enddate

I have never seen this, but who knows, maybe someone would like to do:

SELECT a.col1, b.col2
FROM a
LEFT JOIN b ON a.keycol = b.keycol
AND @include_b = 1

I contemplated these two cases for a while, and considered variations of the
rule to avoid flagging them as errors in strict mode, but I arrived at the
conclusion that these sorts of conditions are rare enough. If the programmer
wants to write one of them, he needs to add /* NOSTRICT */ to suppress the
error message. I think this is OK, as long as the checks save the programmer
from silly goofs more often than they get in his way.

So far all conditions have been plain = and I have only talked about AND.
Other operators are not really any different. This condition is alright with
strict checks:
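
The original example is not reproduced here; something like this sketch, with an inequality as the only comparison, illustrates the point:

SELECT a.col1, b.col2
FROM   a
LEFT   JOIN b ON a.col1 > b.col1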

What this means can be disputed, but it fulfils the rule we've set up here,
and I see no reason to change it. The same applies to more complex conditions
that include CASE expressions. As for subqueries, if we have:

SELECT a.col1, b.col2
FROM a
JOIN b ON b.col = (SELECT ...)

The subquery must refer to a column from a to be accepted in strict
mode.

The reader may think that OR terms should be handled similarly to AND factors, but OR is so much less common that I don't think it's worth making
the rule more complex by adding provisions for OR. Rather, OR is best considered to be an operator just like =. So this will pass, even though it's likely
to be a goof:
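
A sketch with assumed column names, where the second OR term compares a column with itself:

SELECT a.col1, b.col2
FROM   a
JOIN   b ON a.col1 = b.col1 OR a.col2 = a.col2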

There are columns from both tables in the single AND factor, so this passes.

WHERE clauses

I don't suggest any particular checks for WHERE clauses. Well, maybe really
obvious things like WHERE a.col = a.col should raise an error, but
as soon as you involve expressions, it gets more difficult to tell the goofs
from the real intentions. For instance, this makes perfect sense with a
case-insensitive collation:

UPDATE tbl
SET col = upper(col)
WHERE col != upper(col)

Unintentional Cartesian products
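
As an example of what I mean, here is a sketch with assumed tables a and b, where b never appears in the WHERE clause, so the query produces a Cartesian product of the two tables:

SELECT a.col1, b.col2
FROM   a, b
WHERE  a.col1 = 12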

Of course, it would not be a bad idea to flag this as an error. More precisely: if a table appears in the FROM clause after a comma, but never appears in the WHERE clause, that is an error. But I never use this join style anymore, so for me this check is not that compelling. :-)

Cardinality Errors

Here is one more suggestion, of a more speculative nature, and one that requires further investigation. But I feel there is great potential here for improving the quality of SQL Server code world-wide, by alerting programmers to errors that are due to sloppiness or lack of understanding of key features.

The cardinality errors I have in mind are contexts when at most one row should be returned, but where there is no compile-time guarantee that this is the case. In this section, most of the examples are based on these tables:
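
Since the table definitions and the example statements are not reproduced here, this is a sketch of what I have in mind; all names and sample values are assumed, and the numbers in the comments are the markers referred to below:

CREATE TABLE header (id int NOT NULL PRIMARY KEY,
                     a  int NOT NULL,
                     b  int NULL)
CREATE TABLE lines  (id     int NOT NULL REFERENCES header (id),
                     lineno int NOT NULL,
                     b      int NOT NULL,
                     PRIMARY KEY (id, lineno))
go
INSERT header (id, a, b) VALUES (1, 100, NULL)
INSERT lines (id, lineno, b) VALUES (1, 1, 12), (1, 2, 14), (1, 3, 16)
go
DECLARE @b int
SELECT @b = b FROM lines WHERE id = 1                                          -- 1
UPDATE header SET b = lines.b FROM header JOIN lines ON header.id = lines.id   -- 1
SELECT @b = (SELECT b FROM lines WHERE id = 1)                                 -- 2
UPDATE header SET b = (SELECT b FROM lines WHERE lines.id = header.id)         -- 2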

Today, the behaviour in SQL Server is that all of these statements pass compilation. At run-time, the statements marked 1 complete successfully, but the result is non-deterministic: @b and header.b could be set to any of 12, 14 and 16. The statements marked 2 all result in this error:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery
follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.

Finally, the MERGE statement has its own error message:

Msg 8672, Level 16, State 1, Line 1
The MERGE statement attempted to UPDATE or DELETE the same row more than
once. This happens when a target row matches more than one source row.
A MERGE statement cannot UPDATE/DELETE the same row of the target table
multiple times. Refine the ON clause to ensure a target row matches at most
one source row, or use the GROUP BY clause to group the source rows.

But why wait until run-time? Given the table definitions, we can see with our own eyes that the statements are problematic. On the other hand, we can easily tell that these are safe:
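
For instance, sketched with the same assumed header and lines tables, a join and a subquery that go against the primary key of header:

DECLARE @b int
SELECT @b = b FROM header WHERE id = 1
UPDATE lines
SET    b = header.b
FROM   lines
JOIN   header ON lines.id = header.id
UPDATE lines
SET    b = (SELECT header.b FROM header WHERE header.id = lines.id)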

Since there is a primary key on id, the join or subquery on header can return at most one row. So that is the basic rule: when there is a condition which does not include the complete primary key (or any other unique index), SQL Server should raise an error when strict checks are active for:

SELECT statements that assign variables.

Any subquery which is used in scalar context.

UPDATE FROM where any target column could (logically) be updated from multiple source rows.

MERGE statements where the same row could (logically) be updated multiple times.

Some qualifications are in order. In the following query, the primary key is implicit in the CTE:

WITH CTE AS (
SELECT id, MIN(b) AS b
FROM lines
GROUP BY id
)
UPDATE header
SET b = CTE.b
FROM header
JOIN CTE ON header.id = CTE.id
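
Another qualification concerns filtered indexes. Here is a sketch of the kind of statement I have in mind; the customers table and the column names are my assumptions:

UPDATE c
SET    c.address = ca.address
FROM   customers c
JOIN   customeraddresses ca ON c.customerid = ca.customerid
WHERE  ca.isdefaultaddress = 1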

The primary key on customeraddresses is (customerid, adrid), so the statement is potentially non-deterministic. However, this query should pass under strict checks if and only if there is a unique filtered index on

customeraddresses (customerid) WHERE isdefaultaddress = 1

since in this case we can deduce that the UPDATE is unambiguous. This could be further extended to indexed views and indexed computed columns, but I leave it to Microsoft to explore that ground.
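
Next, consider an UPDATE where every source row assigns the same constant, sketched with the assumed header and lines tables:

UPDATE header
SET    b = 0
FROM   header
JOIN   lines ON header.id = lines.id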

While this UPDATE statement (logically) hits the same target row many times, this is still deterministic: header.b will be set to 0 for the header rows where there are line rows. So there should not be any compile-time error here, strict checks or not. In the same vein, this is also safe:
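
For instance, something like this sketch (my reconstruction), where all source rows yield the same constant:

UPDATE header
SET    b = (SELECT 0 FROM lines WHERE lines.id = header.id)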

However, today this yields the run-time error above, which is why SQL Server should raise a compile-time error with strict checks, to help the programmer stay ahead of the problem. (Or else the run-time error should not appear in this situation.)

Before I close this section, I would like to make one more comment on exactly how useful these checks could be. A very common error that comes up again and again on the forums is:
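
That is, a trigger in this style; the table and column names are assumed:

CREATE TRIGGER tri ON tbl FOR INSERT AS
   DECLARE @id int,
           @b  int
   -- Goof: inserted may hold many rows, but only one is captured here.
   SELECT @id = id, @b = b FROM inserted
   INSERT audittbl (id, b) VALUES (@id, @b)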

For whatever reason, inexperienced SQL Server programmers think that their triggers only need to consider single rows. Maybe they have not heard of multi-row operations; maybe they come from Oracle, where per-row triggers are the norm. Whatever the reason, strict checks would tell them up front that they are doing something wrong.

Conclusion

By adding a new feature, in this text called "strict checks", Microsoft can
help programmers to find silly and stupid errors early, and thereby help them to be
more productive and produce work of higher quality. In this text I have
discussed a number of things that could be covered by strict checks. I noted
initially that adding strict checks for some things in one release, and adding
further checks in a later release, will cause compatibility problems. Therefore,
there is reason to go for a big bang and introduce all checks at once. Nevertheless, some priorities may be in
order, and by far the most important is the death of deferred name resolution. I have ordered the checks roughly in priority order, but I have also taken into account the expected difficulty of implementing each check. The last item, catching cardinality errors, looks very interesting to me, but it is also one that requires careful study to get right.

I suspect that Microsoft feels that I have given them more than a mouthful
with this list. Nevertheless, if you have further suggestions, please feel free
to drop me a line at esquel@sommarskog.se.
If I agree with you, I may add the suggestion to the article. If not, you can
always file your own suggestion on Connect,
http://connect.microsoft.com/SqlServer/Feedback.

Added a suggestion for strict checks for missing column aliases, where a missing alias could cause a compilation error with an ambiguous column if a column is added in the future. Since this is an extension of the older check for column names in correlated subqueries, these two have been lumped together in a common chapter.

2012-05-07

Revised the introduction and the General Principles for SQL 2012 and made various small changes here and there.

Added a suggestion that lists must match for INSERT-SELECT and similar context. This suggestion replaces the original idea that column aliases without AS should be disallowed. (I don't believe they would ever do that one.)

Extended the suggestions for UPDATE to a more general section on cardinality errors, suggesting that if SQL Server can conclude that there is a potential cardinality error at compile-time, it should tell you.

I've reordered the suggestions to get an order which is a rough priority order, also taking in regard how speculative the suggestion is.