Let's deprecate UPDATE FROM!

I guess that many people using UPDATE … FROM on a daily basis do so without being aware that they are violating all SQL standards.

All versions of the ANSI SQL standard that I checked agree that an UPDATE statement has three clauses – the UPDATE clause, naming the table to be updated; the SET clause, specifying the columns to change and their new values; and the optional WHERE clause to filter the rows to be updated. No FROM or JOIN – if you need data from a different table, use a subquery in the SET clause. The optional FROM and JOIN clauses were added by Microsoft, as an extension to the standard syntax (and just to make out lives more interesting, they invented different variations of the syntax for SQL Server and for Access). So when you are in the habit of using them, be prepared to review all your UPDATE statements when moving to Oracle, DB2, Sybase, MySQL, or even a different Microsoft database!

Standards? Bah, who cares?

Well, some do. Me for instance – I will never use proprietary syntax if I know a standard alternative, expect if using the latter has severe negative consequences. And maybe you will, one day, when your boss comes back from the golf course with the great news that he managed to convince a colleague (who just happens to work in an Oracle shop) to buy a copy of your company’s application instead of some off-the-shelf product. Or when there’s a great job opportunity for someone with cross platform skills. Or when you are asked to help out this new colleague with 10+ years of DB2 experience. One of the lesser known side effects of Murphy’s Law is that those who least expect having to move their database to another platform, will.

But even if you really don’t care about portability, there are other reasons to be wary of using UPDATE FROM. In fact, the most important reason why I dislike UPDATE FROM is not that it’s non-standard, but that it is just too easy to make mistakes with.

Correctness? Bah, who cares?

Well, most do. That’s why we test.

If I mess up the join criteria in a SELECT query so that too many rows from the second table match, I’ll see it as soon as I test, because I get more rows back then expected. If I mess up the subquery criteria in an ANSI standard UPDATE query in a similar way, I see it even sooner, because SQL Server will return an error if the subquery returns more than a single value. But with the proprietary UPDATE FROM syntax, I can mess up the join and never notice – SQL Server will happily update the same row over and over again if it matches more than one row in the joined table, with only the result of the last of those updates sticking. And there is no way of knowing which row that will be, since that depends in the query execution plan that happens to be chosen. A worst case scenario would be one where the execution plan just happens to result in the expected outcome during all tests on the single-processor development server – and then, after deployment to the four-way dual-core production server, our precious data suddenly hits the fan…

That’s all?

Well, almost. There’s one more thing. Probably not something you’ll run into on a daily base, but good to know nonetheless. If the target of the update happens to be a view instead of a base table, and there is an INSTEAD OF UPDATE trigger defined for the view, the UPDATE will fail with this error message:

Msg 414, Level 16, State 1, Line 1

UPDATE is not allowed because the statement updates view "v1" which participates in a join and has an INSTEAD OF UPDATE trigger.

Of course, most people will never run into this. But I did have the misfortune of doing so once – unfortunately, I discovered this limitation after rewriting several hundred ANSI standard UPDATE statements to the equivalent UPDATE FROM, and having to convert them all back after as much as a single test…

And that’s why you want to deprecate UPDATE FROM?

Well, no. The view with INSTEAD OF UPDATE trigger won’t affect many people. And the possibility of error can be somewhat thwarted by making sure (and double-checking) to always include all columns of the primary key (or a unique constraint) of the source table. So we’re back to the more principle point of avoiding proprietary syntax if there is an ANSI standard alternative with no or limited negative consequences. And in the case of UPDATE FROM, there are some cases where the standard syntax just doesn’t cut it.

One such scenario is when a file is read in periodically with updated information that has to be pushed into the main table. The code below sets up a simplified example of this – a table Customers, with SSN as its primary key, that stores address and lots of other information, and a table Moved, which is the staging table containing the contents of a file received from a third party listing new address for people who recently moved. I have also included the code to preload the tables with some mocked up data – the Customers table has 10,000 rows, and the Moved table has 3,000 rows, 1,000 of which match an existing row in the Customers table. The others don’t – those people are apparently not our customers.

Since ANSI-standard SQL does not allow a join to be used in the UPDATE statement, we’ll have to use subqueries to find the new information, and to find the rows that need to be updated, resulting in this query:

UPDATE Customers

SETStreet=(SELECT Street

FROMMoved AS m

WHEREm.SSN = Customers.SSN),

HouseNo =(SELECT HouseNo

FROMMoved AS m

WHEREm.SSN = Customers.SSN),

City=(SELECT City

FROMMoved AS m

WHEREm.SSN = Customers.SSN)

WHEREEXISTS(SELECT*

FROMMoved AS m

WHEREm.SSN = Customers.SSN);

There’s a lot of duplicated code in here. And if we were getting data from a complicated subquery instead of the table Moved, it would be even worse (though we can at least put all the duplicated code in a CTE since SQL Server 2005). Of course, writing the code is done quickly enough once you master the use of copy and paste, but the code has to be maintained as well.

Maybe even worse is that the performance of this query just sucks – if you run this (enclosed in a BEGIN TRAN / ROLLBACK TRAN, so you can run the variations below without having to rebuild the original data) and check out the execution plan, you’ll see that the optimizer needs no less than five table scans (one for Customers, and four for Moved) and four merge join operators. And that, too, would be much worse if the source of the data had been a complex subquery (and no, using a CTE will not help the optimizer find a better plan – it just doesn’t understand that the four subqueries are similar enough that they can be collapsed.

Now, if Microsoft had chosen to implement row-value constructors (as defined in the ANSI standard), we could have simplified this to

UPDATE Customers

SET(Street, HouseNo, City)

=(SELECT Street, HouseNo, City

FROMMoved AS m

WHEREm.SSN = Customers.SSN)

WHEREEXISTS(SELECT*

FROMMoved AS m

WHEREm.SSN = Customers.SSN);

But this is invalid syntax in any version of SQL Server (including the latest CTP for SQL Server 2008), and I know of no plans to change that before SQL Server 2008 RTMs.

But with using the proprietary UPDATE FROM syntax, we can simplify this, and get a much better performance to boot. Here’s how the same update is written in non-portable code:

UPDATEc

SETStreet= m.Street,

HouseNo= m.HouseNo,

City= m.City

FROMCustomers AS c

INNERJOIN MovedAS m

ONm.SSN= c.SSN;

And now, the optimizer will produce a plan that scans each table only once and has only a single merge join operator. Some quick tests (with much more rows in the tables) show that it executes two to three times quicker than the ANSI standard version. For that performance gain, I will gladly choose the proprietary syntax over the standard!

What’s with the title of this post then? Why deprecate a fine feature?

Patience, we’re getting there. Bear with me.

All the above is true for versions of SQL Server up to SQL Server 2005. But SQL Server 2008 will change the playing field. It introduces a new statement, MERGE, that is specifically designed for situations where rows from a table source either have to be inserted into a destination table, or have to be used to update existing rows in the destination table. However, there is no law that prescribes that any MERGE should always actually include both an insert and an update clause – so with this new statement, we can now rewrite the above code as follows:

MERGEINTO Customers AS c

USINGMovedAS m

ONm.SSN= c.SSN

WHENMATCHED

THENUPDATE

SETStreet= m.Street,

HouseNo= m.HouseNo,

City= m.City;

As you can see, the source table and the join criteria are included only once, just as in the proprietary UPDATE FROM. The execution plan (tested on the February CTP, also known as CTP6) is also quite similar, including just a few extra operators that are specific to the new MERGE statement. What really surprised me, was that the plan for the MERGE statement was estimated to be about 65% cheaper (faster) than the corresponding UPDATE FROM statement. However, I think SQL Server is lying here – a quick test with more data shows only an extremely marginal advantage of MERGE over UPDATE FROM. This test was too limited to draw serious conclusions, but I am quite sure that there will not be a 65% saving by using MERGE over UPDATE FROM. (I do expect such a saving form either MERGE or UPDATE FROM over the ANSI-compliant UPDATE statement for this case).

The good news is that:

1)The MERGE statement is described in SQL:2003 and can thus be considered ANSI standard. (In fact, SQL Server implements a superset of the ANSI standard MERGE syntax: everything described in the syntax is implemented, but there are some non-standard extensions that make the command even more useful as well. However, the example above uses only the standard features and should hence run on each DBMS that conforms to the SQL:2003 version of MERGE).

2)The MERGE statement will return an error message if I mess up my join criteria so that more than a single row from the source is matched:

Msg 8672, Level 16, State 1, Line 1

The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

3)The MERGE statement will gladly accept a view with an INSTEAD OF UPDATE trigger as the target of the update.

So as you see, MERGE allows me to achieve what I previously could achieve only with an ANSI standard UPDATE statement with lots of duplicated code and lousy performance, or with a UPDATE FROM statement that hinders portability, introduces a higher than normal risk of errors going unnoticed through QA right into the production database, and has some odd limitation on views with INSTEAD OF UPDATE triggers. None of these downsides and limitations apply to MERGE. And if there are any other problems with MERGE, I have yet to find them.

With this alternative available, I fail to see any reason why the proprietary UPDATE FROM syntax should be maintained. In my opinion, it can safely be marked as deprecated in SQL Server 2008. It should of course still work, as “normal” supported syntax in both SQL Server 2008 and the next version, and in at least one version more if the database is set to a lower compatibility – but it should be marked as deprecated, and it should eventually be removed from the product. Why waste resources on maintaining that functionality, when there is an alternative that is better in every conceivable way? I’d much rather see the SQL Server team spend their time and energy on more important stuff, such as full support for row-value constructors and full support for the OVER() clause. Or maybe even on releasing Service Pack 3 for SQL Server 2005!

My main reasons for suggesting to deprecate UPDATE FROM can be summarized in short as:

1) It's not part of the ANSI standard, nor (as far as I know) part of any other RDBMS;

2) Its results are undefined in some situations; and

3) There is (in SQL Server 2008) an alternative that does everything UPDATE FROM can and then some, without incurring the two previously mentioned problems.

Since you consider multi-row triggers a different side of the same coin, let's see how these three main points relate to them:

1) The ANSI standard describes both statement-level and row-level triggers. If no "FOR EACH [ROW | STATEMENT]" is specified, the trigger has to default to statement level - what you can a multi-row trigger. Granted, SQL Server does not exactly follow the ANSI standard syntax for triggers, but statement-level (multi-row) triggers are definitely ANNSI standard.

2) The results of multii-row triggers are defined very exactly, and in all situations. Whenever an insert, update, or delete is executed, the trigger will fire, with all rows affected in the "inserted" and "deleted" pseudo-tables. This is documented and described ad nauseam, and I've never seen anything unexpected happen WRT trigger execution.

3) One might argue that the ANSI standard still offers row-level triggers as a valid alternative for statement-level triggers. But since SQL Server does not implement them, there is (as of SQL Server 2008) no replacement for statement-level triggers. Deprecating them would leave us without any triggers at all! Definitely not a good idea.

After reading your blog post on this subject, I can only conclude that you are good at stringing together words that at first sight seem to make sense, but on second sight reveal your absolute lack of comprehension of the subject. Which is a shame, for I really believe that Dataphor deserves a better proponent.

What about DELETE FROM (with JOINs)? Not as common as UPDATE FROM, but can also be replaced by MERGE.

As a side note, I agree that standards and sticking to them is important, but at least in 50% of the time it is more beneficial to a project that the code works in several versions of the same product than having the code work cross-platform. In these cases even if row-value constructors were introduced in 2008, I'd use UPDATE FROM in my code, for backwards compatibility.

> Just to make sure you don't misunderstand me: I don't suggest deprecating

> UPDATE, it's only the FROM clause (and thus the possibility to join) in the UPDATE > (and DELETE) statement that I'm targetting.

Thanks for the clarification. I did understand you, but made a mistake in my post and it was to late to fix it.

I remember from the request sent by Itzik Ben-Gan and Sujata Mehta, about the full support for the OVER clause, a section talking about "Vector expressions". In the document they suggest the implementation of "Vector expressions" to avoid declaring the same OVER clause in miltiple expressions. In the connect entry, the request changed to use the SQL:2003 WINDOW definition, which is more elegant regards OVER clause.

Thanks for creating the connect entry. You got my vote.

I hope Microsoft put more attention to the requests coming from the SQL Server community.

I actually intended to include some comments on DELETE FROM, but when I noticed that I already was well over what I regard to be the maximum lengthh for a bllog post, I changed my mind. Forunately, you are now giving me a chance to make up for that.

DELETE FROM should never have been supported at all. I have never ever seen a query with DELETE FROM that could not have been written just as well with DELETE WHERE EXISTS. The only thing it has going for it is that its inclusion is congruent with the inclusion of UPDATE FROM -- so when I suggest deprecating UPDATE FROM, it'll come as no surprise that I believe DELETE FROM should be removed as well.

By the way, the connect entry I opened after Alejandro suggested me to does in fact mention DELETE FROM as well.

>"you are good at stringing together words...but reveal your absolute lack of

>comprehension"

Well, you have a curious form of courtesy to those coming to your home:)

>>"You might as well ask them to deprecate their multi-row triggers":)

>"...there is (as of SQL Server 2008) no replacement for statement-level triggers. >Deprecating them would leave us without any triggers at all! Definitely not a good idea."

I confess to the mistake of again underestimating the amount of sarcasm needed to make a point (drip,drip:). Shouldn't you save such a precious commodity like scorn for a serious matter? I resent being called an empty suit for poking fun at sql:) Although I don't really see anything wrong with a connect candidate that argues for a redesigned trigger based on common sense considerations. Really, all I tried to do was bring attention to two constructs that share "multirow" considerations that don't make the lives of developers any easier. Both require an unwarranted intervention. To paraphrase you, an undefined update is "congruent" with a multirow considered trigger. As Hugo Igo, no?:) Speaking of standards, do you have two?:)

I'm quite sure we're talking about a moot point here as far as change. What would be interesting would be their "real" rationale. I wouldn't be surprised if it didn't include the usual suspects.

>"...I really believe that Dataphor deserves a better proponent."

Now you're talking about a serious matter. I would gladly concede defeat in this battle to win a much wider war:) Can I finally compliment you for seeing the bright light? A relational form of update and a relational trigger are quite different beasts compared to their sql counterparts. We are talking about different considerations. I look forward to your taking up the challenge to lead sql folks to a relational system. And succeed where I have failed!

I tend to disagree about your ability to string together words. Steve, I found your blogs pretty hard to follow.

There's a lot to be said about working with the standards we have now. Steve, do you use a Dvorak or a QWERTY keyboard? Just because we may know that there's something better out there doesn't mean that we can or should dump what we have now, stop extending and polishing it, and start with what's "better". SQL is a lot like that. It may have cruft, but it's the lingua franca for data manipulation right now.

Hugo, I think UPDATE FROM should be deprecated when we have a standard syntax which performs as well and is as understandable as what we have now... But if they can't deprecate DELETE FROM, do you think there's ever going to be a chance that UPDATE FROM will be?

I was unaware of the potential for undefined results. Data integrity is vital - enough for me to change my coding habits from Update From to Merge in SQL Server 2008. I'm sold on Merge based on your post.

Deprecating features has to be a tricky, touchy subject. There's stuff in SQL Server I'd like to see go, but it turns out I am accurately representative of precisely one user(s). I don't know how Microsoft decides on what stays and what goes. Based on history, I would guess ANSI compliance isn't at the top of the list.

And using Merge isn't really an option for a lot of people since they have to support mssql2005 for probably at least another 2-3 years (and what about other vendor's dbms ... not that you could use update from anyways...)

>>you have a curious form of courtesy to those coming to your home:)<<

Hehe! The comments sections of my blog, especially after a deliberately provocative post, should be compared to a debating club, not to my home (though my home sometimes is hard to distinguish from a debating club).

As far as I'm concerned, everyone is free to speak their minds here - I have so far only removed comments that were pure spam (though I will also remove comments with high volumes of profanity). Yes, I can be harsh - if I think someone is wrong, I will say so in very clear words. And I expect no less from my readers when they see me making a mistake!

>>Although I don't really see anything wrong with a connect candidate that argues for a redesigned trigger based on common sense considerations<<

So far, I've only seen you claim THAT SQL Server's triggers are badly designed, with insufficient backing. I've seen no proposed "common sense" replacements. But do let me know once you have the suggestion filed on Connect - I'll gladly read your suggestions, and if I agree that you have found a better way, I'll vote for it.

>>Really, all I tried to do was bring attention to two constructs that share "multirow" considerations that don't make the lives of developers any easier. Both require an unwarranted intervention. To paraphrase you, an undefined update is "congruent" with a multirow considered trigger<<

Paraphrase? It looks more like an attempt to put words in my mouth. I don't see any congruence between the two.

UPDATE ... FROM can cause a situation where two or more values are selected to be put into a single value - and there's no way of predicting which one will make it. That's why Books Online calls the behaviour UNDEFINED.

Triggers will fire once per statement, so that the trigger code has to be written just as set-based as all other SQL. Nothing unpredictable, nothing undefined - but for some programmers (a.o. those with Oracle background, where row-based triggers were for a long time the only option), this comes as a surprise. That's what the "consideration" in Books Online is all about - I'd personally rather call it a heads up or a warning.

The behaviour of triggers is correct and is defined in all situations, but some developers need a heads up. The behaviour of UPDATE FROM is undefined in some situations. You really can't compare those in any way.

re Dataphor - I won't be its proponent, sorry. It is a product I would like to get to know a bit better if I could spare some time - but I am currently trying very hard to squeeze 28 hours into each day, and it doesn't appear to be working. But I do hope that others, with more time available, will explore it - and I'm afraid that your anti-SQL crusade accomplishes rather the reverse - expecially when you use far-fetched arguments such as these. Come on, there are enough REAL problems in SQL Server, you don't have to clutch straws!

@Dave:

>>But if they can't deprecate DELETE FROM, do you think there's ever going to be a chance that UPDATE FROM will be?<<

I was going to reply that I never really expected much from my suggestion on Connect, but I just got notice that a comment has been added on Connect - apparently, the SQL development team had already planned to look into the FROM clause for UPDATE and DELETE and will now take this suggestion as another consideration for the equation.

Ugh, that sentence didn't come out as intended. And it sounded so great when I heard it in my head (but that was in Dutch, unfortunately ;-))

@Andy:

Glad to hear that you have decided to shy away from UPDATE FROM in the future.

>>I don't know how Microsoft decides on what stays and what goes. Based on history, I would guess ANSI compliance isn't at the top of the list.<<

I'd love to know, but I don't expect them to share. It seems as if there are multiple camps inside Microsoft, as you sometimes see new features that closely follow the standards (MERGE is a great example - the "extra" feratures are implemented in such a way that ANSI standard MERGE can be parsed and executed without any change), and then you see things going right against the standard (for instance, even though Books Online urges us to use ROWVERSION instead of TIMESTAMP for the ROWVERSION data type, the SQL generated from a table will actually use TIMESTAMP).

@Matt:

>>what do you think of the oracle syntax (...) is this standard?<<

No. The late draft of the SQL:2003 standard that I have (check wikipedia for the download location, if you're interested) has this specification for UPDATE:

<update statement: searched> ::=

UPDATE <target table> [ [ AS ] <correlation name> ]

SET <set clause list>

[ WHERE <search condition> ]

>>safe?<<

Dunno. I've never had a chance to play with Oracle, and I was unaware of this syntax. But test for yourself - create two sample tables using the standard OrderHeader / OrderDetail scheme and then try to use this syntax to set a column in the OrderHeader table equal to one in the OrderDetail. This should fail (since you have to specifiy which of the different OrderDetail rows to use). If it works, it's just as unsafe as SQL Server's UPDATE FROM.

>>if so why doesn't sqlserver support it?<<

I don't think they'll beat Oracle by copying its features. They have to try to stay ahead.

Anyway, since we've got MERGE now, there is no longer any reason to want them to copy this. Right?

>>And using Merge isn't really an option for a lot of people since they have to support mssql2005 for probably at least another 2-3 years<<

That's why I suggested deprecating it first, and removing it after a couple of versions. So that people have ample time to fix their code.

I did a quick test (not using orderheader/orderdetail, just some schema I made up)

Updating the inline view fails with an error when the update isn't deterministic:

ORA-01779: cannot modify a column which maps to a non key-preserved table

I joined two tables t1 and t2 .. If I updated a column on t1 with a column on t2, I got the error unless the column I joined on was unique in t2 (I had to create a unique index .. the mere absence of duplicate values without a unique index still resulted in error) .. So the update of an inline view seems safe but nonstandard.

I checked the manuals and oracle does support update from. Using the join that generated the ORA-01779 error using the UPDATE ( view ) syntax, merge generates

ORA-30926: unable to get a stable set of rows in the source tables

When I added the unique index, merge performed as expected. I was able to test oracle 11.1 and 10.2, but I didn't have a 9.2 instance handy.

I did notice in comparing the oracle and sqlserver docs that sqlserver there is some variation in the supported syntax. They both support WHEN MATCHED, but sqlserver supports WHEN MATCHED AND, they both support WHEN NOT MATCHED but oracle doesn't support AND and TARGET can't be specified. SqlServer supports WHEN SOURCE NOT MATCHED that Oracle doesn't seem to support at all.

In any event, good blog entry -- its got me thinking and made me reconsider merge over update from and double check what oracle supports (my company supports both sql and oracle) .. Once SQL2008 is the min platform, we should be able to use standard sql across both platforms instead of sqlserver's update from or the oracle's update of an inline view.

in the meantime, at the very least shouldn't sqlserver be generating an error if the update from isn't deterministic?

>>I did notice in comparing the oracle and sqlserver docs that sqlserver there is some variation in the supported syntax<<

Correct. From what you write, it seems as if Oracle is supporting ANSI standard MERGE. SQL Server does so as well - but it has also added extra features and possibilities.

Fortunately, they have chosen the syntax so that the plain bog-standard ANSI syntax for MERGE is a subset of the full syntax, so any ANSI-compliant MERGE statement should work without change in SQL Server.

>>in the meantime, at the very least shouldn't sqlserver be generating an error if the update from isn't deterministic?<<

I would prefer that to the current behaviour, but I don't think they'll change it - it would potentially break too much existing code. If they are prepared to do that, they might just as well pull the feature completely (and based on the comment on Connect, it appears as if they are indeed giving this serious consideration).

<quote> Just because we may know that there's something better out there doesn't mean that we can or should dump what we have now, stop extending and polishing it, and start with what's "better".<unquote>

To quote Dalton (Patrick Swayze) from 'Road House', "I get that alot". In case you haven't heard the Cloverfield of software is also offering an alternative, ie. the object model with LINQ/EFM. I assume you have the same sediment to MS:) Perhaps you don't like my way of courting you. I'm not bashing sql, I'm simply pointing out the holes in it and showing how the relational model fills them in. Is it irritating, probably so. There's always a price to be paid for examining a comfort zone. Of course MS has taken quite a different tack. You don't see MS evangelists getting into the face of sql folks do you. You don't see them directly addressing sql folks with the reasons why they should leave sql and work with an object model. Nope, they are more than content to leave sql folks alone. LINQ is really not targeted here because MS does not really care whether sql folks jump ship. You are expendable:) And they don't take the risk of getting your ire up and alienating you. It's quite an interesting strategy they have. Sql folks foster a product that is going in the opposite direction of their intentions without any opposition:)

>Steve, I found your blogs pretty hard to follow.

Well the relational model is a 'real' theory. Dataphor is not trivial, you gotta use your head to think where you haven't had too in sql. No pain, no gain:) As the saying goes there is no such thing as a small logical difference. They are all monster differences.

>Sql is "the lingua franca for data manipulation right now."

Yep, for a large variety of reasons amongst which an intellectual one is not among them. Users fear vendors but that's backwards. I'm hoping at some point the simple consent for sql as 'the' thing for data manipulation melts and coalesces around something else (just as MS does).

>I found your blogs pretty hard to follow.

That's because you're used to the heavy lifting in BOL:) The "how" of syntax is easy, the "what" and "why" of it is not. The subject matter is not always obvious and to be entertaining at the same time is no easy feat. Go double or nothing, give me a second read:)

Hugo said:

As far as triggers go "...do let me know once you have the suggestion filed on Connect"

I think there's already too many users who are patronized. I'll stick with criticism:) The simplest common sense consideration for a trigger would be for the system to understand a 'row' as a type. As far as sql case closed:)

>I'm afraid that your anti-SQL crusade accomplishes rather the reverse - expecially >when you use far-fetched arguments such as these.

Of the many arguments I've made against sql you're sentencing me on just this one? That hardly seems fair:)

>Dataphor - I won't be its proponent, sorry. It is a product I would like to get to >know a bit better if I could spare some time...

Isn't that excuse on the same level as "I was just following orders"?:) Do you really want to leave pioneering work to others...like me?:)

I expected as much. Criticising is easier than specifying an alternative.

>>The simplest common sense consideration for a trigger would be for the system to understand a 'row' as a type.<<

Why? Coming from someone who claims SQL Server is not relational enough, this is a surprising statement. The relational model is about sets. SQL Server is (mostly) about sets as well.

Every serious SQL Server developer should learn to think set-based. And once you have mastered that, coding row-based triggers seems like a step backwards to me.

>>Of the many arguments I've made against sql you're sentencing me on just this one? That hardly seems fair:) <<

I'm singling out this one, because this one happens to be made on my blog, as part of an attack of a view I presented.

On other places, I have learned to mostly ignore your posts.

>>Isn't that excuse on the same level as "I was just following orders"?:)<<

No. "I was following orders" puts the blame on someone else. I take full responsibility for choosing to spend time on stuff I consider more important. That's my choice, and you're free to blame me for it. :)

>>Do you really want to leave pioneering work to others...like me?:)<<

Yes. At least, *this* pioneering. I choose to spend my energy pioneering in different areas, where I think more progress can be made.

I use UPDATE FROM regularly. Result are AFAIK never undefined. Predictability of the statements are responsibility of developer. MERGE can replace it and do more, however, a lot of customers still use sql server2000/2005, so if they upgrade in a future to a version not supporting UPDATE FROM, I'd have to rewrite a lot of triggers and stored procedures working perfectly well.

If you know the data model well and how to properly join tables, you know how to write a predictable UPDATE FROM.

Do you blame a gun, when you shoot yourself in the foot? Usually not.

It should return an error in case of unpredictable result, but I'd never agree on removing this feature. The error would not break much existing code as I don't thing there are many misusing this. Also, bad UPDATE FROM sends trash to triggers.

About triggers. It would be nice, if we had BEFORE triggers:

- if it rejects the update, no update takes place, if after trigger does it, the client must rollback the transaction. What if it's not a well behaving client? There are INSTEAD OF triggers, which can do the job, but they're not the same thing and have a lot of limitations.

- it can modify data without performing an update (update the "inserted" table instead of actual table)

Into the same basket would go a request for row level triggers. Most of things can be done with statement level triggers, but sometimes a set based solution is not possible, so you have to use a cursor.

Updates that are deterministic!?! What an idea. I'm having a hard time convincing people that creating tables without a unique constraint is a bad idea. (OK, that's false. They agree that it is a good idea and maybe sometime in the future they may consider...)

I agree that deprecating "update from" in version 2010 (or whatever after SQL 2008) is a good idea but there are companies that run SQL 6.5 now. When would be a good time for commercial software suppliers to consider replacing with the "MERGE INTO" commands? Major companies are just now converting from SQL 2000 to SQL 2005 and they definitely aren't implementing sweeping changes in their code to take advantage of 2005 changes.

The only people who should consider using "MERGE INTO" now have control of their SQL driver and know it is not going to change into a non ANSI 2003 standard driver in the future. Hmm isn't that one of the reasons why we shouldn't use "UPDATE FROM"... we don't control our SQL driver?:)

>I expected as much. Criticising is easier than specifying an alternative.

When I said "I think there's already too many users who are patronized. I'll stick with criticism" I meant I would rather have someone criticize me for a point of view I expressed (like here) than be patronized based on a submission on Connect. And I thank you for reinforcing my decision:)

Lets look at the sql server big picture of the associations between Update, triggers and Match. In 'Multirow Considerations for DML Triggers'

Now if someone asks the question 'what is the optimal trigger to store a running total' the answer is the bedrock that sql rests on, 'it depends'. That's what a multirow consideration is, a dependency. There's no definition of a trigger independent of the data. And if nothing else this consideration results in more code than if the trigger was based on a row (ie for each row). More code runs contrary to the idea of parsimony of effort, where less is better. Now ask the simplest question of Update, "what can you update" and the sql answer is "it depends". It depends if it's an expression(no), a table(yes), a view with a single table (possibly), a view with multiple tables(no) etc. It's a relational idea that Update can be used with anything that the system recognizes as a table. Therefore there's no consistency to the sql Update. The motivation of Match is simply a way to overcome this inconsistency in sql. The idea of using Match to drive an insert or an update is something a relational system does implicitly. Sql cannot do the same because it requires an explicit concept of "keys" which is a different animal from sql indexes. But what is the price of Match? It too runs contrary to parsimony of effort.

This idea of parsimony is important. It's a central theme in MSs push for the object model (LINQ/EFM) and declarative programming. They are convinced that development can be done in a simpler way than sql. Perhaps this explains their patronizing on Connect:) I'm afraid MS no longer has their heart in sql. If I can only convince them that the relational model also is an example of "less is more":)

I think you should read up on your transact-sql history. I used Sybase back in the early nineties and it had support for UPDATE..FROM, so I don't think this was something M$ added. They rather kept what was already there.

Thanks for the history lesson. Even though I have actually worked with SQL Server 4.2 (which, IIRC, was still based on the common codebase), I never realised that UPDATE FROM existed back then. I always assumed that MS added it when they rewrote the entire codebase - but I never went the extra mile to actually check it.

Another general consideration is performance vs. portability. Various RDMS have their own flavours of SQL. So do you choose the lesser performing syntax in order to support ease of portability? Or do you code for each RDMS you choose to support with your application, but have optimised versions for each RDMS so that it performs the best it can for a given platform?

Rhetorical question, I'm not looking for an answer to it. It's six of one, half a dozen of the other. Or, as is more commonly heard in DBA circles, "it depends".

I guess, if the would deprecate update from, they would end up providing a "hotfix" with the need of a startup parameter to revert to the "old" behaviour. Cfr "views with order by" with the results actualy being ordered as defined in the view.

Thank you for making me aware of the "messed up joins" case. I've never run into this until now - I use the "source table" primary key for joining and I test the result. But from now on I'll pay more attention...:)

Unfortunately, the subqueries you suggest require a tablescan for each. Lets say you are migrating data stored in a vision file format that you have to convert to line sequential format then import that data into a sql server db table. It's much easier and more efficient to use update from when dealing with 2 tables of 15 million records and 1 table of 7 million records to update from when they have character based key fields to combine that information in a relational format over 5 different tables. Perhaps you are correct with what you are saying but if no one pushes the envelope on standards then the implementations stagnate.

Your statement, "SQL Server will happily update the same row over and over again if it matches more than one row in the joined table, with only the result of the last of those updates sticking" is not true

Thanks for your comment. You are right, insofar that indeed SQL Server will not physically update a row, write out the new value, then update it again. It will, however, read and process all matching rows in the source, changing the in-memory version of the "new" row on every iteration, before finallly writing out the final version.

The point I was making was not about performance (which it would have been if indeed each row was physically updated multiple times), but about unpredictable behaviour - since if the execution plan changes, so does the order in which rows are processed and hence the value that just happens to be "last".

If you copy the code below and paste it in SSMS, you'll see what I mean - just adding an index changes the effect of the UPDATE. (And if you run with "Include Actual Execution Plan" enabled, you'll also see that indeed *all* rows from the source table are read.

We use SQL Server, PostgreSQL and MySQL most of all the databases. All support some variant of UPDATE FROM. Granted its annoying they do it slightly differently. So UPDATE FROM even though not ANSI standard is not some weird thing concocted by Microsoft or Sybase.

It would be nice if all databases that support it do it the same way but I tend to find it more efficient and intuitive than any other way. I tend to like PostgreSQL implementation the most.

UPDATE sometable

FROM someothertable

WHERE sometable.someid = someothertable.someid;

There are also things such as Information_schemas - which are ANSI standard and supported by SQL Server, PostgreSQL, and MySQL (neither Oracle or IBM support those - so so much for standards)

Assuming you're using SQL Server and not any other database, I see no errors in the code you posted.

I suggest that you find a SQL Server forum or newsgroup and repeat the question there. And in order for others to help you find the problem, you best include table definitions (as CREATE TABLE statements) and some rows of sample data (as INSERT statements).

1. Books online says that MERGE "may produce incorrect results" if a cte is used to filter out rows from the source or target, so in that case you do not end up with an assurance that there hasn't been a mistake.

2. To me, a lot harder on the eyes than UPDATE/DELETE FROM if all you are doing is updating/deleting. Also takes longer to write.

3. There are other caveats with filter criteria (besides cte warning, shouldn't be done in the ON clause), which leaves the WHEN MATCHED clauses, sometimes inelegant, or, adding derived tables with select & where clauses (or adding filters on the join clauses when no select). in contrast, update/delete from has a natural place for filter criteria, the traditional WHERE after FROM clause.

4. Of course merge is uniquely useful and powerful, & can do "upserts" etc., but that means it also has more subtle structure and considerations overall (e.g. merge is one of the few times that BOL mentions anti semijoins)--not a golden bullet that protects you from errors. the syntax and structure is new, more of a break from older syntax than one would guess at a quick glance.

5. One could simply add the same runtime error as MERGE has to UPDATE/DELETE FROM statements, with option to turn off or on (erland somm.'s suggestion).

Who said that any of the SQL Server 2008 new SQL stuff was any ANSI SQL Standard? MERGE INTO, please, since when is that 'standard'?

..... We've been coding on SQL Server 6.5 through 2008 R2 AND......Oracle 7.3 through XI using the SAME SQL CODE..... (we used to run the same code on Borland Interbase but that went the way of the DoDo bird...).

Nice replacement of UPDATE .. FROM by MERGE. But beware, SQL Server heavily extends the SQL:2003 / SQL:2008 standard MERGE clause. You might run into problems yet again, when you want to run your SQL on these databases taht support MERGE

- DB2

- HSQLDB

- Oracle

- Sybase

Or on any other ones (including MySQL, Postgres), that don't support MERGE

I have grown up with SQL Server technology (since 2001) and I was not aware (and shocked to hear) that an update on joined tables was not defined in the SQL standards.

It probably means that if I need to update data in a table with values based on fields in other tables, that I need to use a subquery in the WHERE clause? That moves this discussion to the use of subqueries vs. joins. Probably with the same conclusion: the choice for subqueries or joins may depend on the specific data structure and data population of the database. Sometimes a subquery gives better performance, and sometimes a join does.

I definitely want to keep the choice for using either subqueries or joins. Yes, also in UPDATE statements. So I will definitely NOT follow SQL "standards".

Apart from that, I see no issues with NOT complying with SQL "standards".

During my entire career, I have NEVER encountered a single project with end-user requirements for supporting multiple DBMSes from the same business logic (except distributed heterogenous database layers, when two or more different DBMSes are used TOGETHER and have their own task for storing and serving SPECIFIC data). So I personally NEVER needed to strictly adhere to the SQL "standards".

So I am convinced that with real-life database application development, you will just need to write multiple DBMS-vendorspecific data layer functionality when you need to support different DBMSes. In each distinct data layer implementation, you will be able to use any standard or non-standard SQL commands to manipulate your data.

IMHO, this will simply stay this way until the SQL "standard" has matured enough (or until it has died a painful death). Just compare it with the development of HTML and CSS up to their current standard (HTML 5 and CSS 3)...

Cheers, and happy developing!

November 13, 2014 1:41 AM

Leave a Comment

About Hugo Kornelis

Hugo is co-founder and R&D lead of perFact BV, a Dutch company that strives to improve analysis methods and to develop computer-aided tools that will generate completely functional applications from the analysis deliverable. The chosen platform for this development is SQL Server. In his spare time, Hugo likes to visit the SQL Server newsgroups, in order to share and enhance his knowledge of SQL Server.