SQL Server NOLOCK Hint & other poor ideas.

Frequently I see production code, created by professional development teams, peppered with NOLOCK & other TSQL Hints.

While totally understandable, as it is a common recommendation in many internet posts & often found in their sample code, this is a really bad practice. It often results in obscure, hard-to-reproduce bugs and can cause data corruption.

Thought 1: TSQL HINTs in General

As a general rule, TSQL HINTs should only be used as a last resort. Both those responsible for the ANSI SQL standard & the Microsoft SQL development team have given a lot of thought to the safest, most desirable defaults for transactions & query execution. It would seem logical that your default coding standard should be to follow those defaults, not some code snippet you found on the internet.

Recommendation 1: Do not use any HINTs until your testing proves that you have an issue that can’t be solved any other way than by using a HINT.

Be aware that any testing you do will be unique to that specific Edition, Version & Service Pack. The optimiser is constantly being enhanced; in a future release it might change to better handle whatever it is that you are hinting.

On more than one occasion I’ve had customers request a switch to turn off the optimiser HINTs generated by some s/w package they’ve purchased. They’ve discovered that the ISV’s queries actually run much faster without the hints; perhaps the hints were useful 10 years ago in, say, SQL 6.5, but are now a hindrance in a later release of SQL.

Often you can rewrite your query &/or modify schema to get a much better result.

Recommendation 2: If you use a HINT, prove it via testing & document it.

If you use a HINT, document why. I’d expect at least :-

The issue or performance problem you encountered, how it worked without HINTs & how it worked with the HINT.

The Version, Edition & Patch Level you tested it on. (Enterprise Edition runs many more operations in parallel than Standard; this can make a difference.)

It is also nice if your app provides a configuration option to remove your hints, OR maybe gives the customer the ability to edit your stored proc to remove them.

More than once I’ve looked at a Schema or TSQL Query & thought “Either this designer was brilliant & had such foresight to anticipate some obscure issue I’ve not even considered, OR they have no clue about databases”. Unfortunately it is almost impossible to know with total certainty. So please tell the poor mongrels who maintain your code what you were thinking. Preferably put the comments into the TSQL code, as in nearly any other place the documentation will become separated from the code.

Thought 2: TSQL NOLOCK / READ UNCOMMITTED HINT

This Hint is much more dangerous than its name suggests. And that is why most people who don’t understand the problem recommend it. It creates “incredibly hard to reproduce” bugs; the type that often destroy your end-users’ confidence in your product & your company.

But it does make nasty warnings/errors go away without the need to really fix the problem. Similar short-sighted “tips” from other disciplines are :-

Turning off the compiler warnings about implicit type conversion can speed development. NB: comparing an INT to a SHORT is a classic cause of an infinite loop.

Turning up your car stereo can drown out the grinding noise of an engine with no oil.

Turning up the volume of your MP3 player can save you from the terror & desperate leap to safety when you walk onto the road with your back to the oncoming traffic.

Folding your arms on the “seatback in front of you” will make a big difference when you fly into the ground upside down at 900 kph. OK there may be some merit to this one, stops you annoying the person next to you.

What many people think NOLOCK is doing

Most people think the NOLOCK hint just reads rows & doesn’t have to wait till others have committed their updates or selects. If someone is updating, that is OK. If they’ve changed a value then 99.999% of the time they will commit, so it’s OK to read it before they commit. If they haven’t changed the record yet then it saves me waiting; it’s like my transaction happened before theirs did.
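Concretely, the hint in question looks like this (table & column names here are hypothetical, purely for illustration):

```sql
-- Reads rows without taking shared locks, so it will see
-- uncommitted ("dirty") data from in-flight transactions.
SELECT OrderID, OrderTotal
FROM dbo.Orders WITH (NOLOCK);

-- Equivalent effect at the session level:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT OrderID, OrderTotal FROM dbo.Orders;
```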

The Problem

The issue is that transactions do more than just update the row. Often they require an index to be updated, OR they run out of space on the data page. This may require new pages to be allocated & existing rows on that page to be moved; this is called a page split. It is possible for your SELECT to completely miss a number of rows &/or count other rows twice. This has been well documented by a number of highly reputable sources, including the SQL Server development team, so I won’t repeat it here. For the details, visit the links below.

We always do that / It was just in the sample code I read. => Now you know better.

But I keep getting these Deadlock messages. => Solve the problem (see below) don’t just ignore the warnings.

I only use it on small tables which heaps of people read & which rarely change. => Shared locks don’t block other shared locks. Maybe you could have your updates change 1 row per transaction.

The solution

Unfortunately there is no 1 line recommendation I can give you that will make your Deadlock &/or other perf issues go away. People have written entire books on the subject. But a few things to consider are:-

Use Stored Procedures for everything. They provide a level of encapsulation or code isolation that will allow someone to change your schema & fix perf issues without needing to understand your code. Note: creating 3 stored procs for every table (p_Insert, p_Update & p_Delete) is OK, but not really what I’m talking about. I prefer procs that do a unit of work that may involve multiple tables, eg: p_CreateNewCustomer() or p_NewOrder().
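As a sketch of what I mean by a “unit of work” proc (hypothetical table & column names, assuming OrderHeader has an identity key):

```sql
CREATE PROCEDURE dbo.p_NewOrder
    @CustomerID INT,
    @ProductID  INT,
    @Qty        INT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- roll back the whole unit on any error
    BEGIN TRANSACTION;
        INSERT dbo.OrderHeader (CustomerID, OrderDate)
        VALUES (@CustomerID, GETDATE());

        INSERT dbo.OrderDetail (OrderID, ProductID, Qty)
        VALUES (SCOPE_IDENTITY(), @ProductID, @Qty);
    COMMIT TRANSACTION;
END;
```

The caller never sees the two tables directly, so the schema can change underneath without touching application code.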

Improve your schema. Keep your rows short – avoid adding additional audit columns (ie: LastUpdatedBy) if there is no business plan to read them. Ensure your many-to-many tables use both foreign keys as the unique composite key (primary key); ie: an Identity column as a primary key is not appropriate for these tables.

Keep your transactions short. Avoid having a huge SELECT sandwiched between two updates in the same transaction (this is like loop optimisation: take the invariant statements out of the loop). Avoid SELECTing a row & then changing it; instead of a SELECT followed by an UPDATE to the old value + 1, just UPDATE to “original value + 1” as a single statement.
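To illustrate that last point with a hypothetical counter table:

```sql
-- Anti-pattern: read, then write. Two statements widen the lock
-- window and race with other sessions between the SELECT & UPDATE.
DECLARE @Old INT;
SELECT @Old = Value FROM dbo.Counter WHERE CounterID = 1;
UPDATE dbo.Counter SET Value = @Old + 1 WHERE CounterID = 1;

-- Better: one atomic statement, no gap for anyone to sneak into.
UPDATE dbo.Counter SET Value = Value + 1 WHERE CounterID = 1;
```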

Avoid using cursors. Cursors are not bad, but often you can find a much more efficient way to complete a task. The optimiser is forced to do row-by-row changes, which prevents most forms of optimisation, parallelism & multi-buffered operations.

Try to acquire locks in the same sequence for all your transactions. eg: have all your stored procedures Lock the InvoiceHeader before the InvoiceDetails. Don’t write half one way & the other half the opposite.
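As a sketch (hypothetical procedure; the point is only the consistent ordering):

```sql
-- Every procedure that touches both tables updates the Header first,
-- then the Details, so two sessions can't hold one each & wait forever.
CREATE PROCEDURE dbo.p_CancelInvoice @InvoiceID INT
AS
BEGIN
    BEGIN TRANSACTION;
        UPDATE dbo.InvoiceHeader
        SET    Status = 'Cancelled'
        WHERE  InvoiceID = @InvoiceID;      -- lock 1: header

        UPDATE dbo.InvoiceDetails
        SET    Status = 'Cancelled'
        WHERE  InvoiceID = @InvoiceID;      -- lock 2: details
    COMMIT TRANSACTION;
END;
```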

Think about using a READPAST hint. If a row or page is locked, you just don’t read it. Often that is OK, as those rows were being used anyway. eg: in any ticketing system (airline seats, theatre, hotel), where you have many customers competing for finite resources, you often know the total resources anyway. In one system we drew all the seats in a concert hall & coloured them orange. Then we selected all the seats on that night with a READPAST hint. Those that were RESERVED or AVAILABLE were returned, and we coloured them appropriately. Any seat not returned was possibly locked by another reservations clerk, so it remained orange. If the user hit refresh again, these seats typically turned red (booked).
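The seat query in that system looked something like this (hypothetical names):

```sql
-- Rows currently locked by another clerk are skipped, not waited on;
-- any seat missing from the result stays orange on screen.
SELECT SeatID, Status          -- 'RESERVED' or 'AVAILABLE'
FROM dbo.Seat WITH (READPAST)
WHERE EventID = @EventID
  AND EventDate = @EventDate;
```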

SQL Service Broker. Use SQL Service Broker to break your transactions up into an async component. These smaller transactions might not block for as long, OR you may be able to pause the queue in heavy-load periods & have that part of the transaction processed in the evening.

Update to this Post

1. If NOLOCK is so bad why have it at all?

Because sometimes accuracy is not so important. If you are plotting something on a line chart & the line is out by 1-2% you possibly wouldn’t notice; what most people look at is the trend. Similarly, if you are putting a number on a screen & refreshing it periodically (ie: %CPU or “Orders received in the past 10 mins”), most people will only look at the first 2-3 significant digits. eg: if a traffic counter measured 154,218 cars through an intersection between 6-7am & 572,621 from 8-9am, your brain possibly rounds them to ~150K & ~575K respectively. Unless you are charging a toll for each car, the most interesting thing is “how does that compare to other time periods?”

2. How does it cause data corruption?

Mostly when you use SELECT … WITH (NOLOCK) as the basis of some other UPDATE or INSERT. Some rows will miss out on whatever changes you thought should be applied to them.
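A sketch of the dangerous pattern (hypothetical tables):

```sql
-- If a page split happens mid-scan, the NOLOCK read can skip rows
-- or read rows twice, and the INSERT makes that error permanent.
INSERT dbo.MonthlySummary (CustomerID, Total)
SELECT   CustomerID, SUM(Amount)
FROM     dbo.Orders WITH (NOLOCK)   -- dirty read feeding a write
GROUP BY CustomerID;
```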

I hope this assists you to improve your design, or at least avoid creating issues that will be impossible to debug in production.

All feedback & comments welcome.

Dave.

Thought for the day, (which sums up my feelings about Cursors & Hints): There is no Right or Wrong, only outcome.

A very good post Dave. Having said this, I have a scenario as mentioned below:

Consider an order processing system, which has processing of an order as a transaction. The transaction is implemented at the DAL layer. Since every transaction will have a status, we need to fetch the status id from the status master table and associate it with the transaction records while processing the order. In such a case, if we don’t place nolock, then the status table gets locked until the transaction is committed, which might subsequently lead to a deadlock scenario when there is more load on the system.

I am curious to know if there is any alternative for this scenario without using the nolock hint.

Good, solid basics. But the flip side is that telling a good SQL Server programmer not to hint is a lot like telling a C++ or C# programmer not to debug, and certainly to never use directives.

The fact of the matter is that the optimizer makes little mistakes all the time (on purpose, in fact, so it doesn’t have to spend a huge amount of time optimizing) but some of its mistakes are HUGE. A good SQL developer, who understands the shape and distribution of their data, can frequently fix things.

One topic I’d like to see more good database blogs write about is plan stability. The single worst thing, other than something that can never be done efficiently in production, is something that occasionally goes inefficient in production. In these cases, when time is tight and the demand to fix something is immediate, a simple hint or two can often straighten out the optimizer and cause a plan to be optimal (or darned close) and absolutely stable. This is infinitely (and I do mean infinitely) preferable to a plan that is absolutely optimal 99% of the time it compiles but terrible 1% of the time. Plan stability should be a top-level priority of any very good SQL developer.

Sometimes I think our hint-phobia is rooted in a belief that we actually write code in the database. We do not; in SQL we issue specifications that the optimizer turns into code, and sometimes a hint can make that specification more explicit and complete.

I know this kind of thing makes the purists gag, but I’m a purist too. Did all the PhD work, relational algebra, relational calculus – the whole nine yards. And I did great at it. But I still don’t see why we would ever want to take a tool, like hints, away from programmers, instead of teaching them to use them well.

So I like your post, but I especially like the energy you put into trying to develop a way to use hints wisely. I’m not so sure we need so many posts, however, telling us to use them "almost never." A lot of high-profile companies would go out of business without them, because the optimizer is not yet good enough – and as we continue to add features to the kernel, the optimizer probably will never be good enough.

I've always been suspicious of nolock, thank you for confirming my suspicions. I used to back down on this issue because I was "just an application developer" and the DBAs and online experts were telling me to use nolock.

Could you comment on the effectiveness of using read committed snapshot as an easy "one line solution" to deadlock-select issues?
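For reference, the setting I mean is database-level and (with a hypothetical database name) looks something like:

```sql
-- Once on, readers see the last committed version of each row
-- instead of blocking on writers, at the cost of row-versioning
-- overhead in tempdb. Needs no other active connections to switch.
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;
```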

Nobody mentioned the scenarios where you have historical data in a table that won't be changing but you might have hundreds of thousands of rows. Wouldn't using nolock be a good use? We have our primary and historical database, and after a while data is moved from one server to another or from one table to a historical table.

Dave has omitted the most common legitimate use of NOLOCK — report solutions. There are two scenarios. The first is the historical copy of a production database not subject to updating, where using NOLOCK can slightly reduce the overhead associated with locking. The more interesting scenario is the reporting solution that is forced (usually by client budget) to live on the same database as production. In that case, reports like "Total YTD Revenue by Month" could deadlock the production system without NOLOCK in place. Using NOLOCK in reports certainly does risk the uncommitted read or dup-PK due to page split issues, but it's very unlikely to substantially affect the historical transactions that are typically the target of reporting solutions.

There are reasons to use NOLOCK. For me the main thing is for generating reports that take a long time to run. They don't need to lock anything, because they are READ ONLY, and never insert, update, delete EVER. So reading dirty data is OK. The main reason for me to use this is because I have tables with millions of rows. To generate the report takes 2 minutes, maybe longer. It is a very big table, and the indexes are proper. But I would NEVER want my report query to block actual users that are modifying the data – I would NEVER block them for 2 minutes. The reports don't need to be 100% accurate. And since about 99.9999% of my transactions commit (since my validation is very detailed) it's really not an issue.

I would suggest that rather than blocking, the DB should provide a better way to run reports that, more importantly, don't block regular transactions. Until then we have NOLOCK.

At my previous job I used a 3rd party application which was 'seasoned' with lots of these nolock hints – I constantly complained about it, but they refused to change it, OR prove that it was needed. The application is prone to "incredibly hard to reproduce" errors, and I always wondered if all the nolocks were a source of some of that. Bad design is probably the source of most of it, but I'm sure that creating every select with a nolock hint is also a bad idea. I did send them this article, but I don't work there any more so I'm not sure I care, but it's nice to know I was right.

In our OLTP environment, we pretty much have to run with the nolock hint, because of the multiuser performance gains. It does mean that we constantly run the risk of the results not being quite bang on, but with tens of thousands of searches done each day (holiday data) and a constantly changing base data set, nolock is crucial.

Sure a re-design would reduce the need for this, but that is not likely – "hang on everyone, just wait a year or 3 whilst we redo the whole system end to end"

the throughput of the db is huge, terabytes per day of transactional data – it would fall right onto its asz without the nolock hint

A fantastic article though, and I quite agree care should be used with all hints, set options etc, and will follow those links for further reading!

Hmmm… then why is Microsoft RLS massively peppered with NOLOCK? Were the developers of RLS aware of this problem? Why hasn't Microsoft fixed RLS? It seems we are having huge issues because of this exact problem.

It depends. If report is calculating averages, or something like that on the dataset returned, or a count or some aggregate you can probably get away with NOLOCK, although you might be better off using READPAST.

It's everywhere here. EVERYWHERE. The use of NOLOCK is so prevalent it gives me the willies sometimes. It is difficult to explain to devs, who come to me with the complaint that I need to do something about sql server because the users are getting "row not found or changed" errors, that it's a consequence of using NOLOCK. The response is that I don't know what I'm talking about, they have to use nolock and they've always used it. (The dba position is a new one in the organization.)

And if they haven't hammered NOLOCK onto every table in their query, I find they're using read uncommitted as the isolation level for their LINQ connection.

I am no fan of nolock. However, its primary use is not mentioned here. Also I think some of the reasons given for using it in the comments are flimsy, so I thought I would add my insights.

– If you are using a third-party system in which you have no control over the code or configuration, controlled use of nolock can be useful when you are unsure of the impact your querying will have. Turning on snapshot isolation (even if you can in this case) without making it the default isolation is a high price to pay for a query or two, and turning it on and making it the default isolation has the potential to alter the behavior of the application in cases where applications rely on read-committed waits to control serialization. It can also be fine for informal ad hoc querying.

– Dave, your update citing a reason for nolock, like pretty much every other one on this page, is not relevant if you are using snapshot isolation. Be that as it may, I hear nolock apologists claiming that there are cases when “accuracy is not important”. First, people can and should expect correct answers from computers to the questions they ask. Second, “inaccuracy” is not just a matter of a total being stale. “Nolock inaccuracy” can contain duplicates, phantom records, “mixed-bag” results and otherwise portray the data in states in which it never actually existed. This is true even if no data is ever rolled back (a common misconception is that nolock only has a negative impact during the exceptional rollback scenario). To all the “I nolock for reports” commenters: see snapshot isolation, and quit using wishful thinking for the accuracy of your reports.

– TSQL hints should be avoided, but “last resort” may be too strong. Hints can be somewhat necessary when, as mentioned, you do not have the ability to change the schema. Not all hints are equally obnoxious either. Join hints can be a true savior with minimal risk. Hints just thrown in there to alter the plan in a somewhat mysterious way (e.g. force order) are exactly the hints that can turn on you with changes in the data distribution or improvements to Sql Server.

– Starforce, nolock saves you nothing on data that “doesn’t change often”. The overhead involved in placing a shared lock where there are no intent waits has got to be close to zero. I have tested this many times and can’t ever detect a clearly measurable performance difference between committed and uncommitted reads on data that has no updaters. I gave up on this concept long ago.

– The “use stored procedures often” in my opinion is more of a configuration philosophy than something that has a significant impact on performance. Why is it listed first under solutions? Your dogma is showing.

– Dave in solutions you have failed to mention the use of connection binding tokens. In today’s world of over-engineered DALs, a callout to a service mid-transaction can be a major problem. If for some reason this is a must, you have now potentially taken a logical transaction and spanned it across multiple database transactions as the service will have its own connection. Oops, now your operations look to the database like a competitor for locks and your code is potentially deadlocking itself. By passing a binding token sql server knows to allow the two connections to share locks even if the sessions are generated by a different client on a different machine. Sounds like this is what Phaneedra is experiencing.

– Phaneedra, your database is not properly normalized, and this is a major contributor to database contention and potentially deadlocking. With the schema as is, you should make sure that your “transaction” is indeed wrapped in a single transaction and not split amongst two. You can’t deadlock yourself unless you have one seriously screwed up architecture.

– Tony Moe a read-committed reader searching for a shared lock is not “blocked forever if the table being read is constantly being updated”. In this case your query will go into an intent queue and it will wait in serial order with other lock requests. This is pretty much the way all data management systems that lock work.

– Yoder you have nicely tied two concepts together 1) I don’t understand why readers lock and therefore 2) I don’t think nolock is a problem.

– Cameron snapshot isolation *is* versioning

NOLOCK is fool's gold. Learn locking and you won't have much use for it

So nice to see someone call out the WITH (NOLOCK) foolishness. I see this being done all over the place for supposed performance gains. All they're doing is opening up a channel for dirty reads. It's worst in a staging source pull, and there is likely no contention happening on a dim load, so it's especially annoying to see people continuing to use this in data warehousing.

I suppose the question is: if you're running a select and something is in the process of updating the same table, do you have to have the updated records returned in your select at that instant in time? What does it matter? If you run the select after the update then you get the records that just changed. To me it's a timing thing.