Question

I am a Report Developer who wants to make my queries as efficient as possible. I used to work with a DBA who told me - I believe because I was always dealing with reports on a Production Server - to use NOLOCK in every single query.

Now, I work with a DBA who has banned NOLOCK under any circumstance - even when a report of mine (due to a considerable lack of indexes on a couple of tables) is stopping replication and system updates. In my opinion, in this case, a NOLOCK would be a good thing.

Since most of my SQL training has come from various DBAs with very different opinions, I wanted to put this question to a wide variety of DBAs.


Solution

If your report blocks updates, then your DBA is right: you should absolutely not use NOLOCK. The very fact that there are conflicts is a clear indication that if you used dirty reads you would get incorrect reports.

In my opinion, there are always better alternatives than NOLOCK:

  • Are your production tables effectively read-only, never getting modified? Mark the database read-only!
  • Table scans cause lock conflicts? Index the tables appropriately; the benefits are manifold.
  • Can't modify the schema / don't know how to index appropriately? Use SNAPSHOT isolation.
  • Can't change the app to use snapshot? Turn on read committed snapshot!
  • You have measured the impact of row versioning and have evidence it hurts performance? You can't index the data? And you are OK with incorrect reports? Then at the very least do yourself a favor and use SET TRANSACTION ISOLATION LEVEL, not a query hint. It will be easier to fix the isolation level later than to modify every query.
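These alternatives can be sketched in T-SQL. This is a minimal sketch; the database, table, and column names are placeholders, not from the question:

```sql
-- Option 1: mark a never-modified database read-only
ALTER DATABASE ReportDb SET READ_ONLY;

-- Option 3: enable snapshot isolation (row versions kept in tempdb);
-- the app then opts in with SET TRANSACTION ISOLATION LEVEL SNAPSHOT
ALTER DATABASE ReportDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Option 4: turn on read committed snapshot, so existing READ COMMITTED
-- queries read row versions instead of taking shared locks
ALTER DATABASE ReportDb SET READ_COMMITTED_SNAPSHOT ON;

-- Option 5 (last resort): set the isolation level once per session
-- instead of sprinkling WITH (NOLOCK) on every table reference
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT OrderID, Amount FROM dbo.Orders;
```

The session-level SET in the last example is the easiest thing to remove later, which is the point of preferring it over per-table hints.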

Other tips

It isn't always bad.

Of course it allows you to read uncommitted values (that may be rolled back and hence never logically existed) as well as allowing phenomena such as reading values multiple times or not at all.

The only isolation levels that guarantee that you won't encounter any such anomalies are serializable/snapshot. Under repeatable read values can be missed if a row is moved (due to a key update) before the scan reaches this row, under read committed values can be read twice if a key update causes a previously read row to move forward.

These issues are more likely to arise under NOLOCK, however, because by default at this isolation level SQL Server will use an allocation-ordered scan when it estimates there are more than 64 pages to be read. In addition to the category of issues that arise when rows move between pages due to index key updates, these allocation-ordered scans are also vulnerable to problems with page splits: rows can be missed if the newly allocated page is earlier in the file than the point already scanned, or read twice if an already-scanned page is split to a later page in the file.

At least for simple (single-table) queries, it is possible to discourage the use of these scans and get a key-ordered scan under NOLOCK by simply adding an ORDER BY index_key to the query, so that the Ordered property of the Index Scan is true.
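A minimal sketch of that workaround, with hypothetical table and column names (assuming OrderID is the clustered index key):

```sql
-- Without the ORDER BY, NOLOCK may pick an allocation-ordered scan on a
-- large table; ordering by the clustered index key yields a key-ordered
-- scan (Ordered: True in the plan), avoiding the page-split anomalies
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders WITH (NOLOCK)
ORDER BY OrderID;
```

Note this only reduces the allocation-scan class of anomalies; rows moved by key updates can still be missed or read twice.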

But if your reporting application doesn't need absolutely precise figures and can tolerate the greater probability of such inconsistencies it might be acceptable.

But certainly you should not be chucking it onto all queries in the hope that it is a magic "turbo" button. Besides the greater probability of anomalous results at that isolation level, or no results at all (the "Could not continue scan with NOLOCK due to data movement" error), there are even cases where performance with NOLOCK can be much worse.

Do your customers tolerate inconsistent results in reports? If the answer is no, you should not use NOLOCK - you can get wrong results under concurrency. I wrote a few examples here, here, and here. These examples show inconsistent output under READ COMMITTED and REPEATABLE READ, but you can tweak them and get wrong results with NOLOCK as well.

Most of the reports I create aren't run on current data. Most customers are running reports on yesterday's data. Would your answer change if that were the case?

If that's the case, then you have one more possible option:
Instead of running your queries on the production database and messing around with locks and NOLOCK, you could run your reports from a copy of the production database.

You can set it up so it's automatically restored from a backup each night.
Apparently your reports are running on servers at customers' sites, so I don't know if setting this up would be a viable solution for you.
(But then again... they should have backups anyway, so all you need is some server space to restore them.)
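A hypothetical nightly job step for such a refresh might look like this; the database name, backup path, and logical file names are placeholders, not from the answer:

```sql
-- Refresh the reporting copy from last night's full backup.
-- WITH REPLACE overwrites the previous copy; MOVE relocates the data
-- and log files so they don't collide with the production paths.
RESTORE DATABASE ProdDb_Reporting
FROM DISK = N'\\backupshare\ProdDb_full.bak'
WITH REPLACE,
     MOVE N'ProdDb'     TO N'D:\Data\ProdDb_Reporting.mdf',
     MOVE N'ProdDb_log' TO N'D:\Data\ProdDb_Reporting.ldf';
```

Since the restored copy takes no production locks, reports against it need no hints at all.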

I'm an in-house developer, so this is easier for me because I have full control over the servers and databases.

You can do this at least for the reports that only need data from yesterday and older. Maybe some reports will have to stay on the production database, but at least you move some of the load to another database (or even better, another server).

I have the same situation at work as well:
We are using a production database copy like this for nearly all reporting stuff, but there are a few queries that require today's data.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange