It takes a long time because your query needs DATALENGTH to be evaluated for every row and then the results sorted before it can return the 1st record. If the DATALENGTH of the field (or whether it contains any value) is something you're likely to query repeatedly, I would suggest an additional indexed field (perhaps a persisted computed field) holding the result, and searching on that.
Adding datalength condition makes query slow
-
05-12-2021 - |
Question
I have a table mytable with some columns including the column datekey
(which is a date
and has an index), a column contents
which is a varbinary(max)
, and a column stringhash
which is a varchar(100)
. The stringhash
and the datekey
together form the primary key of the table. Everything is running on my local machine.
Running
SELECT TOP 1 * FROM mytable where datekey='2012-12-05'
returns 0 rows and takes 0 seconds.
But if I add a datalength
condition:
SELECT TOP 1 * FROM mytable where datekey='2012-12-05' and datalength(contents)=0
it runs for a very long time and does not return anything before I give up waiting.
My question: Why? How do I find out why this takes such a long time?
Here is what I checked so far:
When I click "Display estimated execution plan" it also takes a very long time and does not return anything before I give up waiting.
If I do
SELECT TOP 1000 datalength(contents) FROM mytable order by datalength(contents) desc
it takes 7 seconds and returns a list 4228081, 4218689 etc.
exec sp_spaceused 'mytable'
returns
rows reserved data index_size unused
564019 50755752 KB 50705672 KB 42928 KB 7152 KB
So the table is quite large at 50 GB. Running
SELECT TOP 1000 * FROM mytable
takes 26 seconds.
The sqlservr.exe process is around 6 GB which is the limit I have set for the database.
La solution
Autres conseils
This old msdn blog post seems to agree with @MartW answer that datalength
is evaluated for every row. But it's good to understand what is really meant by "evaluated" and what is the real root of the performance degradation.
As mentioned in the question, the size of every value in the column contents
may be large. It means that every value bigger than ~8Kb is stored in special LOB-storage. So, taking into account the size of the other columns, it's clear that most of the space occupied by the table is taken by this LOB-storage, i.e. it's around 50Gb.
Even if the length of contents
column for every row has been already evaluated, which is proved in post linked above, it's still stored in LOB. So engine still needs to read some parts of the LOB-storage to execute the query.
If LOB-storage isn't in RAM at the time of a query execution then we need to read it from a disk, which is of course much slower than from RAM. Also possibly the read of LOB-parts is rather randomized than linear which is even more slow as it tends to raise the whole number of memory-blocks needed to be read from a disk.
At the moment it probably won't be using the primary key because of the stringhash column included before the datekey column. Try adding an additional index that just contains the datekey column. Once that key is created if it's still slow you could also try a query hint such as:
SELECT TOP 1 * FROM mytable where datekey='2012-12-05' and datalength(contents)=0 WITH INDEX = IX_datekey
You could also create a seperate length column that's updated either in your application or in an insert / update trigger.