Question

I have a table mytable with some columns including the column datekey (which is a date and has an index), a column contents which is a varbinary(max), and a column stringhash which is a varchar(100). The stringhash and the datekey together form the primary key of the table. Everything is running on my local machine.

Running

SELECT TOP 1 * FROM mytable where datekey='2012-12-05'

returns 0 rows and takes 0 seconds. But if I add a datalength condition:

SELECT TOP 1 * FROM mytable where datekey='2012-12-05' and datalength(contents)=0

it runs for a very long time and does not return anything before I give up waiting.

My question: Why? How do I find out why this takes such a long time?


Here is what I checked so far:

When I click "Display estimated execution plan" it also takes a very long time and does not return anything before I give up waiting.

If I do

SELECT TOP 1000 datalength(contents) FROM mytable order by datalength(contents) desc

it takes 7 seconds and returns a list 4228081, 4218689 etc.

exec sp_spaceused 'mytable'

returns

rows        reserved     data         index_size  unused
564019      50755752 KB  50705672 KB  42928 KB    7152 KB

So the table is quite large at 50 GB. Running

SELECT TOP 1000 * FROM mytable

takes 26 seconds.

The sqlservr.exe process sits at around 6 GB, which is the memory limit I have set for the server.


The solution

It takes a long time because DATALENGTH has to be evaluated for every row before the query can return the first record that satisfies the condition. If the DATALENGTH of the field (or whether it contains any value) is something you're likely to query repeatedly, I would suggest an additional indexed field (perhaps a persisted computed column) holding the result, and searching on that, as sketched below.
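A minimal sketch of that idea, assuming a hypothetical column name contents_length and index name IX_mytable_contents_length (neither exists in the original schema):

ALTER TABLE mytable ADD contents_length AS DATALENGTH(contents) PERSISTED;

CREATE INDEX IX_mytable_contents_length ON mytable (contents_length);

-- A query like the one above can then seek on the computed column
-- instead of evaluating DATALENGTH against every LOB value:
SELECT TOP 1 * FROM mytable WHERE datekey = '2012-12-05' AND contents_length = 0;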

Other tips

This old msdn blog post seems to agree with @MartW's answer that DATALENGTH is evaluated for every row. But it's worth understanding what "evaluated" really means here and where the real root of the performance degradation lies.

As mentioned in the question, the values in the contents column can be large. Every value bigger than ~8 KB is stored in special LOB storage. So, taking the size of the other columns into account, it's clear that most of the space occupied by the table, around 50 GB, is taken up by this LOB storage.
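One way to confirm where the space lives is to break the table down by allocation unit type, using the standard allocation-unit DMVs (a sketch; treat the exact join as illustrative):

SELECT au.type_desc, SUM(au.total_pages) * 8 AS size_kb
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID('mytable')
GROUP BY au.type_desc;
-- LOB_DATA should dominate if most of the 50 GB is the off-row contents column.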

Even if the length of the contents column has already been evaluated for every row, as shown in the post linked above, that length is still stored within the LOB, so the engine still needs to read parts of the LOB storage to execute the query.

If the LOB storage isn't in RAM at the time the query executes, it has to be read from disk, which is of course much slower than reading from RAM. The reads of the LOB parts are also likely random rather than sequential, which is slower still, since it tends to increase the total number of blocks that have to be read from disk.
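You can observe this from the query itself with standard I/O statistics (a sketch):

SET STATISTICS IO ON;
SELECT TOP 1 * FROM mytable WHERE datekey = '2012-12-05' AND DATALENGTH(contents) = 0;
-- The Messages tab then reports "lob logical reads" and "lob physical reads";
-- high physical counts mean the LOB pages are coming from disk, not cache.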

At the moment the query probably won't be using the primary key, because the stringhash column comes before the datekey column in it. Try adding an additional index that contains just the datekey column.
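For example (assuming the index name IX_datekey that the hint below refers to):

CREATE INDEX IX_datekey ON mytable (datekey);

Once that index is created, if the query is still slow you could also try a query hint such as: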

SELECT TOP 1 * FROM mytable WITH (INDEX(IX_datekey)) WHERE datekey='2012-12-05' AND DATALENGTH(contents)=0

You could also create a separate length column that's updated either in your application or in an insert/update trigger.
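A minimal trigger sketch, assuming a hypothetical contents_len column (the join uses the table's existing primary key columns):

ALTER TABLE mytable ADD contents_len bigint;
GO

CREATE TRIGGER trg_mytable_contents_len ON mytable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Keep the length column in sync whenever contents is inserted or changed
    UPDATE t
    SET contents_len = DATALENGTH(i.contents)
    FROM mytable AS t
    JOIN inserted AS i
      ON t.stringhash = i.stringhash AND t.datekey = i.datekey;
END;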
