SELECT * WHERE VarCharColumn IN (…) Optimization

https://dba.stackexchange.com/questions/1338

16-10-2019

Question

I have a list of 3000 strings, and I am passing them (twenty at a time) into a parameterized IN clause. It definitely isn't getting the results I would like to see: ~500 ms per execution.

The column is indexed. Do you know a better way than this:

SELECT * FROM [ohb].[dbo].[MasterUrls] WITH (NOLOCK) WHERE Hash 
IN(@p0,@p1,@p2,@p3,@p4,@p5,@p6,@p7,@p8,@p9,@p10,@p11,@p12,@p13,@p14,@p15,@p16,@p17,@p18,@p19)

A list of 3000 takes between 3 and 5 minutes. I really need that down to around 30 seconds. Is this possible?

I am using MSSQL 2008 R2 on a server with 24 GB of RAM and dual 8-core NUMA Xeons @ 2.4 GHz, running on a 6-HDD (15k rpm) RAID 10 over iSCSI.

The table has 1.4M rows, and the index is a non-clustered index.

Execution plan shows the index scan as 90% of the total execution.

Solution 4

I actually solved this in a MUCH faster SQL-less fashion.

In an earlier step, I now grab both the URLs and the IDs from the table (instead of just the URL, which I would then hash for the lookup in this question). I save them in memory and then on the FS (asynchronously, in case memory fails). When it is time to do the lookup, I read from the data I stored in memory/FS.

The process now takes less than 5 seconds on average to do a lookup and update (the step after this question) the data for 3000 rows. Much better than 240 seconds on avg.

OTHER TIPS

SELECT * prevents optimal use of the index even if Hash is indexed, because the index isn't covering. Your index scan is most likely on the clustered index because of this.

Personally, I'd start with:

- putting the 3000 search values into a table with an index
  (Edit: as per Marian's comment, these can already be passed in as a list or table)
- using this table in any of JOIN, IN, EXISTS (usually the same plan)
- ensuring my index on MasterUrls suits using Hash and covers col1, col2, col3

Something like

CREATE TABLE #foo (Hash ...)
INSERT #foo...
CREATE INDEX IX_FOO ON #foo (hash)

--either
CREATE NONCLUSTERED INDEX IX_Hash ON MasterUrls (hash) INCLUDE (col1, col2, col3)
--or    
CREATE CLUSTERED INDEX IXC_Hash ON MasterUrls (hash)

SELECT col1, col2, col3
FROM [ohb].[dbo].[MasterUrls] M
JOIN
#foo F ON M.Hash = F.Hash

Pass the values in via a table-valued parameter. This way they are already in table form. Then copy the values from the TVP into a temp table that has a clustered index on it. Use this temp table as a JOIN member of your query.
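
A minimal sketch of that approach might look like the following. The type name dbo.HashList, the procedure name dbo.GetMasterUrlsByHash, the VARCHAR(64) width, and the col1, col2, col3 column names are all placeholders rather than anything from the question; size the Hash column to match MasterUrls.Hash. Table-valued parameters are available from SQL Server 2008 onward and must be declared READONLY.

```sql
-- Hypothetical table type; the PRIMARY KEY also rejects duplicate hashes.
CREATE TYPE dbo.HashList AS TABLE
(
    Hash VARCHAR(64) NOT NULL PRIMARY KEY
);
GO  -- batch separator understood by SSMS/sqlcmd, not a T-SQL statement

-- Hypothetical procedure that receives all search values in one call.
CREATE PROCEDURE dbo.GetMasterUrlsByHash
    @Hashes dbo.HashList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy the TVP into a temp table with a clustered index (via the primary key).
    CREATE TABLE #Hashes
    (
        Hash VARCHAR(64) NOT NULL PRIMARY KEY CLUSTERED
    );

    INSERT INTO #Hashes (Hash)
    SELECT Hash
    FROM @Hashes;

    -- col1, col2, col3 stand in for the columns actually needed.
    SELECT M.col1, M.col2, M.col3
    FROM dbo.MasterUrls AS M
    JOIN #Hashes AS F ON M.Hash = F.Hash;
END;
GO

-- Example call from T-SQL; the two values are made-up placeholders.
DECLARE @h dbo.HashList;
INSERT INTO @h (Hash) VALUES ('hash-value-1'), ('hash-value-2');
EXEC dbo.GetMasterUrlsByHash @Hashes = @h;
```

With this shape, all 3000 values can go to the server in a single round trip instead of 150 batches of twenty. From client code the parameter is passed as a structured (table) value rather than built with T-SQL.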

Remove the SELECT * and change it to only the columns that you need, with the additional columns included in the index. If SELECT * is needed, then add all the additional columns as included columns within the index.

Yes, this is possible: store all the IDs you are passing in one comma-separated string, and create a function to split it.

Follow these steps.

First, create the function:

 CREATE FUNCTION [dbo].[UDF_IDListToTable]
 (
    @list          VARCHAR(MAX),
    @Seperator     CHAR(1)
 )
 -- The question matches on a varchar column, so items are returned as strings.
 -- VARCHAR(255) is an assumed width; size it to match MasterUrls.Hash.
 RETURNS @tbl TABLE (ID VARCHAR(255))
 WITH EXECUTE AS CALLER
 AS
 BEGIN
    DECLARE @position INT
    DECLARE @next INT
    DECLARE @item VARCHAR(255)

    -- Append a trailing separator so the last item is handled like the rest.
    SET @list = @list + @Seperator
    SET @position = 1
    SET @next = CHARINDEX(@Seperator, @list, @position)

    WHILE @next <> 0
    BEGIN
        SET @item = SUBSTRING(@list, @position, @next - @position)

        -- Skip empty items, such as those produced by two adjacent separators.
        IF @item <> ''
            INSERT INTO @tbl (ID) VALUES (@item)

        SET @position = @next + 1
        SET @next = CHARINDEX(@Seperator, @list, @position)
    END

    RETURN
 END

After that, use it in an inner join:

SELECT  *
FROM    [ohb].[dbo].[MasterUrls] AS mul WITH ( NOLOCK )
        INNER JOIN dbo.UDF_IDListToTable(@IDString, ',') udtl ON mul.hash = udtl.ID;

Licensed under: CC-BY-SA with attribution
