Question

I have two tables to JOIN based on BETWEEN condition.

Table 1 is small is small around 1500 records and Table 2 is of 40 millions records. Table1 is having only one column with the datatype bigint and Table2 with 8 columns. I need to do join between these two tables on BETWEEN condition.

I tried following but its getting slow for just 1 record in Table1 and 40 millions in Table2.

Query:

SELECT t1.cola AS [InputValue],t2.cola,t2.colb,t2.colc,t2.cold,t2.code
FROM table2 t2 
INNER JOIN table1 t1 ON t1.cola BETWEEN t2.cola AND t2.colb ;

Indexing:

  1. CREATE NONCLUSTERED INDEX NCIX_Table1_Cola ON table1(cola)
  2. CREATE NONCLUSTERED INDEX NCIX_Table2_Col_a_b ON table2(cola,colb)

Above query took 30 seconds for just 1 record in table1 and 40 millions in table2. As said I will be getting more than 1500 records in table1 will getting more slower. Any alternative of between or proper indexing need to be done?

Edit: Added sample data.

Table1:

cola
---------------
12
145
34
90
88990
987611
55
...
..
......1500 rows

Table2:

cola    colb    colc    cold    cole
-------------------------------------
0       10      c1      d1      e1
11      20      c2      d2      e2
21      40      c3      d3      e3
41      60      c4      d4      e4
61      100     c5      d5      e5
101     1000    c6      d6      e6
1001    10000   c7      d7      e7
10001   200000  c8      d8      e8
...... 
......40 millions records

Expected result:

InputValue  cola    colb    colc    cold    cole
--------------------------------------------------
12          11      20      c2      d2      e2
145         101     1000    c6      d6      e6
34          21      40      c3      d3      e3
.....

SQL Fiddle : Link

Was it helpful?

Solution

I've faced similar issue - the thing is sql doesn't "know" that cola and colb is range and cola of next row will always be bigger than current row colb, so between won't help too much: when it will find first cola that matches between, it will continue checking others also. I would suggest query that uses cross apply to find max(table2.cola) <= table1.cola and then to verify add where table2.colb >= table1.cola Something like this

SELECT      t1.cola AS [InputValue]
           ,t2.cola
           ,t2.colb
           ,t2.colc
           ,t2.cold
           ,t2.code
FROM        table1 t1
CROSS APPLY (   SELECT   TOP 1
                         t2.cola
                        ,t2.colb
                        ,t2.colc
                        ,t2.cold
                        ,t2.code
                FROM     table2 t2
                WHERE    t1.cola >= t2.cola
                         AND t1.cola <= t2.colb
                ORDER BY t2.cola DESC ) t2;
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top