Question

How do I improve performance of the following query:

update t 
set t.recent_5_min = (select MIN(value) 
                      from t t2 
                      where t2.date between t.date - 5 and t.date - 1)

t has:

  • recent_5_min - money null - nullable, of course, since it only gets populated by a job.
  • value - money non-null
  • date - int, PK with clustered index on it. This is the only index on the table.

t has 900K records, stats are up to date, the query takes forever to run.

Update 1 - Sample data generated by the query I posted initially.

Before:

date        value                 recent_5_min
----------- --------------------- ---------------------
1           10.00                 NULL
2           19.00                 NULL
3           2.00                  NULL
4           9.00                  NULL
5           11.00                 NULL

After:

date        value                 recent_5_min
----------- --------------------- ---------------------
1           10.00                 NULL
2           19.00                 10.00
3           2.00                  10.00
4           9.00                  2.00
5           11.00                 2.00
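
To reproduce the sample above, a minimal setup could look like the sketch below. The DDL is an assumption reconstructed from the column list in the question (money columns, int clustered PK on date); the real table may differ.

```sql
-- Assumed schema, based on the column description in the question.
create table t (
    [date]       int   not null primary key clustered,
    value        money not null,
    recent_5_min money null
);

-- The five sample rows from the "Before" listing.
insert into t ([date], value)
values (1, 10.00), (2, 19.00), (3, 2.00), (4, 9.00), (5, 11.00);
```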

Solution

Give this a try

    update t 
    set t.recent_5_min = tmin.minvalue 
    from t 
    join (
            select t1.date, min(t2.value) as minvalue
            from t t1 
            join t t2 
              on t2.date between t1.date - 5 and t1.date - 1 
            group by t1.date
         ) tmin 
      on t.date = tmin.date
   where t.recent_5_min is null or t.recent_5_min <> tmin.minvalue
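
One way to sanity-check a rewrite like this is to compare the stored values with the original correlated subquery on a slice of the table. This is only a sketch: the aliases x and expected are made up for illustration, and the date <= 1000 cut-off is an arbitrary sample size. It should return no rows if the update produced the same result as the original query.

```sql
-- Rows whose stored recent_5_min differs from what the original subquery yields.
select t.date, t.recent_5_min, x.expected
from t
cross apply (
        select min(t2.value) as expected
        from t t2
        where t2.date between t.date - 5 and t.date - 1
     ) x
where t.date <= 1000   -- arbitrary sample; widen or remove as needed
  and (   (t.recent_5_min is null and x.expected is not null)
       or (t.recent_5_min is not null and x.expected is null)
       or t.recent_5_min <> x.expected);
```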

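If date is the PK, something like the following might work.
It is NOT tested, and there is a good chance it won't work: SQL Server generally rejects an aggregate in the SET list of an UPDATE, in which case stick with the derived-table version above.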

update t1
set t1.recent_5_min = min(t2.value) 
from t t1 
join t t2 
  on t2.date between t1.date - 5 and t1.date - 1 
where t1.recent_5_min is null or t1.recent_5_min <> min(t2.value)
group by t1.date

OTHER TIPS

It seems that the subquery is executed once per row. At the same time, the query does not look that heavy for 900K records.


Added:

After some experiments I found the following. Interestingly, the query plans for

update top (100) t
set t.recent_5_min = (select MIN(value) 
                      from t t2 
                      where t2.date between t.date - 5 and t.date - 1)
from t t

and

update top (500) t
set t.recent_5_min = (select MIN(value) 
                      from t t2 
                      where t2.date between t.date - 5 and t.date - 1)
from t t

are noticeably different. In the second case (and, it seems, in the original query as well) a Sort operator appears in the query plan, sorting on value and consuming enormous resources.
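
If you want to look at the two plans yourself without running the updates, something along these lines should do it. It is a sketch that assumes a client such as SSMS or sqlcmd, where go separates batches; with SET SHOWPLAN_XML ON the statements are compiled but not executed, so these are estimated plans.

```sql
set showplan_xml on;
go
-- Plan for the small batch.
update top (100) t
set t.recent_5_min = (select MIN(value)
                      from t t2
                      where t2.date between t.date - 5 and t.date - 1)
from t t;
go
-- Plan for the larger batch; look for the Sort operator here.
update top (500) t
set t.recent_5_min = (select MIN(value)
                      from t t2
                      where t2.date between t.date - 5 and t.date - 1)
from t t;
go
set showplan_xml off;
go
```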

I tried the following manual pivot/unpivot/aggregate technique, which transforms the query so that a Constant Scan operator is used instead of the Sort, which is much better in this case:

;with cte as (
    select t.date, t.recent_5_min, m.minVal
    from t
        left join t t1 on t1.date = t.date - 1
        left join t t2 on t2.date = t.date - 2
        left join t t3 on t3.date = t.date - 3
        left join t t4 on t4.date = t.date - 4
        left join t t5 on t5.date = t.date - 5
        cross apply (select min(val) from (values (t1.value), (t2.value), (t3.value), (t4.value), (t5.value)) f(val)) m(minVal)
)
update cte set recent_5_min = minVal

For me it finished in just a few seconds on 900K generated rows.
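
For anyone who wants to repeat the timing, test data of that size can be generated roughly as follows. This is not necessarily how the rows above were produced; it is a sketch that assumes an empty table t with the columns from the question, consecutive dates starting at 1, and random values between 0.00 and 99.99.

```sql
-- Number generator: cross-joining a system view is a common trick to get many rows.
;with n as (
    select top (900000)
           row_number() over (order by (select null)) as rn
    from sys.all_objects a
        cross join sys.all_objects b
)
insert into t ([date], value)
select rn,
       -- modulo first, then abs, so the value always lands in 0..9999
       cast(abs(checksum(newid()) % 10000) as money) / 100
from n;
```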

The following also works, but takes longer and does more reads:

declare @t int
select @t = 100
update top (@t) percent t 
set t.recent_5_min = (select MIN(value) 
                      from t t2 
                      where t2.date between t.date - 5 and t.date - 1)
from t t

With the window widened to t2.date between t.date - 240 and t.date - 1, it took about a minute for me.
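
Spelled out, the wider-window run looks something like the sketch below. The SET STATISTICS switches are my addition so that reads and elapsed time show up in the messages output; note that this overwrites recent_5_min with the 240-row minimum.

```sql
set statistics io on;    -- report logical/physical reads per table
set statistics time on;  -- report CPU and elapsed time

declare @t int;
select @t = 100;

-- Same "top (@t) percent" form as above, with a 240-wide window.
update top (@t) percent t
set t.recent_5_min = (select MIN(value)
                      from t t2
                      where t2.date between t.date - 240 and t.date - 1)
from t t;

set statistics io off;
set statistics time off;
```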

Licensed under: CC-BY-SA with attribution