Question

HI All

I was hoping someone could help me improve a query I have to run periodically. At the moment it takes more than 40 minutes to execute. It uses the full allocated memory during this time, but CPU usage mostly meanders at 2% - 5%, every now and then jumping to 40% for a few seconds.

I have this table (simplified example):

    CREATE TABLE [dbo].[dataTable]
    (
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [dteEffectiveDate] [date] NULL,
    [dtePrevious] [date] NULL,
    [dteNext] [date] NULL,
    [Age] [int] NULL,
    [Count] [int] NULL
    ) ON [PRIMARY]

    GO

Here are some input values:

INSERT INTO [YourDB].[dbo].[dataTable]
           ([dteEffectiveDate]
           ,[dtePrevious]
           ,[dteNext]
           ,[Age]
           ,[Count])
     VALUES
('2009-01-01',NULL,'2010-01-01',40,300),
('2010-01-01','2009-01-01', NULL,40,200),
('2009-01-01',NULL, '2010-01-01',20,100),
('2010-01-01','2009-01-01', NULL,20,50),
('2009-01-01',NULL,'2010-01-01',30,10)
GO

Each entry has a dteEffectiveDate field. In addition, each has a dtePrevious and dteNext, which reflects the dates of the nearest previous/next effective date. Now what I want is a query that will calculate the mid-value on the Count fields between successive periods, within a specific age.

So for example, in the data above, for age 40 we have 300 at 2009/01/01 and 200 at 2010/01/01 so the query should produce 250.

Note that age 30 has only one entry, 10. This is at 2009/01/01. There is no entry at 2010/01/01, but we know that data was captured at this point, so the fact that there is nothing means that 30 is 0 at this date. Hence the query should produce 5.

In order to achieve this I use a FULL JOIN of the table on itself, and use ISNULL to select values. Here is my code:

SELECT

    ISNULL(T1.dteEffectiveDate,T2.dtePrevious) as [Start Date]
    ,ISNULL(T1.dteNext,T2.dteEffectiveDate)  as [End Date]
    ,ISNULL(T1.Age,T2.Age) as Age 
    ,ISNULL(T1.[Count],0) as [Count Start]
    ,ISNULL(T2.[Count],0)   as [Count End]
    ,(ISNULL(T1.[Count],0)+ISNULL(T2.[Count],0))/2 as [Mid Count]

    FROM
    [ExpDBClient].[dbo].[dataTable] as T1
    FULL JOIN [ExpDBClient].[dbo].[dataTable] as T2

    ON 
    T2.dteEffectiveDate = T1.dteNext
    AND T2.Age = T1.Age

    WHERE ISNULL(T1.dteEffectiveDate,T2.dtePrevious) is not null
    AND ISNULL(T1.dteNext,T2.dteEffectiveDate) is not null

GO

which outputs:

Start Date  End Date    Age Count Start Count End   Mid Lives
2009-01-01  2010-01-01  40  300         200         250
2009-01-01  2010-01-01  20  100         50          75
2009-01-01  2010-01-01  30  10          0           5

It works perfectly, but when I run this on the actual data, which is about 7m records, it takes painfully long to execute.

Does anyone have any suggestions?

Thanks
Karl

Was it helpful?

Solution

It's hard to make a lot of recommendations.

One thing I'd would definitely recommend is indices on those columns that you use as foreign keys in your JOIN conditions, e.g.

  • Age
  • dteEffectiveDate
  • dteNext

Create a NONCLUSTERED index on each of those columns separately and measure again. With just a few data rows, there's no improvement measurable - but with millions of rows, it might make a difference.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top