T-SQL Daylight Saving lookup table - poorly performing table-valued function
Question
I've created a "Daylight Savings" lookup calendar table for the GMT region. The function I'm using to query the table to return the local datetime from a UTC datetime is performing poorly.
Any help to improve this, including changing the way the TVF is coded, would be appreciated.
The function will be used in queries that can return 1m+ rows on a frequent basis. The function is used when querying warehouse tables containing trip data.
The start and end datetimes of the trips are stored in UTC and the function above is used to convert them into local time. A developer, long since gone from the company, wrote a scalar function that converts UTC time to local time. I was tasked to rewrite that function using a calendar table and a TVF as TVF are supposed to better performing than scalar functions
Without the function:
SQL Server Execution Times: CPU time = 4633 ms, elapsed time = 4909 ms.
execution plan without function
With the function:
SQL Server Execution Times: CPU time = 20795 ms, elapsed time = 21176 ms.
Here's a sample output from the table
CREATE TABLE dbo.DSTLookup
(
[Id] int,
[Tzid] int,
[DT_WhenSwitch] datetime,
[DSTOffSetSeconds] int,
[GMTOffSetSeconds] int
)
INSERT INTO dbo.DSTLookup
VALUES (29, 2, N'2014-03-30T01:00:00', 3600, 0),
(30, 2, N'2014-10-26T02:00:00', 0, 0),
(31, 2, N'2015-03-29T01:00:00', 3600, 0),
(32, 2, N'2015-10-25T02:00:00', 0, 0),
(33, 2, N'2016-03-27T01:00:00', 3600, 0),
(34, 2, N'2016-10-30T02:00:00', 0, 0),
(35, 2, N'2017-03-26T01:00:00', 3600, 0),
(36, 2, N'2017-10-29T02:00:00', 0, 0),
(37, 2, N'2018-03-25T01:00:00', 3600, 0),
(38, 2, N'2018-10-28T02:00:00', 0, 0)
This is the TVF:
CREATE FUNCTION dbo.FN_GetLocalTime_FromUTC_BasedOnTZId
(@StartDateTime DATETIME, @EndDateTime DATETIME, @Tzid INT)
/*=========================================================================
* 2017-03-27
* Returns local time from UTC time based on timeZoneId
*
==========================================================================*/
RETURNS TABLE
AS
RETURN
(
WITH cteStartDate AS
(
SELECT
RN = ROW_NUMBER() OVER (ORDER BY D.Id DESC),
D.DSTOffSetSeconds 's_DST_OffSet',
D.GMTOffSetSeconds 's_GMT_OffSet'
FROM
dbo.DSTLookup D
WHERE
D.DT_WhenSwitch <= @StartDateTime
AND D.Tzid = @Tzid
),
cteEndDate AS
(
SELECT
RN = ROW_NUMBER() OVER (ORDER BY D.Id DESC),
D.DSTOffSetSeconds 'e_DST_OffSet',
D.GMTOffSetSeconds 'e_GMT_OffSet'
FROM
dbo.DSTLookup D
WHERE
D.DT_WhenSwitch <= @EndDateTime
AND D.Tzid = @Tzid
),
cteConvertStartDate AS
(
SELECT
DATEADD(SECOND, (COALESCE(S.s_DST_OffSet, 0) + COALESCE(S.s_GMT_OffSet, 0)), @StartDateTime) 'LocalStartDateTime'
FROM
cteStartDate S
WHERE
S.RN = 1
),
cteConvertEndDate AS
(
SELECT
DATEADD(SECOND, (COALESCE(E.e_DST_OffSet, 0) + COALESCE(E.e_GMT_OffSet, 0)), @EndDateTime) 'LocalEndDateTime'
FROM
cteEndDate E
WHERE
E.RN = 1
)
SELECT
S.LocalStartDateTime, E.LocalEndDateTime
FROM
cteConvertStartDate S, cteConvertEndDate E
);
GO
To query the TVF:
SELECT *
FROM dbo.FN_GetLocalTime_FromUTC_BasedOnTzId
('2017-03-27 10:00:30', '2017-03-27 10:15:54', 2);
Execution plan following Max's recommendations to include the primary key.
Solution
Make your function a schema-bound table-valued-function by adding WITH SCHEMABINDING
to the RETURNS TABLE
clause.
So:
CREATE FUNCTION dbo.FN_GetLocalTime_FromUTC_BasedOnTZId
(@StartDateTime DATETIME, @EndDateTime DATETIME, @Tzid INT)
/*=========================================================================
* 2017-03-27
* Returns local time from UTC time based on timeZoneId
*
==========================================================================*/
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
WITH cteStartDate AS
(
SELECT
RN = ROW_NUMBER() OVER (ORDER BY D.Id DESC),
D.DSTOffSetSeconds 's_DST_OffSet',
D.GMTOffSetSeconds 's_GMT_OffSet'
FROM
dbo.DSTLookup D
WHERE
D.DT_WhenSwitch <= @StartDateTime
AND D.Tzid = @Tzid
),
cteEndDate AS
(
SELECT
RN = ROW_NUMBER() OVER (ORDER BY D.Id DESC),
D.DSTOffSetSeconds 'e_DST_OffSet',
D.GMTOffSetSeconds 'e_GMT_OffSet'
FROM
dbo.DSTLookup D
WHERE
D.DT_WhenSwitch <= @EndDateTime
AND D.Tzid = @Tzid
),
cteConvertStartDate AS
(
SELECT
DATEADD(SECOND, (COALESCE(S.s_DST_OffSet, 0) + COALESCE(S.s_GMT_OffSet, 0)), @StartDateTime) 'LocalStartDateTime'
, S.RN
FROM
cteStartDate S
WHERE
S.RN = 1
),
cteConvertEndDate AS
(
SELECT
DATEADD(SECOND, (COALESCE(E.e_DST_OffSet, 0) + COALESCE(E.e_GMT_OffSet, 0)), @EndDateTime) 'LocalEndDateTime'
, E.RN
FROM
cteEndDate E
WHERE
E.RN = 1
)
SELECT
S.LocalStartDateTime, E.LocalEndDateTime
FROM
cteConvertStartDate S
INNER JOIN cteConvertEndDate E ON S.RN = E.RN
);
This allows the query processor to "in-line" the function. This allows several optimizations, not least of which is the ability to properly understand statistics for objects referenced in the function.
Add a clustered index to the dbo.DSTLookup
table. This allows the query to perform a lookup instead of a scan. For the number of rows in your sample data, this won't likely make a large difference, but for you real table, it may make a very big difference.
Since you have an Id
column that seems to be a monotonically increasing integer, perhaps that is a good candidate key to be used as a clustered primary key:
CREATE TABLE dbo.DSTLookup
(
[Id] int
CONSTRAINT PK_DSTLookup
PRIMARY KEY CLUSTERED,
[Tzid] int,
[DT_WhenSwitch] datetime,
[DSTOffSetSeconds] int,
[GMTOffSetSeconds] int
);
I'd consider adding the following index based on your TVF:
CREATE INDEX IX_DSTLookup_001
ON dbo.DSTLookup (DT_WhenSwitch, Tzid)
INCLUDE (DSTOffSetSeconds, GMTOffSetSeconds);
OTHER TIPS
If Tzid
and DT_WhenSwitch
define a unique row I recommend clustering the dbo.DSTLookup
table by those two columns. You can make those columns the primary key if you want or you can just make them be the clustered index.
CREATE TABLE dbo.DSTLookup
(
[Id] int,
[Tzid] int,
[DT_WhenSwitch] datetime,
[DSTOffSetSeconds] int,
[GMTOffSetSeconds] int
);
CREATE CLUSTERED INDEX CI_DSTLookup ON dbo.DSTLookup ([Tzid], [DT_WhenSwitch]); -- new
INSERT INTO dbo.DSTLookup
VALUES (29, 2, N'2014-03-30T01:00:00', 3600, 0),
(30, 2, N'2014-10-26T02:00:00', 0, 0),
(31, 2, N'2015-03-29T01:00:00', 3600, 0),
(32, 2, N'2015-10-25T02:00:00', 0, 0),
(33, 2, N'2016-03-27T01:00:00', 3600, 0),
(34, 2, N'2016-10-30T02:00:00', 0, 0),
(35, 2, N'2017-03-26T01:00:00', 3600, 0),
(36, 2, N'2017-10-29T02:00:00', 0, 0),
(37, 2, N'2018-03-25T01:00:00', 3600, 0),
(38, 2, N'2018-10-28T02:00:00', 0, 0);
The reason for doing that is it will allow very fast individual row lookups. For both queries against the table you want to filter on [Tzid]
and to find the first [DT_WhenSwitch]
value in descending order. With the right clustered index getting that row can be a single clustered index seek.
To get that the plan I want I'm going to simplify the TVF a bit with the APPLY
and TOP
operators. I also want to make it very obvious to the optimizer that I get back just one row every time. Here's one implementation:
CREATE FUNCTION dbo.FN_GetLocalTime_FromUTC_BasedOnTZId
(@StartDateTime DATETIME, @EndDateTime DATETIME, @Tzid INT)
/*=========================================================================
* 2017-03-27
* Returns local time from UTC time based on timeZoneId
*
==========================================================================*/
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT
DATEADD(SECOND, (COALESCE(S.s_DST_OffSet, 0) + COALESCE(S.s_GMT_OffSet, 0)), @StartDateTime) 'LocalStartDateTime'
, DATEADD(SECOND, (COALESCE(E.e_DST_OffSet, 0) + COALESCE(E.e_GMT_OffSet, 0)), @EndDateTime) 'LocalEndDateTime'
FROM (SELECT 1 t) t
OUTER APPLY (
SELECT TOP 1
D.DSTOffSetSeconds 's_DST_OffSet',
D.GMTOffSetSeconds 's_GMT_OffSet'
FROM dbo.DSTLookup D
WHERE D.DT_WhenSwitch <= @StartDateTime AND D.Tzid = @Tzid
ORDER BY D.DT_WhenSwitch DESC
) s
OUTER APPLY (
SELECT TOP 1
D.DSTOffSetSeconds 'e_DST_OffSet',
D.GMTOffSetSeconds 'e_GMT_OffSet'
FROM dbo.DSTLookup D
WHERE D.DT_WhenSwitch <= @EndDateTime AND D.Tzid = @Tzid
ORDER BY D.DT_WhenSwitch DESC
) e
);
Here is the query plan for your example query in the question:
As expected we only do two seeks against the clustered index:
(1 row(s) affected)
Table 'DSTLookup'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
I wasn't able to test against SQL Server 2008, but I think that syntax will work on that platform. db fiddle for SQL Server 2014.