Question

The following queries are taking 70 minutes and 1 minute respectively on a standard machine for 1 million records. What could be the possible reasons?

Query [01:10:00]

SELECT * 
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
    CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')        
        THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')        
        ELSE sys.fn_cdc_increment_lsn(0x00) END
    , sys.fn_cdc_get_max_lsn()
    , 'all with mask') 
WHERE __$operation <> 1

Modified Query [00:01:10]

DECLARE @MinLSN binary(10)
DECLARE @MaxLSN binary(10)
SELECT @MaxLSN= sys.fn_cdc_get_max_lsn()
SELECT @MinLSN=CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')     
        THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')        
        ELSE sys.fn_cdc_increment_lsn(0x00) END

SELECT * 
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
        @MinLSN, @MaxLSN, 'all with mask') WHERE __$operation <> 1

[Modified]

I tried to recreate the scenario with a similar function to see if the parameters are evaluated for each row.

-- @a is float: a bare decimal defaults to decimal(18,0) and would round RAND() to 0 or 1
CREATE FUNCTION Fn_Test(@a float) RETURNS TABLE
AS
RETURN
(
    SELECT @a AS Parameter, GETDATE() AS Dt, PartitionTest.*
    FROM PartitionTest
);
GO

SELECT * FROM Fn_Test(RAND(DATEPART(s, GETDATE())))

But I am getting the same value in the column 'Parameter' for all of the one million records, which are processed in 38 seconds.


Solution

Even deterministic scalar functions are evaluated at least once per row. Only when the same deterministic scalar function occurs multiple times on the same "row" with the same parameters do I believe it will be evaluated just once, e.g. in CASE WHEN fn_X(a, b, c) > 0 THEN fn_X(a, b, c) ELSE 0 END or something like that.

I think your RAND test returns the same value because you keep reseeding. From the RAND documentation:

Repetitive calls of RAND() with the same seed value return the same results.

For one connection, if RAND() is called with a specified seed value, all subsequent calls of RAND() produce results based on the seeded RAND() call. For example, the following query will always return the same sequence of numbers.
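
The query that the quoted documentation refers to is, as far as I recall, the first statement below. The second statement is an added sketch and not part of the original answer: it assumes that RAND() without a seed is evaluated once per query rather than once per row, which would also explain why the 'Parameter' column shows a single value. sys.objects is used only as a convenient row source.

```sql
-- A seeded call followed by two unseeded calls:
-- the same three values come back on every run.
SELECT RAND(100), RAND(), RAND();

-- Illustration only: RAND() yields one value for the whole query,
-- while CHECKSUM(NEWID()) yields a different value on each row.
SELECT TOP (5)
       RAND()            AS SameForEveryRow,
       CHECKSUM(NEWID()) AS DiffersPerRow
FROM sys.objects;
```

If the goal is to test per-row evaluation, a per-row expression such as NEWID() is probably a better probe than a seeded RAND().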

I have taken to caching scalar function results as you have indicated, even going so far as to precalculate tables of scalar function results and join to them. Something has to be done eventually to make scalar functions perform. Right now, the best option is the CLR; CLR functions apparently far outperform SQL UDFs. Unfortunately, I cannot use them in my current environment.
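
A minimal sketch of the precalculated-table approach described above. All object names are hypothetical: dbo.fn_X stands for some deterministic scalar UDF, dbo.Orders for a large table, and a for the column passed to the function.

```sql
-- Hypothetical objects: dbo.fn_X (deterministic scalar UDF), dbo.Orders(a, ...).

-- 1. Call the function once per distinct input value instead of once per row.
SELECT d.a,
       dbo.fn_X(d.a) AS fn_result
INTO   #fn_X_cache
FROM  (SELECT DISTINCT a FROM dbo.Orders) AS d;

-- 2. Join to the cached results; the UDF is no longer invoked here.
--    LEFT JOIN keeps rows whose a is NULL (their fn_result is NULL).
SELECT o.*,
       c.fn_result
FROM   dbo.Orders AS o
LEFT JOIN #fn_X_cache AS c
       ON c.a = o.a;
```

This only pays off when the number of distinct inputs is much smaller than the number of rows. For the CDC query in the question, assigning the LSN values to variables first plays the same role.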

OTHER TIPS

In your first query, fn_cdc_increment_lsn and fn_cdc_get_min_lsn get executed for every row. In the second query, they are executed just once.

Licensed under: CC-BY-SA with attribution