Why am I getting nondeterministic results with the deterministic function STDEV()?
-
23-02-2021 - |
Question
This is type of query I'm trying to run:
WITH CTE_Ordered AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY PartitionField ORDER BY DateField) AS PartitionRowId
FROM SourceTable
),
CTE_Top1_PerPartition AS
(
SELECT *
FROM CTE_Ordered
WHERE PartitionRowId = 1
),
CTE_Calculations AS
(
SELECT AVG(NumberField1) AS NumberField1_Avg, StdDev.StdDev AS NumberField1_StdDev
FROM CTE_Top1_PerPartition
CROSS JOIN
(
SELECT STDEV(NumberField1) AS StdDev
FROM CTE_Top1_PerPartition
) AS StdDev
GROUP BY StdDev.StdDev
)
-- Final Select
SELECT *
FROM CTE_Calculations
Every time I run the final select, my NumberField1_StdDev value changes even though SourceTable is isolated and doesn't change.
I noticed if I select CTE_Top1_PerPartition into a temp table first and then run the rest of the query off of that temp table, then I get the same result for NumberField1_StdDev every time.
I would guess it has something to do with the order the results are returned in the CTE_Top1_PerPartition CTE, but I'm not sure why.
Solution
row_number
is not deterministic if there can be ties (i.e. rows with the same PartitionField
and DateField
values). Any of the tied values might end up with a PartitionRowId
of 1 which would presumably change the final result.
You could use rank
instead of row_number
but that would cause you to consider all the tied rows which may not be what you want. You could also update your analytic function to add additional order by
criteria to ensure that row_number
returns a deterministic first row.