Question

This is type of query I'm trying to run:

WITH CTE_Ordered AS
(
  SELECT *, ROW_NUMBER() OVER (PARTITION BY PartitionField ORDER BY DateField) AS PartitionRowId
  FROM SourceTable
),
CTE_Top1_PerPartition AS
(
  SELECT *
  FROM CTE_Ordered
  WHERE PartitionRowId = 1
),
CTE_Calculations AS
(
  SELECT AVG(NumberField1) AS NumberField1_Avg, StdDev.StdDev AS NumberField1_StdDev
  FROM CTE_Top1_PerPartition 
  CROSS JOIN
  (
     SELECT STDEV(NumberField1) AS StdDev
     FROM CTE_Top1_PerPartition 
  ) AS StdDev
  GROUP BY StdDev.StdDev
)

-- Final Select
SELECT *
FROM CTE_Calculations

Every time I run the final select, my NumberField1_StdDev value changes even though SourceTable is isolated and doesn't change.

I noticed if I select CTE_Top1_PerPartition into a temp table first and then run the rest of the query off of that temp table, then I get the same result for NumberField1_StdDev every time.

I would guess it has something to do with the order the results are returned in the CTE_Top1_PerPartition CTE, but I'm not sure why.

Was it helpful?

Solution

row_number is not deterministic if there can be ties (i.e. rows with the same PartitionField and DateField values). Any of the tied values might end up with a PartitionRowId of 1 which would presumably change the final result.

You could use rank instead of row_number but that would cause you to consider all the tied rows which may not be what you want. You could also update your analytic function to add additional order by criteria to ensure that row_number returns a deterministic first row.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top