Understanding aggregate window functions
-
21-02-2021 - |
سؤال
Consider the following table:
CREATE TABLE T1
(
keycol INT NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
col1 VARCHAR(10) NOT NULL
);
INSERT INTO T1 VALUES
(2, 'A'),(3, 'A'),
(5, 'B'),(7, 'B'),(11, 'B'),
(13, 'C'),(17, 'C'),(19, 'C'),(23, 'C');
Currently, I am looking into window functions and am trying out aggregate window functions. Although I feel I understand how the windows are defined with the OVER
and PARITION
clauses, I am unsure how the aggregate window functions are being calculated, such as AVG() OVER ()
.
I am looking to understand the following three queries.
-- Query 1
SELECT keycol, col1, AVG(keycol) OVER (PARTITION BY col1) AS RowAvg
FROM T1
keycol | col1 | RowAvg -----: | :--- | -----: 2 | A | 2 3 | A | 2 5 | B | 7 7 | B | 7 11 | B | 7 13 | C | 18 17 | C | 18 19 | C | 18 23 | C | 18
-- Query 2
SELECT keycol, col1, AVG(keycol) OVER (ORDER BY keycol) AS RowAvg
FROM T1
keycol | col1 | RowAvg -----: | :--- | -----: 2 | A | 2 3 | A | 2 5 | B | 3 7 | B | 4 11 | B | 5 13 | C | 6 17 | C | 8 19 | C | 9 23 | C | 11
-- Query 3
SELECT keycol, col1, AVG(keycol) OVER (PARTITION BY col1 ORDER BY keycol) AS RowAvg
FROM T1
keycol | col1 | RowAvg -----: | :--- | -----: 2 | A | 2 3 | A | 2 5 | B | 5 7 | B | 6 11 | B | 7 13 | C | 13 17 | C | 15 19 | C | 16 23 | C | 18
Query 1: I believe RowAvg should be the average of the rows for each col1 level. Are the numbers 2 and 7 the FLOOR of the average or is my understanding incorrect?
Query 2: I am not too sure what is being done to produce RowAvg here. As no PARTITION or framing is used here, I believe the window should be the entire table, is this correct? Also, how is the RowAvg being found?
Query 3: Is this finding the (FLOOR) average for each partition however doing this incrementally? That is, for row 1 of the first partition ('A'), we find the average of that row. Then, for row 2 of the first partition, we find the average of the first 2 rows.
General question: Does introducing ORDER BY
into the aggregate window function perform the aggregate function 'consecutively' such as in queries 1 and 2? It is interesting to see that in query 1, AVG
is performed to each partition as a whole, whereas in queries 1 and 2 the RowAvg's are almost different for each row.
المحلول
I suggest you add a sum to understand what is going on:
SELECT col1
, keycol
, SUM(keycol) OVER (PARTITION BY col1) AS mysum
, AVG(keycol) OVER (PARTITION BY col1) AS RowAvg
FROM T1
order by col1;
Since you don't have an order by in your window, the aggregate applies to the whole partition:
col1 keycol mysum RowAvg
A 2 5 2
A 3 5 2
B 5 23 7
B 7 23 7
B 11 23 7
C 13 72 18
C 17 72 18
C 19 72 18
C 23 72 18
I.e. for partition A, mysum = 2+3 for every row in the partition
If you use an ORDER by clause the aggregate is applied from the beginning to the current row:
SELECT keycol
, col1
, SUM(keycol) OVER (ORDER BY keycol) AS mysum
, AVG(keycol) OVER (ORDER BY keycol) AS RowAvg
FROM T1
order by col1;
since you don't have a partition, the whole result set is treated as 1 partition:
keycol col1 mysum RowAvg
2 A 2 2
3 A 5 2
5 B 10 3
7 B 17 4
11 B 28 5
...
For first row (according to order by) mysum = 2, rowavg = 2/1 , second row mysum = 2+3, rowavg = 5/2 , third row mysum = 2+3+5 rowavg = 10/3 ...
As you can see the sum(...) becomes a cumulative sum
With both a partition and order, the aggregate applies to each partition, but with the behaviour described above:
SELECT keycol
, col1
, SUM(keycol) OVER (PARTITION BY col1 ORDER BY keycol) AS mysum
, AVG(keycol) OVER (PARTITION BY col1 ORDER BY keycol) AS RowAvg
FROM T1
order by col1;
keycol col1 mysum RowAvg
2 A 2 2
3 A 5 2
5 B 5 5
7 B 12 6
...
For A you get mysum 2, 2+3. For B it restarts so it becomes 5, 5+7,
In addition you can override the default behavior which is:
SELECT keycol
, col1
, SUM(keycol) OVER (PARTITION BY col1
ORDER BY keycol
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ...
FROM T1
order by col1;
by declaring your own, for example if you want a sliding average over 3 rows:
SELECT keycol
, col1
, AVG(keycol) OVER (PARTITION BY col1
ORDER BY keycol
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ...
...
13 C 15
17 C 16
19 C 19
23 C 21
The first row gets an average of (13+17)/2 (since there is no preceding row), the second (13+17+19)/3, ... and the fourth row becomes (19+23)/2 (no rows following)