Evaluate different EXPLAINs on Redshift
Question
I'm trying to understand EXPLAIN on Redshift. I'm in a case where I have data like this:
id | user_id | created_at
---+---------+------------------------
 1 | 1       | 2017-02-08 14:32:10.96
 2 | 1       | 2017-02-07 14:32:10.96
 3 | 2       | 2017-02-06 14:32:10.96
 4 | 2       | 2017-02-05 14:32:10.96
I want:
id | user_id | created_at
---+---------+------------------------
 1 | 1       | 2017-02-08 14:32:10.96
 3 | 2       | 2017-02-06 14:32:10.96
I have these two queries:
SELECT id,
       user_id,
       created_at
FROM
  ( SELECT id,
           user_id,
           created_at,
           row_number() OVER (PARTITION BY user_id
                              ORDER BY created_at DESC) AS rownum
   FROM my_table) x
WHERE rownum = 1;
With the EXPLAIN I have:
XN Subquery Scan x (cost=1000001263779.68..1000001513986.60 rows=50042 width=16)
Filter: (rownum = 1)
-> XN Window (cost=1000001263779.68..1000001388883.14 rows=10008277 width=16)
Partition: user_id
Order: created_at
-> XN Sort (cost=1000001263779.68..1000001288800.37 rows=10008277 width=16)
Sort Key: user_id, created_at
-> XN Seq Scan on my_table (cost=0.00..100082.77 rows=10008277 width=16)
Then the other query:
SELECT ac1.user_id, ac1.created_at FROM my_table ac1
JOIN
(
SELECT user_id, MAX(created_at) AS MAXDATE
FROM my_table
GROUP BY user_id
) ac2
ON ac1.user_id = ac2.user_id
AND ac1.created_at = ac2.MAXDATE;
And the EXPLAIN:
XN Hash Join DS_DIST_NONE (cost=150798.74..771939079.62 rows=7257 width=16)
Hash Cond: (("outer".created_at = "inner".maxdate) AND ("outer".user_id = "inner".user_id))
-> XN Seq Scan on my_table ac1 (cost=0.00..100082.77 rows=10008277 width=16)
-> XN Hash (cost=150606.01..150606.01 rows=38548 width=16)
-> XN Subquery Scan ac2 (cost=150124.15..150606.01 rows=38548 width=16)
-> XN HashAggregate (cost=150124.15..150220.52 rows=38548 width=16)
-> XN Seq Scan on my_table (cost=0.00..100082.77 rows=10008277 width=16)
Results are a little slower with the first query, but I'm lost when I try to understand the EXPLAIN output. The cost seems higher for the query that uses ROW_NUMBER(), while the row estimates are similar. What can I extract from these EXPLAIN plans (I sadly cannot use ANALYZE on Redshift)?
Solution
The costly step in the first query plan, and the one that explains the difference, is the sort over the large number of rows. You are sorting the entire dataset (an O(n log n) operation, where n is your partition size) just so you can then select the first entry per partition. The remaining rows (#2 through #10,000,000) still had to be sorted even though you never look at them. MAX, on the other hand, is an O(n) operation, because you only ever have to keep track of one value per group as you pass through the data.
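The contrast between the two approaches can be sketched outside SQL. This is an illustrative Python sketch of the two strategies, not a model of Redshift internals:

```python
# Illustrative sketch: find the latest created_at per user_id two ways.

def latest_by_sort(rows):
    """ROW_NUMBER()-style: sort every row in each partition, keep rank 1.
    The sort is O(n log n) per partition, and rows #2..#n get sorted
    even though they are never returned."""
    by_user = {}
    for user_id, ts in rows:
        by_user.setdefault(user_id, []).append(ts)
    return {u: sorted(ts_list, reverse=True)[0] for u, ts_list in by_user.items()}

def latest_by_max(rows):
    """MAX()-style: a single O(n) pass, tracking one value per group."""
    best = {}
    for user_id, ts in rows:
        if user_id not in best or ts > best[user_id]:
            best[user_id] = ts
    return best

rows = [(1, "2017-02-08"), (1, "2017-02-07"),
        (2, "2017-02-06"), (2, "2017-02-05")]
assert latest_by_sort(rows) == latest_by_max(rows) == {1: "2017-02-08", 2: "2017-02-06"}
```

Both return the same answer; the difference is the work thrown away by the sorting version, which is what the higher cost of the Sort step in the first plan reflects.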
OTHER TIPS
Fabio has given a great answer.
However for Redshift, it is worth adding that how your data is physically laid out has a dramatic impact on EXPLAIN plan cost.
Create some dummy data:
(inspired by https://stackoverflow.com/questions/38667215/redshift-how-can-i-generate-a-series-of-numbers-without-creating-a-table-called)
drop table if exists #my_table;
create table #my_table as
select
(row_number() over (order by 1)) - 1 as user_id
,(current_date - user_id::int)::timestamp created_at
from
stl_load_errors
limit 24;
Reproduce similar EXPLAIN plans to the question:
explain
SELECT user_id,
created_at
FROM
( SELECT user_id,
created_at,
row_number() OVER (PARTITION BY user_id
ORDER BY created_at) AS rownum
FROM #my_table) x
WHERE rownum = 1
EXPLAIN:
XN Subquery Scan x (cost=1000000000000.79..1000000000001.39 rows=1 width=16)
Filter: (rownum = 1)
-> XN Window (cost=1000000000000.79..1000000000001.09 rows=24 width=16)
Partition: user_id
Order: created_at
-> XN Sort (cost=1000000000000.79..1000000000000.85 rows=24 width=16)
Sort Key: user_id, created_at
-> XN Network (cost=0.00..0.24 rows=24 width=16)
Distribute
-> XN Seq Scan on "#my_table" (cost=0.00..0.24 rows=24 width=16)
Next:
explain
SELECT ac1.user_id, ac1.created_at FROM #my_table ac1
JOIN
(
SELECT user_id, MAX(created_at) AS MAXDATE
FROM #my_table
GROUP BY user_id
) ac2
ON ac1.user_id = ac2.user_id
AND ac1.created_at = ac2.MAXDATE
EXPLAIN:
XN Hash Join DS_DIST_INNER (cost=0.72..822858.77 rows=13 width=16)
Inner Dist Key: ac1.user_id
Hash Cond: (("outer".maxdate = "inner".created_at) AND ("outer".user_id = "inner".user_id))
-> XN Subquery Scan ac2 (cost=0.36..0.66 rows=24 width=16)
-> XN HashAggregate (cost=0.36..0.42 rows=24 width=16)
-> XN Seq Scan on "#my_table" (cost=0.00..0.24 rows=24 width=16)
-> XN Hash (cost=0.24..0.24 rows=24 width=16)
-> XN Seq Scan on "#my_table" ac1 (cost=0.00..0.24 rows=24 width=16)
This is indeed much faster, but data is distributed between nodes (DS_DIST_INNER).
Now try:
drop table if exists #my_table_dist;
create table #my_table_dist
distkey(user_id) sortkey(user_id,created_at) as
select
(row_number() over (order by 1)) - 1 as user_id
,(current_date - user_id::int)::timestamp created_at
from
stl_load_errors
limit 24;
Now run EXPLAIN:
explain
SELECT user_id,
created_at
FROM
( SELECT user_id,
created_at,
row_number() OVER (PARTITION BY user_id
ORDER BY created_at) AS rownum
FROM #my_table_dist) x
WHERE rownum = 1
EXPLAIN:
XN Subquery Scan x (cost=0.00..0.78 rows=1 width=16)
Filter: (rownum = 1)
-> XN Window (cost=0.00..0.48 rows=24 width=16)
Partition: user_id
Order: created_at
-> XN Seq Scan on "#my_table_dist" (cost=0.00..0.24 rows=24 width=16)
The data is already sorted and distributed so Redshift simply reads off the answer.
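To confirm how a table is physically laid out, you can query the SVV_TABLE_INFO system view. A hedged sketch (temporary `#` tables are not always listed there, so you may need to check a permanent copy of the table):

```sql
-- Inspect distribution style and first sort key column.
-- The "table" filter pattern here is an assumption; adjust to your table name.
select "table", diststyle, sortkey1
from svv_table_info
where "table" like '%my_table_dist%';
```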
Whilst:
explain
SELECT ac1.user_id, ac1.created_at FROM #my_table ac1
JOIN
(
SELECT user_id, MAX(created_at) AS MAXDATE
FROM #my_table_dist
GROUP BY user_id
) ac2
ON ac1.user_id = ac2.user_id
AND ac1.created_at = ac2.MAXDATE
EXPLAIN:
XN Hash Join DS_DIST_INNER (cost=0.36..822858.77 rows=13 width=16)
Inner Dist Key: ac1.user_id
Hash Cond: (("outer".maxdate = "inner".created_at) AND ("outer".user_id = "inner".user_id))
-> XN Subquery Scan ac2 (cost=0.00..0.66 rows=24 width=16)
-> XN GroupAggregate (cost=0.00..0.42 rows=24 width=16)
-> XN Seq Scan on "#my_table_dist" (cost=0.00..0.24 rows=24 width=16)
-> XN Hash (cost=0.24..0.24 rows=24 width=16)
-> XN Seq Scan on "#my_table" ac1 (cost=0.00..0.24 rows=24 width=16)
Note there is no difference in cost, because data still has to be distributed between nodes (DS_DIST_INNER): ac1 reads from #my_table, which does not share the join's distribution key.
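As a further experiment (a sketch, not verified against a cluster): if both sides of the join read from #my_table_dist, the join columns match the distribution key on both sides, so the planner should be able to choose a collocated join (DS_DIST_NONE, as in the question's first hash join) and avoid redistribution entirely:

```sql
explain
SELECT ac1.user_id, ac1.created_at FROM #my_table_dist ac1
JOIN
(
  SELECT user_id, MAX(created_at) AS MAXDATE
  FROM #my_table_dist
  GROUP BY user_id
) ac2
ON ac1.user_id = ac2.user_id
AND ac1.created_at = ac2.MAXDATE;
```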