Question

Redshift allows designating multiple columns as SORTKEY columns, but most of the best-practices documentation is written as if there were only a single SORTKEY.

If I create a table with SORTKEY (COL1, COL2), does that mean that all columns are stored sorted by COL1, then COL2? Or maybe, since it is a columnar store, each column gets stored in a different order? I.e. COL1 in COL1 order, COL2 in COL2 order, and the other columns unordered?

My situation is that I have a table with (among others) a type_id and a timestamp column. Data arrives roughly in timestamp order. Most queries are joined against / restricted by both type_id and timestamp. Usually the type_id clauses are more specific, meaning a much larger percentage of rows can be excluded by looking at the type_id clause than by looking at the timestamp clause. type_id is the DISTKEY for this reason. I'm trying to understand the pros and cons of SORTKEY (type_id), SORTKEY (stamp), SORTKEY (type_id,stamp), SORTKEY (stamp,type_id).
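For concreteness, the layouts I'm comparing look roughly like this (the table name and the extra column are just placeholders for my real schema):

```sql
-- Placeholder schema: only type_id and stamp correspond to the real columns.
CREATE TABLE events (
    type_id  INTEGER      NOT NULL,
    stamp    TIMESTAMP    NOT NULL,
    payload  VARCHAR(256)
)
DISTKEY (type_id)
SORTKEY (type_id, stamp);  -- vs. SORTKEY (type_id), SORTKEY (stamp), SORTKEY (stamp, type_id)
```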

Thanks.


Solution

If you declare SORTKEY (COL1, COL2), all columns are stored sorted by COL1, then COL2, as if the table had been loaded with ORDER BY COL1, COL2.
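A minimal sketch of what that looks like in practice (table and column names are made up):

```sql
-- Compound sort key (the default kind): within each slice, rows are stored
-- as if the table had been loaded with ORDER BY col1, col2.
CREATE TABLE t (
    col1 INT,
    col2 INT,
    col3 VARCHAR(32)
)
COMPOUND SORTKEY (col1, col2);

-- After a sorted load (or a VACUUM), a full scan reads blocks in roughly the
-- order that this query would return rows:
-- SELECT * FROM t ORDER BY col1, col2;
```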

If you are using the SORTKEY to speed up a JOIN, then as far as I understand the exact order doesn't matter much, as long as you use the same SORTKEY on the tables being joined, because what happens then is a merge join.
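As a rough illustration of that point (the table names and the dimension table are assumptions, not from the question), a layout where a merge join becomes possible might look like this:

```sql
-- Both tables are distributed and sorted on the join column, which is one of
-- the conditions Redshift requires before it will choose a merge join
-- (the tables also need to be fully sorted, e.g. freshly vacuumed).
CREATE TABLE fact_events (
    type_id INT,
    stamp   TIMESTAMP,
    value   DOUBLE PRECISION
)
DISTKEY (type_id)
SORTKEY (type_id, stamp);

CREATE TABLE dim_types (
    type_id INT,
    name    VARCHAR(64)
)
DISTKEY (type_id)
SORTKEY (type_id);

-- EXPLAIN should then show a Merge Join rather than a Hash Join:
-- EXPLAIN SELECT * FROM fact_events e JOIN dim_types d ON e.type_id = d.type_id;
```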

If COL1 is highly selective, like your type_id, only a small number of rows share the same type_id value. So although you can add another column to the SORTKEY, its utility is limited because most of the row elimination has already happened.

If COL1 is not highly selective, like your stamp (which is a bit odd, by the way; I would have expected it to be more selective than type_id, but anyway), filtering by stamp won't eliminate that many rows, so it makes more sense to declare a second sort key column. However, this is less efficient than the other way around, since eliminating rows earlier is cheaper. If you sometimes filter by stamp but not by type_id, it may still make sense to do it this way.
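To make the trade-off concrete, this is the kind of query under discussion (names reused from the illustrative schema above): with SORTKEY (type_id, stamp) the type_id predicate does most of the block pruning up front, while with SORTKEY (stamp, type_id) the date range does.

```sql
-- Both predicates are checked against per-block min/max metadata (zone maps),
-- but a predicate only prunes blocks well when its column is clustered,
-- i.e. when it leads the sort key.
SELECT COUNT(*)
FROM events
WHERE type_id = 42
  AND stamp BETWEEN '2014-01-01' AND '2014-01-31';
```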

OTHER TIPS

We are also using Redshift, with about 2 billion records (plus 20 million every day), and I have to say: the less selective a sort key column is, the earlier it should appear in the sort key list.

In our case (and please do analyze how you use and query your own data) we used the timestamp as the first sort key. The problem is that we record about 200 rows even within a single second, so each 1 MB block covers only a few seconds and contains every type of data. So even though the timestamp is highly selective, after filtering on it we cannot really filter further, because every block holds all kinds of data.

Recently we reversed the order of the sort keys. The first column now has about 15 distinct values, the second about 30, and so on, with the timestamp last; a single block still spans only a few seconds.
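Schematically (column names are made up for illustration), the change was from a timestamp-first key to a low-cardinality-first key:

```sql
-- Old layout (illustrative): timestamp leads, so each block mixes all types.
--   SORTKEY (event_time, source_id, category_id)

-- New layout (illustrative): low-cardinality columns lead, timestamp comes last.
CREATE TABLE measurements (
    source_id    SMALLINT,          -- ~15 distinct values
    category_id  SMALLINT,          -- ~30 distinct values
    event_time   TIMESTAMP,
    reading      DOUBLE PRECISION
)
SORTKEY (source_id, category_id, event_time);
```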

Since we use the first two sort keys as filters very frequently, the results are as follows. Old solution: with a year of data, selecting one month drops 91% of the blocks, but after that all the remaining blocks have to be opened, even though we want to filter further.

The new solution drops about 14/15 of the blocks in the first step, regardless of the date range, then about 95% of the remaining ones, and the timestamp still drops 91% of what remains after that.

We tested this thoroughly with two 800-million-record tables that were identical except for the order of the sort keys. The wider the time period in the WHERE clause, the better the results we got, and the difference was even more significant for joins.

So my suggestion is: know your database and the kinds of queries you run frequently, because the most selective column might not be the best first sort key. Just as Enno Shioji said, it all depends on what you are filtering by.

I would say the priority order for choosing sort key columns should be:

  1. columns used in the DISTKEY, in filters, and in joins
  2. columns used in filters and joins
  3. columns used only in filters
  4. columns used only in joins
  5. columns used in GROUP BY or ORDER BY (including window functions)

The general rule: within the same level, put the lower-cardinality column first.
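A hypothetical example of applying that ordering (the table, its columns, and their query roles are all assumptions for illustration):

```sql
-- Assumed roles: customer_id is the DISTKEY and appears in joins and filters (rule 1),
-- region appears in filters and joins (rule 2), order_date only in filters (rule 3),
-- status mainly in GROUP BY (rule 5). Lower-cardinality columns go first within a level.
CREATE TABLE orders (
    customer_id BIGINT,
    region      VARCHAR(16),
    order_date  DATE,
    status      VARCHAR(16),
    amount      DECIMAL(12,2)
)
DISTKEY (customer_id)
SORTKEY (customer_id, region, order_date, status);
```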

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow