Random row selection but with distinct values for a subset of columns
29-01-2021
Question
I have a table that looks like this :
Key1 | Key2 | Key3 | Data |
-----|------|------|-------|
1 | 1 | 1 | a |
1 | 1 | 2 | b |
1 | 2 | 1 | c |
1 | 2 | 2 | d |
2 | 1 | 1 | e |
2 | 1 | 2 | f |
2 | 2 | 1 | g |
2 | 2 | 2 | h |
---------------------------|
Here is some code to create that example table :
CREATE TABLE Example(
    Key1 int,
    Key2 int,
    Key3 int,
    Data varchar(1));
INSERT INTO Example(Key1, Key2, Key3, Data)
VALUES (1, 1, 1, 'a'),
       (1, 1, 2, 'b'),
       (1, 2, 1, 'c'),
       (1, 2, 2, 'd'),
       (2, 1, 1, 'e'),
       (2, 1, 2, 'f'),
       (2, 2, 1, 'g'),
       (2, 2, 2, 'h');
I want to randomly select one whole row for each distinct pair of Key1 and Key2 values (one for the 1/1 pair, one for the 1/2 pair, etc.). One possible result would be:
Key1 | Key2 | Key3 | Data |
-----|------|------|-------|
1 | 1 | 1 | a |
1 | 2 | 1 | c |
2 | 1 | 2 | f |
2 | 2 | 1 | g |
---------------------------|
I can do this with multiple queries, by first selecting all the distinct Key1/Key2 pairs and then using each pair to run another query like
SELECT stuff FROM table WHERE Key1 = value1 AND Key2 = value2 ORDER BY RAND() LIMIT 1
But this crude approach needs as many queries as there are Key1/Key2 pairs, and it takes forever since my table is huge.
I've read things about using subqueries, partition, group by, but I struggle to implement them.
I'm new to SQL and only need to use it for a specific project and I don't really have the time to learn it properly, so I would be very thankful if you guys could give me a hand.
Thanks
JC
Solution
MariaDB 10.3 supports window functions, so something like this would work:
select * from (
    select
        t.*,                          -- all columns
        row_number()                  -- assign sequential numbers
        over (                        -- within a "window"
            partition by Key1, Key2   -- determined by a unique combination of Key1, Key2
            order by rand()           -- while ordering rows randomly within the partition
        ) as rn                       -- set the column alias
    from Example t
) tt
where rn = 1 -- select only the first row from each "window" (partition)
order by Key1, Key2, Key3
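If the helper column rn should not show up in the result, the same idea can be written with an explicit column list. This is only a minimal sketch, assuming the Example table from the question and a MariaDB version that supports both window functions and common table expressions; the CTE name ranked is made up for illustration.

```sql
-- Sketch only: assumes the Example table from the question.
-- "ranked" is a hypothetical CTE name, not something from the original answer.
WITH ranked AS (
    SELECT
        e.Key1, e.Key2, e.Key3, e.Data,
        ROW_NUMBER() OVER (
            PARTITION BY e.Key1, e.Key2   -- one window per (Key1, Key2) pair
            ORDER BY RAND()               -- random order inside each window
        ) AS rn
    FROM Example e
)
SELECT Key1, Key2, Key3, Data             -- rn is left out of the output
FROM ranked
WHERE rn = 1                              -- keep one random row per pair
ORDER BY Key1, Key2;
```

With the sample data this should return four rows, one per Key1/Key2 pair, and the Key3 and Data values can differ from run to run. It is still a single query, so it avoids the one-query-per-pair loop, although the server has to generate a random number for every row in the table.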