PostgreSQL UNION: remove duplicates based on one column only
17-03-2021
Question
UNION filters out duplicate rows, while UNION ALL keeps them. By analogy, I want a union that checks for duplicates on a single column only.
SELECT id, 1 AS category FROM users UNION SELECT id, 2 FROM users_2;
Here I want to reject every row from the second SELECT (id, 2) whose id already appears in the first one, i.e. check for duplicates based on id only.
OUTCOME:
id | category
------------------
100 | 1
101 | 1
... | ...
100 | 2 # Skip this as 100 is already present
201 | 2
....
EXPECTED:
id | category
----------------
100 | 1
101 | 1
... | ...
201 | 2
Solution
A full outer join is very handy for this:
select id, case when users.id is null then 2 else 1 end as category
from users
full outer join users_2 using (id);
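How this works: users.id is null when the id exists only in users_2 (there is no matching row in users), so that row gets category 2. Whenever the id is present in users, users.id is not null and the row gets category 1, whether or not the same id also appears in users_2.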
Alternatively, you can do it with a union combined with not exists (union all is enough here, because not exists already discards the duplicates):
select id, 1 as category
from users
union all
select id, 2
from users_2 where not exists (select id from users where id = users_2.id);
As you can see, though, the query becomes longer.
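To check the union all / not exists variant against the sample ids from the question, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for PostgreSQL (the query uses only standard SQL that both accept), and the helper name union_skip_duplicate_ids is made up for this illustration.

```python
import sqlite3


def union_skip_duplicate_ids(users_ids, users_2_ids):
    """Return (id, category) rows: every id in users as category 1, plus
    only those ids in users_2 that are absent from users, as category 2."""
    # Assumption: SQLite in memory stands in for a PostgreSQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")
    conn.execute("CREATE TABLE users_2 (id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in users_ids])
    conn.executemany("INSERT INTO users_2 VALUES (?)", [(i,) for i in users_2_ids])
    rows = conn.execute(
        """
        SELECT id, 1 AS category FROM users
        UNION ALL
        SELECT id, 2 FROM users_2
        WHERE NOT EXISTS (SELECT 1 FROM users WHERE users.id = users_2.id)
        """
    ).fetchall()
    conn.close()
    # The query has no ORDER BY, so sort in Python to get a stable result.
    return sorted(rows)


print(union_skip_duplicate_ids([100, 101], [100, 201]))
# prints [(100, 1), (101, 1), (201, 2)]
```

The row (100, 2) is dropped because id 100 already exists in users, which matches the EXPECTED table in the question.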
A third way, which is even longer and probably less efficient, is to use distinct on:
select distinct on (id) id, category
from (
select id, 1 as category
from users
union all
select id, 2
from users_2
) x
order by id, category;
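NB: Thanks to its order by clause, this last query guarantees the order of the output; the first two queries don't! The same order by also decides which row distinct on keeps for each id: the one with the lowest category.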
See https://dbfiddle.uk/?rdbms=postgres_13&fiddle=dd8bdbc82edc26cce9e8ba9fdde84517
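The "keep the lowest category per id, in a guaranteed order" behaviour of the distinct on query can be tried out locally with the sketch below. Note the assumptions: distinct on is PostgreSQL-specific, so this sketch swaps in group by with min(category), which gives the same rows for this two-column case, and it runs on an in-memory SQLite database rather than PostgreSQL. The helper name lowest_category_per_id is hypothetical.

```python
import sqlite3


def lowest_category_per_id(users_ids, users_2_ids):
    """Return one (id, category) row per id, keeping the lowest category,
    ordered by id."""
    # Assumption: SQLite in memory stands in for a PostgreSQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")
    conn.execute("CREATE TABLE users_2 (id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in users_ids])
    conn.executemany("INSERT INTO users_2 VALUES (?)", [(i,) for i in users_2_ids])
    rows = conn.execute(
        """
        -- GROUP BY + MIN replaces DISTINCT ON here (not the original query).
        SELECT id, MIN(category) AS category
        FROM (
            SELECT id, 1 AS category FROM users
            UNION ALL
            SELECT id, 2 FROM users_2
        ) x
        GROUP BY id
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


print(lowest_category_per_id([101, 100], [201, 100]))
# prints [(100, 1), (101, 1), (201, 2)]
```

Here the ordering comes from the query itself rather than from sorting afterwards, which is the point made in the NB above.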