PostgreSQL UNION: remove duplicates based on one column only

https://dba.stackexchange.com/questions/287023

17-03-2021

Question

UNION filters out duplicate rows, while UNION ALL keeps them. In an analogous way, I want a union that checks for duplicates on a single column only.

SELECT id, 1 AS category FROM users UNION SELECT id, 2 FROM users_2;

Here I want to drop every row from the second SELECT (id, 2) whose id already appears in the first, so that duplicates are detected on id alone.

OUTCOME:

id   |   category
------------------
100  |    1
101  |    1
...  |    ...
100  |    2   # Skip this as 100 is already present
201  |    2
....

EXPECTED:

id   | category
----------------
100  |   1
101  |   1
...  |  ...
201  |   2

Solution

A full outer join is very handy for this:

select id, case when users.id is null then 2 else 1 end as category
from users
full outer join users_2 using (id);

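This works because users.id is null when the row comes only from users_2, and is not null whenever the id exists in users. The unqualified id in the select list is the merged join column produced by using (id), so it is never null.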
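To see the join at work, here is a small sketch with made-up sample data. The table definitions and values are assumptions for illustration only; the question does not show the real schema.

```sql
-- Hypothetical sample data; the real tables presumably have more columns.
create temp table users   (id int primary key);
create temp table users_2 (id int primary key);

insert into users   values (100), (101);
insert into users_2 values (100), (201);

-- id is the merged join column; users.id is null only for rows
-- that have no match in users, i.e. ids found in users_2 alone.
select id, case when users.id is null then 2 else 1 end as category
from users
full outer join users_2 using (id)
order by id;  -- added here only to make the output order predictable
```

With this data the query returns (100, 1), (101, 1) and (201, 2): id 100 appears once, with the category from users.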

Alternatively you can do it with union (and use union all because we're already using not exists to discard duplicates):

select id, 1 as category
from users
union all
select id, 2
from users_2
where not exists (select id from users where id = users_2.id);

but as you can see the query becomes much longer.

A third way that is even longer and probably even less efficient is to use distinct on:

select distinct on (id) id, category
from (
  select id, 1 as category
  from users
  union all
  select id, 2
  from users_2
) x
order by id, category;

NB: Only this last query guarantees the order of the output, because of its order by; the first two queries don't!
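If a stable order is wanted from the union all variant as well, an order by can be appended. In PostgreSQL, an order by written after the last select of a union applies to the combined result, not just to the final branch. A minimal sketch, reusing the table names from the question:

```sql
select id, 1 as category
from users
union all
select id, 2
from users_2
-- select 1 is enough here: exists only checks whether a row is found
where not exists (select 1 from users where users.id = users_2.id)
order by id;  -- sorts the whole union, not only the second select
```

The same trailing order by id can be added to the full outer join query.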

See https://dbfiddle.uk/?rdbms=postgres_13&fiddle=dd8bdbc82edc26cce9e8ba9fdde84517

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange