PostgreSQL find groups with same members

https://dba.stackexchange.com/questions/190752

10-10-2020
|

Вопрос

Given the following schema:

User
  id

Group
  id

Membership
  group_id
  user_id

(With a unique index on group_id, user_id)

How would I select the group(s) of which the same two users are a member?

What I have now:

SELECT   "Membership"."group_id" 
FROM     "Membership" 
WHERE    "Membership"."user_id" IN (1, 2) 
GROUP BY "Membership"."group_id" 
HAVING   COUNT("Membership"."group_id") >= 2

Which seems to work, (it only retuns the group ids of which both users are a member), but looks rather confusing.

Is there a better way to express this query?

EDIT

So given the following data:

Users
-----
id
1
2
3

Group
-----
id
1
2
3

Membership
----------
group_id user_id
1        1
1        2
2        1
2        3
3        2
3        3

I'd like to find groups of which both user 1 and 2 are a member.

So the result of the query with user_id 1 and 2 should be group 1. The same query for user_id 1 and 3 should return group 2. And the query for user_id 2 and 3 should return group 3.

Решение

This is called relational division and your query is pretty much what you can do with standard SQL.

A different way to write this in Postgres would be to use arrays:

select group_id
from membership
group by group_id
having array_agg(user_id order by user_id) = array[1,3];

It's important to also order the IDs inside the array[1,3] constant because array[1,3] is a different array than array[3,1];

The above returns those groups that consist exactly of those two members. Your query returns those that consists at least of those two members. That can be written with an array as well:

select group_id
from membership
group by group_id
having array_agg(user_id order by user_id) @> array[1,3];

That uses the "contains" operator for arrays which tests if the array on the left side contains all elements of the array on the right side. But the array on the left side is allowed to have more elements.

The check for an exact match is probably faster using the array (because it would require an additional sub-query with standard SQL), but I don't think that the "at least" query makes a big difference in performance.

Другие советы

Your query is one standard way to express:

FORALL x:p(x)

in SQL. The other common way is to use De Morgan laws for predicate logic, and rewrite this predicate to:

NOT EXISTS x:NOT p(x)

Usually, people find it easier to understand the query you already have than this rewrite. If you add a comment in the code what your query does, I don't think it is a bad query at all.

A less efficient (usually) but perhaps easier to grasp solution if you only have a few users to check, is to use one leg for each user in an intersect:

select m1.group_id from membership m1 where m1.user_id = 1 
intersect 
select m1.group_id from membership m1 where m1.user_id = 2

for 3 users that would be:

select m1.group_id from membership m1 where m1.user_id = 1 
intersect 
select m1.group_id from membership m1 where m1.user_id = 2 
intersect 
select m1.group_id from membership m1 where m1.user_id = 3

That said, I would aim for what you already have or the solution provided by @a_horse_with_no_name

Лицензировано под: CC-BY-SA с атрибуция

Не связан с dba.stackexchange