Question

I have one table that I'm using to build groups with in my database. The table contains a list of group names and ids. I have another table that has users, and a third table showing the relationships. (userid, groupid).

The situation is this, I need to create a list of userids that belong to a specific subset of groups. So for instance, I want all users that are in group 1, 3, and 8. That is straight forward enough. It gets more complicated though, I may need a list of all users that are in groups 1, 3, and 8, or 1, 2, and 8. Then I might need to exclude users that fit that criteria, but are also in group 27.

So I've got a script dynamically creating a query, using sub queries that works to a point. I have two problems with it. I don't think I'm handling the not-in part properly, because as I ad criteria, eventually, it just kinda hangs. (I think this is a result of me using sub-selects instead of joins, but I could not figure out how to build this with joins.)

Here is an example of a query with 4 ANDed OR groups, and 2 NOT clauses.

Please let me know if there is a better way to optimize this stmt. (I can handle the dynamic building of it in PHP)

If I need to clarify anything or provide more details, let me know.


select * from users_table where username IN
(
    select user_id from
    (
        select distinct user_id from group_user_map where user_id in 
        (
            select user_id from 
            (
                select * from 
                (
                    select count(*) as counter, user_id from  
                    (
                        (
                            select distinct(user_id) from group_user_map where group_id in (2601,119)
                        ) 
                        union all
                        (
                            select distinct(user_id) from group_user_map where group_id in (58,226)
                        ) 
                        union all
                        (
                            select distinct(user_id) from group_user_map where group_id in (1299,525)
                        ) 
                        union all
                        (
                            select distinct(user_id) from group_user_map where group_id in (2524,128)
                        ) 
                    ) 
                    thegroups group by user_id
                ) 
                getall where counter = 4
            ) 
            getuserids
        ) 
        and user_id not in 
        (
            select user_id from group_user_map where group_id in (2572)
        ) 
    ) 
    biggergroup 
);

Note, the first part of the query is comparing an id to a username. This is because I have the usernames stored as id's from the other table. (This whole thing is a link between two completely different databases).

(Also, if it looks like I have any extra sub-queries, that was to try to force mysql to evaluate the inner queries first.)

Thanks.

Aaron.

Was it helpful?

Solution

Avoiding subselects used for IN clauses:-

SELECT * 
FROM users_table
INNER JOIN 
(
    SELECT Sub1.user_id 
    FROM (
            SELECT COUNT(*) AS counter, user_id   
            FROM (
                SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)
                UNION ALL
                SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)
                UNION ALL
                SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)
                UNION ALL
                SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)
            ) thegroups
            GROUP BY user_id
            HAVING counter = 4
    ) Sub1
    LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
    ON group_user_map.user_id = Sub2.user_id
    WHERE Sub2.user_id IS NULL
) Sub3
ON  users_table.username = Sub3.user_id

Or avoiding using the COUNTs to check that the user id exists in all 4 tables, instead using inner joins

SELECT * 
FROM users_table
INNER JOIN 
(
    SELECT Sub1.user_id 
    FROM (
        SELECT z.user_id   
        FROM (
            SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z
            INNER JOIN
            (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y ON z.user_id = y.user_id
            INNER JOIN
            (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x ON z.user_id = x.user_id
            INNER JOIN
            (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w ON z.user_id = w.user_id
    ) Sub1
    LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
    ON group_user_map.user_id = Sub2.user_id
    WHERE Sub2.user_id IS NULL
) Sub3
ON  users_table.username = Sub3.user_id

Cleaning up that 2nd query a bit

SELECT * 
FROM users_table
INNER JOIN 
(
    SELECT z.user_id   
    FROM (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z
    INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y 
    ON z.user_id = y.user_id
    INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x 
    ON z.user_id = x.user_id
    INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w 
    ON z.user_id = w.user_id
    LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
    ON z.user_id = Sub2.user_id
    WHERE Sub2.user_id IS NULL
) Sub3
ON  users_table.username = Sub3.user_id

Using your SQL in the comment below, it can be cleaned up to :-

select SQL_NO_CACHE id 
from users_table 
INNER JOIN ( SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (0, 67) ) ij1 
ON users_table.username = ij1.user_id 
LEFT OUTER JOIN ( SELECT user_id FROM group_user_map WHERE group_id IN (0) ) Sub2 
ON users_table.username = Sub2.user_id 
WHERE Sub2.user_id IS NULL 

Cleaning up my SQL in the same way:-

SELECT users_table.* 
FROM users_table
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z ON users_table.username = z.user_id
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y ON users_table.username = y.user_id
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x ON users_table.username = x.user_id
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w ON users_table.username = w.user_id
LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2 ON users_table.username = Sub2.user_id
WHERE Sub2.user_id IS NULL

Removing the subselects and doing the joins directly (might help or hinder, suspect it will depend on how many duplicate user_id records there are for each set of group_id records)

SELECT DISTINCT users_table.* 
FROM users_table
INNER JOIN group_user_map z ON users_table.username = z.user_id AND z.group_id IN (2601,119)
INNER JOIN group_user_map y ON users_table.username = y.user_id AND y.group_id IN (58,226)
INNER JOIN group_user_map x ON users_table.username = x.user_id AND x.group_id IN (1299,525)
INNER JOIN group_user_map w ON users_table.username = w.user_id AND w.group_id IN (2524,128)
LEFT OUTER JOIN group_user_map Sub2 ON users_table.username = Sub2.user_id AND Sub2.group_id IN (2572)
WHERE Sub2.user_id IS NULL

OTHER TIPS

It would be easier to understand your problem if you post the table structure and some sample data. But here are a few suggestions based on your current query that you might be able to use.

These queries reduce the number of subqueries that you are using. One of the obvious changes is the difference in the way it gets the list of user_id's with each group:

select user_id
from group_user_map 
where group_id in (2601,119)
union all
select user_id 
from group_user_map 
where group_id in (58,226)
union all
select user_id 
from group_user_map 
where group_id in (1299,525)
union all
select user_id 
from group_user_map 
where group_id in (2524,128);

This uses a UNION ALL which will list all of the user_id even if they are duplicated. Once you have this list of user_id's then you get the count by applying a count(distinct user_id) and use a HAVING clause to find those that have 4 occurrences.

First, you could consolidate your current query to the following version in a WHERE clause:

select * 
from users_table 
where username IN (select user_id
                  from
                  (
                    select user_id
                    from group_user_map 
                    where group_id in (2601,119)
                    union all
                    select user_id 
                    from group_user_map 
                    where group_id in (58,226)
                    union all
                    select user_id 
                    from group_user_map 
                    where group_id in (1299,525)
                    union all
                    select user_id 
                    from group_user_map 
                    where group_id in (2524,128)
                  ) thegroups
                  where user_id not in (select user_id 
                                        from group_user_map 
                                        where group_id in (2572)) 
                  group by userid
                  having count(distinct userid) = 4);

Or you could use the query in the WHERE clause in a subquery that you JOIN to:

select ut.* 
from users_table ut
inner join
(
  select user_id
  from
  (
    select user_id
    from group_user_map 
    where group_id in (2601,119)
    union all
    select user_id 
    from group_user_map 
    where group_id in (58,226)
    union all
    select user_id 
    from group_user_map 
    where group_id in (1299,525)
    union all
    select user_id 
    from group_user_map 
    where group_id in (2524,128)
  ) thegroups
  where user_id not in (select user_id 
                        from group_user_map 
                        where group_id in (2572)) 
  group by userid
  having count(distinct userid) = 4
) biggergroup
  on ut.username = biggergroup.user_id;

It's not exactly clear what you mean when you say "I want all users that are in group 1, 3, and 8" and then write

select distinct(user_id) from group_user_map where group_id in (58,226)

because the English suggests you want a user who is in all three groups but the SQL gives you users who are in any 1 of the groups. So you need to be clearer about what exactly you want.

It's a little hard to believe that you are trying to find users that are in all of 4 supergroups with each supergroup being made of exactly 2 groups. It makes me question what you are doing and why.

There are a few different approaches I can think of depending on what you are really going to run into. Obviously the simplest is to break it into multiple queries and combine the results in your code. You can auto-join the group table if it's not too big, but it probably is too big to to join 3 times. You might get better performance with NOT EXISTS than with NOT IN but probably not. You can try to further leverage aggregation functions with CASE functions to compute success values in an intermediate table, but that's getting pretty crazy. More likely you'd be better off reworking your data structure.

The main problem I see with your existing solution is the large number of temporary tables you create. In general you are going to need a temporary table of some kind to do something this complicated so I would focus on limiting it to two tables, each of which is smaller than the relationships table.

Is this right query

  select * from users_table where username IN    
            (
(select distinct(user_id) from group_user_map where group_id in (2601,119)) a 
inner join
(select distinct(user_id) from group_user_map where group_id in (58,226)) b 
on a.user_id = b.user_id inner join 
(select distinct(user_id) from group_user_map where group_id in (1299,525)) c 
on a.user_id = c.user_id inner join 
(select distinct(user_id) from group_user_map where group_id in (2524,128)) d
on a.user_id = d.user_id 
)  and user_id  not in (select user_id from group_user_map where group_id in (2572))

Instead of union all and finally filter with having counter of 4, I replaced with intersect. Kindly check whether result is correct and it runs fast ?

Vinit

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top