Question

I have a few tables with some (example) values:

people
----------------------------------
| id    | name                   |
----------------------------------
| 1     | Steve Jobs             |
| 2     | Bill Gates             |
| 3     | Linus Torvalds         |
| 4     | Nikola Tesla           |
| 5     | Henry Ford             |
----------------------------------

.

groups
----------------------------------
| id    | name                   |
----------------------------------
| 1     | Country                |
| 2     | Work                   |
----------------------------------

.

categories
------------------------------------------
| id    | gid   | name                   |
------------------------------------------
| 1     | 1     | USA                    |
| 2     | 1     | Finland                |
| 3     | 1     | Croatia                |
| 4     | 2     | Mac                    |
| 5     | 2     | iPhone                 |
| 6     | 2     | iPad                   |
| 7     | 2     | Windows                |
| 8     | 1     | Office                 |
| 9     | 1     | Linux                  |
| 10    | 1     | Coil                   |
| 11    | 1     | El. stuff              |
| 12    | 1     | Cars                   |
------------------------------------------

.

people_categories_map
--------------------------
| id    | pid   | cid    |
--------------------------
| 1     | 1     | 1      |
| 2     | 1     | 4      |
| 3     | 1     | 5      |
| 4     | 1     | 6      |
| 5     | 1     | 11     |
| 6     | 2     | 1      |
| 7     | 2     | 7      |
| 8     | 2     | 8      |
| 9     | 2     | 11     |
| 10    | 3     | 2      |
| 11    | 3     | 9      |
| 12    | 3     | 11     |
| 13    | 4     | 1      |
| 14    | 4     | 3      |
| 15    | 4     | 10     |
| 16    | 4     | 11     |
| 17    | 5     | 1      |
| 18    | 5     | 12     |
--------------------------

The challenge

What I'm trying to achieve, is to list people based on matches from each group. For example, I want to list people from (country) USA (categories.id:1) OR Finland (categories.id:2) AND Work Linux (categories.id:9) OR Cars (categories.id:12). This query would match only Linus Torvalds (people.id:3) based on relations from the people_categories_map table.

However, so far I have only managed to make queries that either list all category matches or any category matches, i.e.:

SELECT people.id, people.name FROM people JOIN people_categories_map ON people_categories_map.pid = people.id WHERE people_categories_map.cid IN (1,2);

SELECT people.id, people.name FROM people JOIN people_categories_map ON people_categories_map.pid = people.id WHERE people_categories_map.cid IN (1,9) HAVING COUNT(DISTINCT(people_categories_map.cid)) = 2;

Is it possible to combine these somehow, to make a query that returns a list of people that would be connected to "(category_x OR category_y) AND (category_z OR category_w OR category_m) AND (category_b OR category_c)" and so on... in other words, to return matches from all groups?

Was it helpful?

Solution

As Alexandre in his comment said, your design is more than problematic.

Your example is faulty too: T

  1. The categories 8 - 12 should have the gid 2 not 1.
  2. The query should return "Henry Ford" too, because he lives in the USA and works on Cars.

Following not beautiful query with a lot of joins do it (I'm sure it could be optimized, but better to optimize the database design):

SELECT *
FROM people_categories_map pmc1
INNER JOIN people_categories_map pmc2
ON  pmc1.pid = pmc2.pid AND pmc1.cid <> pmc2.cid
INNER JOIN people p
ON pmc1.pid = p.id
INNER JOIN categories c1
ON pmc1.cid = c1.id
INNER JOIN categories c2
ON pmc2.cid = c2.id
INNER JOIN groups g1
ON c1.gid = g1.id
INNER JOIN groups g2
ON c2.gid = g2.id
WHERE
    c1.name IN ('USA', 'FINLAND')
AND
    c2.name IN ('Linux', 'Cars');

Explanation

First we do a self join of the mapping table, so we have access to both country and work of a person. It's not a real INNER JOIN, because our join condition contains an unequal-condition.

Then we join our people once, but categories and groups twice: first for country, secondly for work and filter so. You could swap those two, without losing or changing data. It's only a matter of definition.

Remark: This is not efficient, but I don't believe it is worth to optimize.


See a fiddle for the example

If you want a person only once, use

SELECT DISTINCT p.id, p.name
FROM ...
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top