Question

I guess this must be a fairly common problem, but I'm struggling to find an answer.

If I have the following tables:

__POST__

Post_ID    Other_stuff
1
2
3
4
...

__TAG__

Tag_ID    Tag_name
1         MySQL
2         TSQL
3         PGSQL
4         PHP
5         Java
6         VB.NET

__CATEGORY__

Cat_Id    Cat_Description
1         IIS7
2         Apache
3         Oracle
4         NodeJS


__POST_TAG__

Post_ID    Tag_Id
1          2
1          6
2          1
2          4
3          4
3          5
3          1
4          1

__POST_CATEGORY__

Post_ID    Cat_Id
1          1
2          1
2          2
3          1
3          2

Is it possible to write a MySQL query that returns only the posts that have all of the specified tags and all of the specified categories?

e.g. in my front end interface, the user would choose the "MySQL" and "PHP" tags and the "IIS7" and "Apache" categories, and my query would return posts 2 and 3 but not the others. The user could potentially choose zero or more of each option.

The closest I can get is

SELECT
distinct p.* 
FROM posts p 
INNER JOIN post_tag pt on p.Post_Id = pt.Post_Id
INNER JOIN post_category pc on p.Post_Id = pc.Post_Id
WHERE pt.tag_id IN(1,4)
AND pc.cat_id IN(2,3);

But this results in an OR query where posts with either MySQL or PHP tags plus either Apache or IIS7 are returned, whereas I need a match on all the entered values.

At the moment I'm trying to resist storing tags and categories as a comma-delimited string, as that seems like very bad practice from a normalisation point of view. At least that way, though, I could use

AND tag like '%MySQL%' AND tag like '%PHP%' AND cat LIKE '%APACHE%'...etc

But this seems like a very unsatisfactory solution. Can anyone help me with a better one?

Solution

I haven't used MySQL in a while, but this is a fairly general SQL problem.

As you've discovered, IN (...) behaves like a chain of ORs: a row matches if it equals any one of the listed values.

You can use one EXISTS subquery per required value. These are often faster than IN () subqueries and can come close to join speeds, although that depends on the optimizer and version. Like so:

SELECT p.* 
FROM posts p 
WHERE EXISTS (SELECT 1 FROM post_tag pt WHERE p.Post_Id = pt.Post_Id AND pt.tag_id = 1)
    AND EXISTS (SELECT 1 FROM post_tag pt WHERE p.Post_Id = pt.Post_Id AND pt.tag_id = 4)
    AND EXISTS (SELECT 1 FROM post_category pc WHERE p.Post_Id = pc.Post_Id AND pc.cat_id = 2)
    AND EXISTS (SELECT 1 FROM post_category pc WHERE p.Post_Id = pc.Post_Id AND pc.cat_id = 3);

Notice that you don't need to join if you don't also need the category or tag information returned.
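
Since the user can pick zero or more tags and categories, the EXISTS clauses have to be generated rather than hard-coded. Here is a rough sketch of that, not a definitive implementation. It assumes Python with an in-memory SQLite database standing in for MySQL, lowercase table and column names matching the queries above, and a hypothetical helper called find_posts. It uses tag IDs 1 and 4 (MySQL, PHP) and category IDs 1 and 2 (IIS7, Apache) from the question's sample data.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory SQLite database with the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (post_id INTEGER PRIMARY KEY);
        CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
        CREATE TABLE post_category (post_id INTEGER, cat_id INTEGER);
        INSERT INTO posts VALUES (1), (2), (3), (4);
        INSERT INTO post_tag VALUES
            (1, 2), (1, 6), (2, 1), (2, 4), (3, 4), (3, 5), (3, 1), (4, 1);
        INSERT INTO post_category VALUES
            (1, 1), (2, 1), (2, 2), (3, 1), (3, 2);
    """)
    return conn


def find_posts(conn, tag_ids=(), cat_ids=()):
    """Return IDs of posts that have every given tag and every given category.

    find_posts is a hypothetical helper; it adds one EXISTS clause per value.
    """
    clauses, params = [], []
    for tag_id in tag_ids:
        clauses.append(
            "EXISTS (SELECT 1 FROM post_tag pt"
            " WHERE pt.post_id = p.post_id AND pt.tag_id = ?)"
        )
        params.append(tag_id)
    for cat_id in cat_ids:
        clauses.append(
            "EXISTS (SELECT 1 FROM post_category pc"
            " WHERE pc.post_id = p.post_id AND pc.cat_id = ?)"
        )
        params.append(cat_id)

    sql = "SELECT p.post_id FROM posts p"
    if clauses:  # no selection at all means no filtering
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY p.post_id"
    return [row[0] for row in conn.execute(sql, params)]


conn = make_sample_db()
print(find_posts(conn, tag_ids=[1, 4], cat_ids=[1, 2]))  # [2, 3]
print(find_posts(conn, tag_ids=[1]))                     # [2, 3, 4]
print(find_posts(conn))                                  # [1, 2, 3, 4]
```

The values are passed as bound parameters rather than pasted into the SQL string. The ? placeholder is what sqlite3 uses; MySQL drivers generally use a different placeholder style, so that part would need adjusting.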

Another option is to use INTERSECT where the database supports it (MySQL only added it in 8.0.31), but that's typically more complicated for the query engine, and probably ends up about as performant as an IN () clause with a subquery.
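
For comparison, this is roughly what the INTERSECT form looks like. It is only a sketch: it runs against an in-memory SQLite database from Python rather than MySQL, assumes lowercase table and column names, and uses tag IDs 1 and 4 and category IDs 1 and 2 from the question's sample data. Each SELECT yields the posts having one required value, and INTERSECT keeps only the post IDs present in all of them.

```python
import sqlite3

# In-memory SQLite database with the question's sample link-table rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
    CREATE TABLE post_category (post_id INTEGER, cat_id INTEGER);
    INSERT INTO post_tag VALUES
        (1, 2), (1, 6), (2, 1), (2, 4), (3, 4), (3, 5), (3, 1), (4, 1);
    INSERT INTO post_category VALUES
        (1, 1), (2, 1), (2, 2), (3, 1), (3, 2);
""")

# One SELECT per required tag or category, intersected together.
sql = """
    SELECT post_id FROM post_tag WHERE tag_id = 1
    INTERSECT
    SELECT post_id FROM post_tag WHERE tag_id = 4
    INTERSECT
    SELECT post_id FROM post_category WHERE cat_id = 1
    INTERSECT
    SELECT post_id FROM post_category WHERE cat_id = 2
"""
matching = sorted(row[0] for row in conn.execute(sql))
print(matching)  # [2, 3]
```

This only returns post IDs. To get the full post rows you would still have to join or filter the posts table on that result.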

OTHER TIPS

Putting the filters in the JOIN clauses does the trick:

SELECT p.*
FROM posts p 
INNER JOIN post_tag pt1 on p.Post_Id = pt1.Post_Id AND pt1.tag_id = 1
INNER JOIN post_tag pt2 on p.Post_Id = pt2.Post_Id AND pt2.tag_id = 4
INNER JOIN post_category pc1 on p.Post_Id = pc1.Post_Id AND pc1.cat_id = 2
INNER JOIN post_category pc2 on p.Post_Id = pc2.Post_Id AND pc2.cat_id = 3

Which could also be written like this:

SELECT p.*
FROM posts p 
INNER JOIN post_tag pt1 on p.Post_Id = pt1.Post_Id
INNER JOIN post_tag pt2 on p.Post_Id = pt2.Post_Id
INNER JOIN post_category pc1 on p.Post_Id = pc1.Post_Id
INNER JOIN post_category pc2 on p.Post_Id = pc2.Post_Id
WHERE pt1.tag_id = 1
  AND pt2.tag_id = 4
  AND pc1.cat_id = 2
  AND pc2.cat_id = 3
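
Because the number of joins depends on how many tags and categories the user picks, the join list would normally be built in code. A minimal sketch of that follows, assuming Python with an in-memory SQLite database in place of MySQL, lowercase table and column names, and a hypothetical helper called find_posts_by_joins. It uses tag IDs 1 and 4 and category IDs 1 and 2 from the question's sample data. The DISTINCT is an addition of mine that only matters if a link table can hold the same pair twice.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory SQLite database with the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (post_id INTEGER PRIMARY KEY);
        CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
        CREATE TABLE post_category (post_id INTEGER, cat_id INTEGER);
        INSERT INTO posts VALUES (1), (2), (3), (4);
        INSERT INTO post_tag VALUES
            (1, 2), (1, 6), (2, 1), (2, 4), (3, 4), (3, 5), (3, 1), (4, 1);
        INSERT INTO post_category VALUES
            (1, 1), (2, 1), (2, 2), (3, 1), (3, 2);
    """)
    return conn


def find_posts_by_joins(conn, tag_ids=(), cat_ids=()):
    """Return IDs of posts matching all tags and categories via self-joins.

    Hypothetical helper: adds one aliased INNER JOIN (pt1, pt2, ..., pc1, ...)
    per required value, with the filter inside the ON clause.
    """
    joins, params = [], []
    for i, tag_id in enumerate(tag_ids, start=1):
        joins.append(
            f"INNER JOIN post_tag pt{i}"
            f" ON pt{i}.post_id = p.post_id AND pt{i}.tag_id = ?"
        )
        params.append(tag_id)
    for i, cat_id in enumerate(cat_ids, start=1):
        joins.append(
            f"INNER JOIN post_category pc{i}"
            f" ON pc{i}.post_id = p.post_id AND pc{i}.cat_id = ?"
        )
        params.append(cat_id)

    # Only the generated aliases are interpolated; values stay as parameters.
    sql = (
        "SELECT DISTINCT p.post_id FROM posts p "
        + " ".join(joins)
        + " ORDER BY p.post_id"
    )
    return [row[0] for row in conn.execute(sql, params)]


conn = make_sample_db()
print(find_posts_by_joins(conn, tag_ids=[1, 4], cat_ids=[1, 2]))  # [2, 3]
print(find_posts_by_joins(conn, cat_ids=[2]))                     # [2, 3]
print(find_posts_by_joins(conn))                                  # [1, 2, 3, 4]
```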
Licensed under: CC-BY-SA with attribution