I have the following tables: users, tags, tags_data.
tags_data contains tag_id and user_id columns to link the users with tags in a one-user-to-many-tags relationship.
What is the best way of listing all users that have either tag_id 1001 AND 1003, OR tag_id 1004?
EDIT: By this I mean there could be other related tags as well, or not, just so long as there is definitely either 1004 OR (1001 AND 1003).
At the moment I've got two methods of doing this, both using a UNION in a derived table, either in the FROM clause or in an INNER JOIN clause:
SELECT subsel.user_id, users.name
FROM (
    SELECT user_id
    FROM tags_data
    WHERE tag_id IN (1001, 1003)
    GROUP BY user_id
    HAVING COUNT(tag_id) = 2
    UNION
    SELECT user_id
    FROM tags_data
    WHERE tag_id = 1004
) AS subsel
LEFT JOIN users ON subsel.user_id = users.user_id
Or
SELECT users.user_id, users.name
FROM users
INNER JOIN (
    SELECT user_id
    FROM tags_data
    WHERE tag_id IN (1001, 1003)
    GROUP BY user_id
    HAVING COUNT(tag_id) = 2
    UNION
    SELECT user_id
    FROM tags_data
    WHERE tag_id = 1004
) AS subsel ON users.user_id = subsel.user_id
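One caveat about the first branch of both queries: COUNT(tag_id) = 2 only works if a user can never hold the same tag twice. The example schema further down has a plain KEY on (user_id, tag_id), not a UNIQUE one, so a duplicated row could in principle let a user with 1001 twice and no 1003 slip through. A minimal sketch of a duplicate-tolerant version of that branch, on the assumption that duplicates are possible:

```sql
-- Assumption: (user_id, tag_id) pairs may repeat in tags_data,
-- because the index on those columns is not declared UNIQUE.
SELECT user_id
FROM tags_data
WHERE tag_id IN (1001, 1003)
GROUP BY user_id
-- Count each tag once, so two rows for 1001 do not pass as 1001 + 1003.
HAVING COUNT(DISTINCT tag_id) = 2;
```

If the pairs are guaranteed unique by the application, the original COUNT(tag_id) form returns the same rows.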
There are other tables which I'll be LEFT JOINing on to this. There are 50k+ rows in the users table and 150k+ rows in the tags_data table.
This is a batch job to export data to another system, not a real-time query run by an end user, so performance isn't massively critical. However, I'd like to get the best result I can. The query for the derived table should actually be pretty fast, and it makes sense to narrow the scope of the result set down before I add further joins, functions and calculated fields to the results returned to the client. I will be running these on a larger dataset later to see if there is any performance difference, but running EXPLAIN shows an almost identical execution plan for both.
Generally I try to avoid UNIONs unless absolutely necessary. But I think in this case I almost have to have a UNION somewhere by definition, because of the two effectively unrelated criteria.
Is there another method that I could be using here?
And is there some sort of specific database terminology for this sort of problem?
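For comparison, here is a minimal sketch of one UNION-free form that might be worth testing against the two queries above. It reads tags_data once and moves both criteria into a single HAVING clause using conditional aggregation. The alias u is only illustrative, and whether this is actually faster on the real data is untested. The "has all of these tags" half of the condition is, as far as I can tell, what is commonly described as relational division.

```sql
SELECT u.user_id, u.name
FROM users AS u
INNER JOIN (
    SELECT user_id
    FROM tags_data
    -- Only the three tags of interest can affect the outcome.
    WHERE tag_id IN (1001, 1003, 1004)
    GROUP BY user_id
    HAVING
        -- Criterion 1: the user has tag 1004 at least once.
        MAX(CASE WHEN tag_id = 1004 THEN 1 ELSE 0 END) = 1
        -- Criterion 2: the user has both 1001 and 1003.
        OR COUNT(DISTINCT CASE WHEN tag_id IN (1001, 1003) THEN tag_id END) = 2
) AS subsel ON u.user_id = subsel.user_id;
```

The CASE expressions return NULL for non-matching rows in the second criterion, and COUNT(DISTINCT ...) ignores NULLs, so only distinct occurrences of 1001 and 1003 are counted.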
Full example schema:
CREATE TABLE IF NOT EXISTS `tags` (
    `tag_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `tag_name` varchar(255) NOT NULL,
    PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1006 ;
INSERT INTO `tags` (`tag_id`, `tag_name`) VALUES
(1001, 'tag1001'),
(1002, 'tag1002'),
(1003, 'tag1003'),
(1004, 'tag1004'),
(1005, 'tag1005');
CREATE TABLE IF NOT EXISTS `tags_data` (
    `tags_data_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `user_id` int(11) NOT NULL,
    `tag_id` int(11) NOT NULL,
    PRIMARY KEY (`tags_data_id`),
    KEY `user_id` (`user_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
INSERT INTO `tags_data` (`tags_data_id`, `user_id`, `tag_id`) VALUES
(1, 1, 1001),
(2, 1, 1002),
(3, 1, 1003),
(4, 5, 1001),
(5, 5, 1003),
(6, 5, 1005),
(7, 8, 1004),
(8, 9, 1001),
(9, 9, 1002),
(10, 9, 1004);
CREATE TABLE IF NOT EXISTS `users` (
    `user_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(255) NOT NULL,
    PRIMARY KEY (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
INSERT INTO `users` (`user_id`, `name`) VALUES
(1, 'user1'),
(2, 'user2'),
(3, 'user3'),
(4, 'user4'),
(5, 'user5'),
(6, 'user6'),
(7, 'user7'),
(8, 'user8'),
(9, 'user9'),
(10, 'user10');
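As a sanity check on the sample data above, this is the INNER JOIN version of the query with an ORDER BY added (my addition, purely to make the output deterministic), followed by the rows that should come back when working through tags_data by hand:

```sql
SELECT users.user_id, users.name
FROM users
INNER JOIN (
    SELECT user_id
    FROM tags_data
    WHERE tag_id IN (1001, 1003)
    GROUP BY user_id
    HAVING COUNT(tag_id) = 2
    UNION
    SELECT user_id
    FROM tags_data
    WHERE tag_id = 1004
) AS subsel ON users.user_id = subsel.user_id
ORDER BY users.user_id;

-- Expected rows for the sample data:
--   1, user1   (has 1001 and 1003, plus 1002)
--   5, user5   (has 1001 and 1003, plus 1005)
--   8, user8   (has 1004 only)
--   9, user9   (has 1004; has 1001 but not 1003)
```

user9 is the interesting case: it fails the 1001 AND 1003 criterion but still qualifies through 1004.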