SQL query that gives distinct results that match multiple columns
Question
Sorry, I couldn't provide a better title for my problem as I am quite new to SQL. I am looking for a SQL query string that solves the below problem.
Let's assume the following table:
DOCUMENT_ID | TAG ---------------------------- 1 | tag1 1 | tag2 1 | tag3 2 | tag2 3 | tag1 3 | tag2 4 | tag1 5 | tag3
Now I want to select all distinct document id's that contain one or more tags (but those must provide all specified tags). For example: Select all document_id's with tag1 and tag2 would return 1 and 3 (but not 4 for example as it doesn't have tag2).
What would be the best way to do that?
Regards, Kai
Solution
SELECT document_id
FROM table
WHERE tag = 'tag1' OR tag = 'tag2'
GROUP BY document_id
HAVING COUNT(DISTINCT tag) = 2
Edit:
Updated for lack of constraints...
OTHER TIPS
This assumes DocumentID and Tag are the Primary Key.
Edit: Changed HAVING clause to count DISTINCT tags. That way it doesn't matter what the primary key is.
Test Data
-- Populate Test Data
CREATE TABLE #table (
DocumentID varchar(8) NOT NULL,
Tag varchar(8) NOT NULL
)
INSERT INTO #table VALUES ('1','tag1')
INSERT INTO #table VALUES ('1','tag2')
INSERT INTO #table VALUES ('1','tag3')
INSERT INTO #table VALUES ('2','tag2')
INSERT INTO #table VALUES ('3','tag1')
INSERT INTO #table VALUES ('3','tag2')
INSERT INTO #table VALUES ('4','tag1')
INSERT INTO #table VALUES ('5','tag3')
INSERT INTO #table VALUES ('3','tag2') -- Edit: test duplicate tags
Query
-- Return Results
SELECT DocumentID FROM #table
WHERE Tag IN ('tag1','tag2')
GROUP BY DocumentID
HAVING COUNT(DISTINCT Tag) = 2
Results
DocumentID
----------
1
3
select DOCUMENT_ID
TAG in ("tag1", "tag2", ... "tagN")
group by DOCUMENT_ID
having count(*) > N and
Adjust N and the tag list as needed.
Select distinct document_id
from {TABLE}
where tag in ('tag1','tag2')
group by id
having count(tag) >=2
How you generate the list of tags in the where clause depends on your application structure. If you are dynamically generating the query as part of your code then you might simply construct the query as a big dynamically generated string.
We always used stored procedures to query the data. In that case, we pass in the list of tags as an XML document. - a procedure like that might look something like one of these where the input argument would be
<tags>
<tag>tag1</tag>
<tag>tag2</tag>
</tags>
CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN
declare @tagCount int
select @tagCount = count(distinct *) from @tagList.nodes('tags/tag') R(tags)
SELECT DISTINCT documentid
FROM {TABLE}
JOIN @tagList.nodes('tags/tag') R(tags) ON {TABLE}.tag = tags.value('.','varchar(20)')
group by id
having count(distict tag) >= @tagCount
END
OR
CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN
declare @tagCount int
select @tagCount = count(*) from @tagList.nodes('tags/tag') R(tags)
SELECT DISTINCT documentid
FROM {TABLE}
WHERE tag in
(
SELECT tags.value('.','varchar(20)')
FROM @tagList.nodes('tags/tag') R(tags)
}
group by id
having count( distinct tag) >= @tagCount
END
END