Question

Sorry, I couldn't provide a better title for my problem as I am quite new to SQL. I am looking for a SQL query string that solves the below problem.

Let's assume the following table:

DOCUMENT_ID |     TAG
----------------------------
   1        |   tag1
   1        |   tag2
   1        |   tag3
   2        |   tag2
   3        |   tag1
   3        |   tag2
   4        |   tag1
   5        |   tag3

Now I want to select all distinct document id's that contain one or more tags (but those must provide all specified tags). For example: Select all document_id's with tag1 and tag2 would return 1 and 3 (but not 4 for example as it doesn't have tag2).

What would be the best way to do that?

Regards, Kai

Was it helpful?

Solution

SELECT document_id
FROM table
WHERE tag = 'tag1' OR tag = 'tag2'
GROUP BY document_id
HAVING COUNT(DISTINCT tag) = 2

Edit:

Updated for lack of constraints...

OTHER TIPS

This assumes DocumentID and Tag are the Primary Key.

Edit: Changed HAVING clause to count DISTINCT tags. That way it doesn't matter what the primary key is.

Test Data

-- Populate Test Data
CREATE TABLE #table (
  DocumentID varchar(8) NOT NULL, 
  Tag varchar(8) NOT NULL
)

INSERT INTO #table VALUES ('1','tag1')
INSERT INTO #table VALUES ('1','tag2')
INSERT INTO #table VALUES ('1','tag3')
INSERT INTO #table VALUES ('2','tag2')
INSERT INTO #table VALUES ('3','tag1')
INSERT INTO #table VALUES ('3','tag2')
INSERT INTO #table VALUES ('4','tag1')
INSERT INTO #table VALUES ('5','tag3')

INSERT INTO #table VALUES ('3','tag2')  -- Edit: test duplicate tags

Query

-- Return Results
SELECT DocumentID FROM #table
WHERE Tag IN ('tag1','tag2')
GROUP BY DocumentID
HAVING COUNT(DISTINCT Tag) = 2

Results

DocumentID
----------
1
3
select DOCUMENT_ID
      TAG in ("tag1", "tag2", ... "tagN")
   group by DOCUMENT_ID
   having count(*) > N and 

Adjust N and the tag list as needed.

Select distinct document_id 
from {TABLE} 
where tag in ('tag1','tag2')
group by id 
having count(tag) >=2 

How you generate the list of tags in the where clause depends on your application structure. If you are dynamically generating the query as part of your code then you might simply construct the query as a big dynamically generated string.

We always used stored procedures to query the data. In that case, we pass in the list of tags as an XML document. - a procedure like that might look something like one of these where the input argument would be

<tags>
   <tag>tag1</tag>
   <tag>tag2</tag>
</tags>


CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN

declare @tagCount int
select @tagCount = count(distinct *) from @tagList.nodes('tags/tag') R(tags)


SELECT DISTINCT documentid
FROM {TABLE}
JOIN @tagList.nodes('tags/tag') R(tags) ON {TABLE}.tag = tags.value('.','varchar(20)')
group by id 
having count(distict tag) >= @tagCount 

END

OR

CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN

declare @tagCount int
select @tagCount = count(*) from @tagList.nodes('tags/tag') R(tags)


SELECT DISTINCT documentid
FROM {TABLE}
WHERE tag in
(
SELECT tags.value('.','varchar(20)') 
FROM @tagList.nodes('tags/tag') R(tags)
}
group by id 
having count( distinct tag) >= @tagCount 
END

END

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top