How to create a faceted search with SQL Server

Question 1

There are other places where you can get examples of turning a CSV parameter into a table variable. Assuming you have done that part, your query boils down to the following:

GetFacetedProducts: Find Product records where all tags passed in are assigned to each product.

If you wrote it by hand you could end up with:

SELECT P.*
FROM Product P
INNER JOIN ProductTag PT1 ON PT1.ProductID = P.ID AND PT1.TagID = 1
INNER JOIN ProductTag PT2 ON PT2.ProductID = P.ID AND PT2.TagID = 3
INNER JOIN ProductTag PT3 ON PT3.ProductID = P.ID AND PT3.TagID = 5

While this does select only the products that have those tags, it is not going to work with a dynamic list. In the past some people have built up the SQL as a string and executed it dynamically; don't do that.

Instead, let's assume that the same tag can't be applied to a product twice. We can then change our question to: find products where the number of tags matching (dynamic list) is equal to the number of tags in (dynamic list).

DECLARE @selectedTags TABLE (ID int)
DECLARE @tagCount int

INSERT INTO @selectedTags VALUES (1)
INSERT INTO @selectedTags VALUES (3)
INSERT INTO @selectedTags VALUES (5)

SELECT @tagCount = COUNT(*) FROM @selectedTags

SELECT
    P.ID
FROM Product P
JOIN ProductTag PT
    ON PT.ProductID = P.ID
JOIN @selectedTags T
    ON T.ID = PT.TagID
GROUP BY
    P.ID,
    P.Name
HAVING COUNT(PT.TagID) = @tagCount

This returns just the IDs of the products that match all your tags. You could then join this back to the Product table if you want more than just an ID; otherwise you're done.
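If the full product rows are needed, one way to join the matching IDs back is sketched below. It assumes it runs in the same batch as the @selectedTags and @tagCount declarations above (table variables are batch-scoped), that @selectedTags holds no duplicate IDs, and that a product cannot carry the same tag twice. The CTE name MatchingProducts is just an illustrative choice.

```sql
-- Sketch only: reuses @selectedTags and @tagCount from the batch above.
;WITH MatchingProducts AS (
    -- Products that carry every selected tag
    SELECT PT.ProductID
    FROM ProductTag PT
    JOIN @selectedTags T
        ON T.ID = PT.TagID
    GROUP BY PT.ProductID
    HAVING COUNT(*) = @tagCount
)
-- Join back to Product to return the full rows
SELECT P.*
FROM Product P
JOIN MatchingProducts M
    ON M.ProductID = P.ID;
```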

As for your second query, once you have the product IDs that match, you want a list of all tags for those product IDs that aren't in your list:

SELECT DISTINCT
    PT2.TagID
FROM ProductTag PT2
WHERE PT2.ProductID IN (
    SELECT
        P.ID
    FROM Product P
    JOIN ProductTag PT
        ON PT.ProductID = P.ID
    JOIN @selectedTags T
        ON T.ID = PT.TagID
    GROUP BY
        P.ID,
        P.Name
    HAVING COUNT(PT.TagID) = @tagCount
)
AND PT2.TagID NOT IN (SELECT ID FROM @selectedTags)
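
For completeness, the CSV-to-table-variable step assumed at the start of this answer could look roughly like the following. This is only a sketch: it assumes SQL Server 2016 or later (database compatibility level 130+) so that the built-in STRING_SPLIT function is available, and @tagCsv is a hypothetical parameter name. On older versions you would need one of the hand-written split functions instead.

```sql
-- Assumption: SQL Server 2016+ with compatibility level 130 or higher (STRING_SPLIT).
DECLARE @tagCsv varchar(max) = '1,3,5';           -- hypothetical CSV parameter
DECLARE @selectedTags TABLE (ID int PRIMARY KEY); -- primary key rules out duplicate tags
DECLARE @tagCount int;

-- Split the CSV, skip anything that is not an integer, and de-duplicate
INSERT INTO @selectedTags (ID)
SELECT DISTINCT TRY_CAST(LTRIM(RTRIM(s.value)) AS int)
FROM STRING_SPLIT(@tagCsv, ',') AS s
WHERE TRY_CAST(LTRIM(RTRIM(s.value)) AS int) IS NOT NULL;

SELECT @tagCount = COUNT(*) FROM @selectedTags;
```

De-duplicating here matters because the HAVING clause compares against @tagCount; a repeated tag in the input would otherwise make the counts disagree.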

Question 2

An RDBMS is the wrong tool for faceted searching. Faceted searching is a multidimensional search, which is difficult to express in the set-based SQL language. Using a data cube or the like might give you some of the desired functionality, but would be quite a bit of work to build.

When we were faced with similar requirements we ultimately decided to utilize the Apache Solr search engine, which supports faceting as well as many other search-oriented functions and features.

Question 3

It is possible to do faceted search in SQL Server. However, don't try to use your live product data tables. Instead, create a de-normalised "fact" table with one row per product and one column per tag, so that each intersection holds the product-tag value. You can re-populate this periodically from your main product tables.
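
As a rough illustration of such a fact table and its periodic refresh, here is a minimal sketch. The table name ProductFacet and the columns Tag1, Tag3 and Tag5 are hypothetical, and the source tables Product(ID) and ProductTag(ProductID, TagID) are assumed to look like those in the first answer. In practice the column list would have to be generated from your tag table.

```sql
-- One-off: hypothetical fact table, one row per product, one bit column per tag.
CREATE TABLE ProductFacet (
    ProductID int NOT NULL PRIMARY KEY,
    Tag1 bit NOT NULL,
    Tag3 bit NOT NULL,
    Tag5 bit NOT NULL
);

-- Periodic refresh: empty the table and rebuild it from the live tables.
TRUNCATE TABLE ProductFacet;

INSERT INTO ProductFacet (ProductID, Tag1, Tag3, Tag5)
SELECT
    P.ID,
    MAX(CASE WHEN PT.TagID = 1 THEN 1 ELSE 0 END),  -- 1 if the product has tag 1
    MAX(CASE WHEN PT.TagID = 3 THEN 1 ELSE 0 END),
    MAX(CASE WHEN PT.TagID = 5 THEN 1 ELSE 0 END)
FROM Product P
LEFT JOIN ProductTag PT        -- LEFT JOIN keeps products that have no tags at all
    ON PT.ProductID = P.ID
GROUP BY P.ID;
```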

It is then straightforward and relatively efficient to get the facet counts for the matching records for each tag the user checks.
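
A facet-count query against such a table might look like the sketch below. It assumes a hypothetical fact table ProductFacet with bit columns Tag1, Tag3 and Tag5 (names invented for illustration) and a user who has ticked tag 1.

```sql
-- Assumes a hypothetical table ProductFacet (ProductID int, Tag1 bit, Tag3 bit, Tag5 bit).
-- For the products matching the ticked tag, count how many carry each remaining tag.
SELECT
    COUNT(*)               AS MatchingProducts,
    SUM(CAST(Tag3 AS int)) AS Tag3Count,   -- SUM does not accept bit, hence the cast
    SUM(CAST(Tag5 AS int)) AS Tag5Count
FROM ProductFacet
WHERE Tag1 = 1;
```

Each additional tag the user ticks becomes another predicate in the WHERE clause, and the query is a single scan of one table, which is why this stays cheap for small datasets.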

The approach I have described will be perfectly good for small cases, e.g. 1,000 product rows and 50-100 tags (attributes). There is also an interesting opportunity with the forthcoming SQL Server 2014, which can place tables in memory; that should allow much larger fact tables.

I have also used Solr, and as STW points out this is the "correct" tool for facet searches. It is orders of magnitude faster than a SQL Server solution.

However, there are some major disadvantages to using Solr. The main issue is that you have to set up not only another platform (Solr) but also all the paraphernalia that goes with it: Java and some kind of Java servlet container (of which there are several). And whilst Solr runs on Windows quite nicely, you will still soon find yourself immersed in a world of command lines and editing of configuration files and environment variables that will remind you of all that was great about the 1980s ... or possibly not.

When that is all working, you then need to export your product data to it, using various methods. There is a SQL Server connector which works fairly well, but many prefer to post data to it as XML. You then have to create a webservice-type process in your application to send it the user's query and parse the resulting list of matches and counts back into your application (again, XML is probably the best method).

So if your dataset is relatively small, I would stick with SQL Server. You can still get a sub-second response, and SQL 2014 will hopefully allow much bigger datasets. If your dataset is big then Solr will give remarkably fast results (it really is very fast) but be prepared to make a major investment in learning and supporting a whole new platform.