Question

I have an application that will query SQL Server to return data filtered by selections made in the application, as in any common faceted search. I did see some out-of-the-box solutions, but these are expensive and I would prefer to build something custom; I just don't know where to start.

The database structure is like this: [database diagram image not available]

The data from the PRODUCT table would be searched by tags from the TAG table. Values which would be found in the TAG table would be something like this:

 ID      NAME
 ----------------------
 1       Blue
 2       Green
 3       Small
 4       Large
 5       Red

They would be related to products through the ProductTag table.

I would need to return two groups of data from this setup:

  1. The Products that are only related to the Tags selected, whether single or multiple
  2. The Remaining tags that are also available to select for the products which have already been refined by single or multiple selected tags.

I would like this to be done entirely within SQL Server if possible, as two separate stored procedures.

Most websites have this feature built in these days, e.g. http://www.gnc.com/family/index.jsp?categoryId=2108294&cp=3593186.3593187 (they've called it 'Narrow By').

I have been searching for a while for how to do this, and I'm taking a wild guess that if a stored procedure has to be created for this, it would need one parameter that accepts CSV values, like this:

 [dbo].[GetFacetedProducts] @Tags_Selected = '1,3,5'
 [dbo].[GetFacetedTags] @Tags_Selected = '1,3,5'

So with this architecture, does anyone know what types of queries need to be written for these stored procedures, or is the architecture flawed in any way? Has anyone created a faceted search like this before? If so, what types of queries would be needed to make something like this? I guess I'm just having trouble wrapping my head around it, and there isn't much out there that shows how to build something like this.


Solution

There are other places where you can get examples of turning a CSV parameter into a table variable. Assuming you have done that part, your query boils down to the following:
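For completeness, here is a minimal sketch of the CSV-to-table-variable step. It assumes SQL Server 2016 or later (database compatibility level 130+), where the built-in STRING_SPLIT function is available; on older versions you would need a user-defined split function instead. The variable names mirror the ones used in this thread and are otherwise illustrative.

```sql
-- Assumption: SQL Server 2016+ so that STRING_SPLIT exists.
-- @Tags_Selected stands in for the stored procedure parameter from the question.
DECLARE @Tags_Selected varchar(max) = '1,3,5';

-- The primary key both indexes the table variable and guards against duplicates.
DECLARE @selectedTags TABLE (ID int PRIMARY KEY);

INSERT INTO @selectedTags (ID)
SELECT DISTINCT TRY_CAST(LTRIM(RTRIM(value)) AS int)
FROM STRING_SPLIT(@Tags_Selected, ',')
-- TRY_CAST returns NULL for anything that is not an integer, so bad tokens are skipped.
WHERE TRY_CAST(LTRIM(RTRIM(value)) AS int) IS NOT NULL;

SELECT ID FROM @selectedTags ORDER BY ID;
```

De-duplicating here matters for the counting approach used later: if the same tag ID were listed twice, the count of selected tags would no longer match the number of distinct tags a product can have.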

GetFacetedProducts: Find Product records where all tags passed in are assigned to each product.

If you wrote it by hand you could end up with:

SELECT P.*
FROM Product P
INNER JOIN ProductTag PT1 ON PT1.ProductID = P.ID AND PT1.TagID = 1
INNER JOIN ProductTag PT2 ON PT2.ProductID = P.ID AND PT2.TagID = 3
INNER JOIN ProductTag PT3 ON PT3.ProductID = P.ID AND PT3.TagID = 5

While this does select only the products that have those tags, it is not going to work with a dynamic list. In the past, some people have built up the SQL and executed it dynamically. Don't do that.

Instead, let's assume that the same tag can't be applied to a product twice, so we could change our question to: find me products where the number of tags matching (dynamic list) is equal to the number of tags in (dynamic list).

DECLARE @selectedTags TABLE (ID int)
DECLARE @tagCount int

INSERT INTO @selectedTags VALUES (1)
INSERT INTO @selectedTags VALUES (3)
INSERT INTO @selectedTags VALUES (5)

SELECT @tagCount = COUNT(*) FROM @selectedTags

SELECT
    P.ID
FROM Product P
JOIN ProductTag PT
    ON PT.ProductID = P.ID
JOIN @selectedTags T
    ON T.ID = PT.TagID
GROUP BY
    P.ID,
    P.Name
HAVING COUNT(PT.TagID) = @tagCount

This returns just the IDs of products that match all your tags. You could then join this back to the Product table if you want more than just an ID; otherwise you're done.
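As a rough sketch of how the pieces could fit together, the procedure below wraps the counting query in the GetFacetedProducts procedure named in the question and returns full product rows. It assumes SQL Server 2016+ for STRING_SPLIT, and the column names (Product.ID, ProductTag.ProductID, ProductTag.TagID) are taken from the queries above rather than from the original schema diagram, so adjust them as needed.

```sql
CREATE PROCEDURE dbo.GetFacetedProducts
    @Tags_Selected varchar(max)   -- e.g. '1,3,5'
AS
BEGIN
    SET NOCOUNT ON;

    -- Turn the CSV parameter into a de-duplicated table variable.
    DECLARE @selectedTags TABLE (ID int PRIMARY KEY);
    INSERT INTO @selectedTags (ID)
    SELECT DISTINCT TRY_CAST(value AS int)
    FROM STRING_SPLIT(@Tags_Selected, ',')
    WHERE TRY_CAST(value AS int) IS NOT NULL;

    DECLARE @tagCount int = (SELECT COUNT(*) FROM @selectedTags);

    -- A product qualifies when it carries every selected tag, i.e. the number
    -- of its tags found in @selectedTags equals the number of selected tags.
    SELECT P.*
    FROM Product P
    WHERE P.ID IN (
        SELECT PT.ProductID
        FROM ProductTag PT
        JOIN @selectedTags T
            ON T.ID = PT.TagID
        GROUP BY PT.ProductID
        HAVING COUNT(DISTINCT PT.TagID) = @tagCount
    );
END
```

It would be called as in the question: EXEC dbo.GetFacetedProducts @Tags_Selected = '1,3,5'. Note that with an empty tag list this sketch returns no rows; if "no selection" should mean "all products", that case needs its own branch.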

As for your second query, once you have the product IDs that match, you want a list of all tags for those product IDs that aren't in your list:

SELECT DISTINCT
    PT2.TagID
FROM ProductTag PT2
WHERE PT2.ProductID IN (
    SELECT
        P.ID
    FROM Product P
    JOIN ProductTag PT
        ON PT.ProductID = P.ID
    JOIN @selectedTags T
        ON T.ID = PT.TagID
    GROUP BY
        P.ID,
        P.Name
    HAVING COUNT(PT.TagID) = @tagCount
)
AND PT2.TagID NOT IN (SELECT ID FROM @selectedTags)
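If you also want the tag names and a count of how many of the refined products carry each remaining tag (the numbers usually shown next to each facet), a variation along these lines might work. It is only a sketch: it assumes the Tag table has ID and Name columns as shown in the question, and that a (ProductID, TagID) pair appears at most once in ProductTag.

```sql
-- Assumption: @selectedTags would normally be filled from the CSV parameter.
DECLARE @selectedTags TABLE (ID int PRIMARY KEY);
INSERT INTO @selectedTags (ID) VALUES (1), (3), (5);

DECLARE @tagCount int = (SELECT COUNT(*) FROM @selectedTags);

-- Step 1: materialise the products that carry every selected tag.
DECLARE @matching TABLE (ProductID int PRIMARY KEY);
INSERT INTO @matching (ProductID)
SELECT PT.ProductID
FROM ProductTag PT
JOIN @selectedTags S
    ON S.ID = PT.TagID
GROUP BY PT.ProductID
HAVING COUNT(DISTINCT PT.TagID) = @tagCount;

-- Step 2: list the other tags on those products, with a per-tag product count.
SELECT
    T.ID,
    T.Name,
    COUNT(DISTINCT PT.ProductID) AS ProductCount
FROM @matching M
JOIN ProductTag PT
    ON PT.ProductID = M.ProductID
JOIN Tag T
    ON T.ID = PT.TagID
WHERE NOT EXISTS (SELECT 1 FROM @selectedTags S WHERE S.ID = PT.TagID)
GROUP BY T.ID, T.Name
ORDER BY ProductCount DESC, T.Name;
```

Materialising the matching product IDs once avoids repeating the GROUP BY / HAVING subquery, and the same table variable could feed both result sets if the two procedures were ever merged.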

Other tips

An RDBMS is the wrong tool for faceted searching. Faceted searching is a multidimensional search, which is difficult to express in the set-based SQL language. Using a data cube or the like might give you some of the desired functionality, but would be quite a bit of work to build.

When we were faced with similar requirements we ultimately decided to utilize the Apache Solr search engine, which supports faceting as well as many other search-oriented functions and features.

It is possible to do faceted search in SQL Server. However, don't try to use your live product data tables. Instead, create a de-normalised "fact" table which holds every product (rows) and every tag (columns), so that the intersection is your product-tag values. You can re-populate this periodically from your main product table.

It is then straightforward and relatively efficient to get the facet counts for the matching records for each tag the user checks.
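To make the idea concrete, here is a hedged sketch of what such a fact table and a facet-count query could look like for the five example tags from the question. The table name dbo.ProductFacet and its column names are hypothetical, and the tag IDs are hard-coded for illustration; with a changing tag list, the columns and the populate statement would have to be generated.

```sql
-- Hypothetical de-normalised fact table: one row per product, one bit column per tag.
CREATE TABLE dbo.ProductFacet (
    ProductID int NOT NULL PRIMARY KEY,
    Blue  bit NOT NULL DEFAULT 0,
    Green bit NOT NULL DEFAULT 0,
    Small bit NOT NULL DEFAULT 0,
    Large bit NOT NULL DEFAULT 0,
    Red   bit NOT NULL DEFAULT 0
);

-- Periodic re-population from the live tables (tag IDs as in the question's example).
TRUNCATE TABLE dbo.ProductFacet;
INSERT INTO dbo.ProductFacet (ProductID, Blue, Green, Small, Large, Red)
SELECT
    P.ID,
    MAX(CASE WHEN PT.TagID = 1 THEN 1 ELSE 0 END),
    MAX(CASE WHEN PT.TagID = 2 THEN 1 ELSE 0 END),
    MAX(CASE WHEN PT.TagID = 3 THEN 1 ELSE 0 END),
    MAX(CASE WHEN PT.TagID = 4 THEN 1 ELSE 0 END),
    MAX(CASE WHEN PT.TagID = 5 THEN 1 ELSE 0 END)
FROM Product P
LEFT JOIN ProductTag PT
    ON PT.ProductID = P.ID
GROUP BY P.ID;

-- Facet counts after the user has checked "Blue" and "Small":
-- total matches plus how many of them also carry each remaining tag.
-- SUM does not accept bit, hence the casts to int.
SELECT
    COUNT(*)                AS Matches,
    SUM(CAST(Green AS int)) AS GreenCount,
    SUM(CAST(Large AS int)) AS LargeCount,
    SUM(CAST(Red   AS int)) AS RedCount
FROM dbo.ProductFacet
WHERE Blue = 1 AND Small = 1;
```

The trade-off is that every new tag means a schema change, which is part of why this approach suits small, fairly stable tag sets.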

The approach I have described will be perfectly good for small cases, e.g. 1,000 product rows and 50-100 tags (attributes). Also there is an interesting opportunity with the forthcoming SQL Server 2014, which can place tables in memory - that should allow much larger fact tables.

I have also used Solr, and as STW points out this is the "correct" tool for facet searches. It is orders of magnitude faster than a SQL Server solution.

However, there are some major disadvantages to using Solr. The main issue is that you have to set up not only another platform (Solr) but also all the paraphernalia that goes with it: Java and some kind of Java servlet container (of which there are several). And whilst Solr runs on Windows quite nicely, you will soon find yourself immersed in a world of command lines and editing of configuration files and environment variables that will remind you of all that was great about the 1980s ... or possibly not. When that is all working, you then need to export your product data to it, using various methods: there is a SQL Server connector which works fairly well, but many prefer to post data to it as XML. Then you have to create a webservice-type process in your application to send it the user's query and parse the resulting list of matches and counts back into your application (again, XML is probably the best method).

So if your dataset is relatively small, I would stick with SQL Server. You can still get a sub-second response, and SQL 2014 will hopefully allow much bigger datasets. If your dataset is big then Solr will give remarkably fast results (it really is very fast) but be prepared to make a major investment in learning and supporting a whole new platform.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow