Question

Is there a way to select rows where one of the column contains only , but any number of, predefined values?

I've been using this, but it returns any rows where my column contains at least one of the values (which is exactly what it's suppose to do, I know).

But I'm looking for a way to only select rows that have ONLY my keywords in the keyword column.

SELECT * 
FROM 
    `products`.`product` 
WHERE 
    keywords LIKE '%chocolate%' 
AND keyword LIKE '%vanilla%';

Example Keywords: chocolate, sugar, milk, oats

Using the above keywords, I would want the first two results returned, but not the last two:

Product1: chocolate, sugar 

Product2: chocolate 

Product3: chocolate, sugar, milk, oats, bran 

Product4: chocolate, sugar, salt

My column contains a comma separated list of all keywords applicable to that product row.

Was it helpful?

Solution

Since you are storing the list as a string containing a comma separated list, rather than as a set, MySQL isn't going to be able to help much with that. When it was inserted into the database, MySQL saw it as a single string. When it's retrieved from the database, MySQL sees it as a single string. When we refer to it in a query, MySQL sees it as a single string.


If the "list" was stored as a standard relational set, with each keyword for a product stored as a separate row in the table, then returning the result set you specified is almost trivial.

For example, if we had this table:

CREATE TABLE product_keyword 
product_id      BIGINT UNSIGNED COMMENT 'FK ref products.id'
keyword         VARCHAR(20)

With each keyword associated to a particular product as a separate row:

product_id keyword
---------- ---------
         1 chocolate
         1 sugar
         2 chocolate
         3 bran
         3 chocolate
         3 milk
         3 oats
         3 sugar
         4 chocolate
         4 salt
         4 sugar

Then to find all rows in product that have a keyword other than 'chocolate' or 'vanilla'

SELECT p.id
  FROM product p
  JOIN product_keyword k
 WHERE k.product_id = p.id
    ON k.keyword NOT IN ('chocolate','vanilla')
 GROUP BY p.id

--or--

SELECT p.id
  FROM product p
  LEFT
  JOIN ( SELECT j.id
           FROM product_keyword j
          WHERE j.keyword NOT IN ('chocolate','vanilla')
         GROUP BY j.id
       ) k
    ON k.id = p.id 
 WHERE k.id IS NULL

To get products that have at least one of the keywords 'chocolate' and 'vanilla', but that have no other keywords associated, it's the same query above, but with an additional join:

SELECT p.id
  FROM product p
  JOIN ( SELECT g.id
           FROM product_keyword g
          WHERE g.keyword IN ('chocolate','vanilla')
         GROUP BY g.id
       ) h
    ON h.id = p.id 
  LEFT
  JOIN ( SELECT j.id
           FROM product_keyword j
          WHERE j.keyword NOT IN ('chocolate','vanilla')
         GROUP BY j.id
       ) k
    ON k.id = p.id 
 WHERE k.id IS NULL

We can unpack those queries, they aren't difficult. Query h returns a list of product_id that have at least one of the keywords, query k returns a list of product_id that have some keyword other than those specified. The "trick" there (if you want to call it that) is the anti-join pattern... doing an outer join to match rows, and include rows that didn't have a match, and a predicate in the WHERE clause that eliminates rows that had a match, leaving the set of rows from product that didn't have a match.


But with the set stored as a "comma separated list" in a single character column, we lose all the advantages of relational algebra; there isn't any easy way to process the list of keywords as a "set".

With the entire list stored as a single string, we've got some horrendous SQL to get the specified result.

One approach to doing the kind of check you specify would be to create a set of all possible "matches", and check those. This is workable for a couple of keywords. For example, to get a list of products that have ONLY the keywords 'vanilla' and/or 'chocolate', (that is, that have at least one of those keywords and does not have any other keyword):

SELECT p.id
  FROM product 
 WHERE keyword_list = 'chocolate'
    OR keyword_list = 'vanilla'
    OR keyword_list = 'chocolate,vanilla'
    OR keyword_list = 'vanilla,chocolate'

But extending that to three, four or five keywords quickly becomes unwieldy (unless the keywords are guaranteed to appear in a particular order. And it's very difficult to check for three out of four keywords.

Another (ugly) approach is to transform the keyword_list into a set, so that we can use queries like the first ones in my answer. But the SQL to do the transformation is limited by an arbitrary maximum number of keywords that can be extracted from the keyword_list.

It's fairly easy to extract the nth element from a comma separated list, using some simple SQL string functions, for example, to extract the first five elements from a comma separated list:

SET @l := 'chocolate,sugar,bran,oats'
SELECT NULLIF(SUBSTRING_INDEX(CONCAT(@l,','),',',1),'')                         AS kw1
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',2),',',-1),'') AS kw2
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',3),',',-1),'') AS kw3
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',4),',',-1),'') AS kw4
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',5),',',-1),'') AS kw5

But those are still on the same row. If we want to do checks on those, we'd have a bit of comparing to do, we'd need to check each one of those to see if it was in the specified list.

If we can get those keywords, on the one row, transformed into a set of rows with one keyword on each row, then we could use queries like the first ones in my answer. As an example:

SELECT t.product_id
     , NULLIF(CASE n.i
       WHEN 1 THEN SUBSTRING_INDEX(CONCAT(t.l,','),',',1)
       WHEN 2 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',2),',',-1)
       WHEN 3 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',3),',',-1)
       WHEN 4 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',4),',',-1)
       WHEN 5 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',5),',',-1)
       END,'') AS kw
  FROM ( SELECT 4 AS product_id,'fee,fi,fo,fum' AS l  
          UNION ALL 
         SELECT 5, 'coffee,sugar,milk'
        ) t
 CROSS
  JOIN ( SELECT 1 AS i
         UNION ALL SELECT 2
         UNION ALL SELECT 3
         UNION ALL SELECT 4
         UNION ALL SELECT 5
       ) n
HAVING kw IS NOT NULL
ORDER BY t.product_id, n.i

That gets us individual rows, but it's limited to a row for each of the first 5 keywords. It's easy to see how that would be extended (having n return 6,7,8,...) and extending the WHEN conditions in the CASE to handle 6,7,8...

But there is going to be some arbitrary limit. (I've used an inline view, aliased as t, to return two "example" rows, as a demonstration. That inline view could be replaced with a reference to the table containing the product_id and keyword_list columns.)

So, that query gets us a rowset like would be returned from the product_keyword table I gave as an example above.

In the example queries, references to the product_keyword table could be replaced with this query. But that's a whole lot of ugly SQL, and its horrendously inefficient, creating and populating temporary MyISAM tables anytime a query is run.

OTHER TIPS

You probably want to setup a fulltext index on keywords for your table. This allows you to search the keywords column and specify what keywords to include or not include. Here's a command which sets up the index:

ALTER TABLE products ADD FULLTEXT index_products_keywords (keywords);

Once you've done that, you can use the MATCH AGAINST phrase and specify keywords. You can use it like WHERE MATCH(keywords) AGAINST ('chocolate') to just search for the term chocolate. Or you can use BOOLEAN MODE to "turn-off" certain keywords.

SELECT * FROM products 
WHERE MATCH(keywords) AGAINST ('+chocolate -bran' IN BOOLEAN MODE);

Here's a small tutorial about fulltext indexes

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top