Question

I don't do a lot of SQL,and most of the time, I'm doing CRUD operations. Occasionally I'll get something a bit more complicated. So, this question may be a newbie question, but I'm ready. I've just been trying to figure this out for hours, and it's been no use.

So, Imagine the following table structure:

> | ID | Col1 | Col2 | Col3 | .. | Col8 |

I want to select ID and a calculated column. The calculated column has a range of 0 - 8 and it contains the number of matches to the query. I also want to restrict the result set to only include rows that have a certain number of matches.

So, from this sample data:

> | 1 | 'a' | 'b' | 1 | 2 |  
> | 2 | 'b' | 'c' | 1 | 2 |  
> | 3 | 'b' | 'c' | 4 | 5 |  
> | 4 | 'x' | 'x' | 9 | 9 |  

I want to query on Col1 = 'a' OR Col2 = 'c' OR Col3 = 1 OR Col4 = 5 where the calculated result > 1 and have the result set look like:

> | ID | Cal |
> | 1  |  2  |
> | 2  |  2  |
> | 3  |  2  |

I'm using T-SQL and SQL Server 2005, if it matters, and I can't change the DB Schema.

I'd also prefer to keep it as one self-contained query and not have to create a stored procedure or temporary table.

Was it helpful?

Solution

This answer will work with SQL 2005, using a CTE to clean up the derived table a little.

WITH Matches AS
(
    SELECT ID, CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END + 
                CASE WHEN Col2 = 'c' THEN 1 ELSE 0 END +
                CASE WHEN Col3 = 1  THEN 1 ELSE 0 END +
                CASE WHEN Col4 = 5  THEN 1 ELSE 0 END AS Result
    FROM Table1
    WHERE Col1 = 'a' OR Col2 = 'c' OR Col3 = 1 OR Col4 = 5 
)
SELECT ID, Result
FROM Matches
WHERE Result > 1 

OTHER TIPS

Here's a solution that leverages the fact that a boolean comparison returns the integers 1 or 0:

SELECT * FROM (
  SELECT ID, (Col1='a') + (Col2='c') + (Col3=1) + (Col4=5) AS calculated
  FROM MyTable
) q
WHERE calculated > 1; 

Note that you have to parenthesize the boolean comparisons because + has higher precedence than =. Also, you have to put it all in a subquery because you normally can't use a column alias in a WHERE clause of the same query.

It might seem like you should also use a WHERE clause in the subquery to restrict its rows, but in all likelihood you're going to end up with a full table scan anyway so it's probably not a big win. On the other hand, if you expect that such a restriction would greatly reduce the number of rows in the subquery result, then it'd be worthwhile.


Re Quassnoi's comment, if you can't treat boolean expressions as integer values, there should be a way to map boolean conditions to integers, even if it's a bit verbose. For example:

SELECT * FROM (
  SELECT ID, 
      CASE WHEN Col1='a' THEN 1 ELSE 0 END
    + CASE WHEN Col2='c' THEN 1 ELSE 0 END 
    + CASE WHEN Col3=1   THEN 1 ELSE 0 END
    + CASE WHEN Col4=5   THEN 1 ELSE 0 END AS calculated
  FROM MyTable
) q
WHERE calculated > 1;

This query is more index friendly:

SELECT  id, SUM(match)
FROM    (
        SELECT  id, 1 AS match
        FROM    mytable
        WHERE   col1 = 'a'
        UNION ALL
        SELECT  id, 1 AS match
        FROM    mytable
        WHERE   col2 = 'c'
        UNION ALL
        SELECT  id, 1 AS match
        FROM    mytable
        WHERE   col3 = 1
        UNION ALL
        SELECT  id, 1 AS match
        FROM    mytable
        WHERE   col4 = 5
        ) q
GROUP BY
        id
HAVING  SUM(match) > 1

This will only be efficient if all the columns you are searching for are, first, indexed and, second, have high cardinality (many distinct values).

See this article in my blog for performance details:

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top