Question

I've been struggling lately with a complex SQL query.
I have the following tables:

[dbo].[User] ~ {ID,nickname}  
[dbo].[Property] ~ {ID,title}  
[dbo].[Property_Values] ~ [ID,propertyID,title}  
[dbo].[Property_Values_User_Linkage] ~ {UserID,PropertyID,valueID}  

It's basically a project in which a user chooses values for each property. Each property can be single-value or multi-value. For instance, user might choose multiple values for Property {ID=1,title=Hobbies} but must choose a single value for Property {ID=2,title=HairColor}.

Using another table - [dbo].[Search_Property_Values_User_Linkage] - {UserID,PropertyID,valueID} I'm choosing what properties I want and I expect to find matching users. BUT, If I haven't chosen a value (or multi-value) for, let's say HairColor, I should get all users (since I don't want to filter by HairColor).

So far it's easy, but the problem I can't seems to solve is the case where there are multi-values ahead or no user defined values. For example, I want all users with HairColor=Brown and Hobbies IN(basketball,football).

I can retrieve all users who match one of the terms (and those who have other properties, since I haven't chosen to filter them) but I can't get only the users who absolutely match my criteria.

To put code into words, let's just say I need all users who:

  • Match ALL property-value that I've chosen
  • Might have other properties, such as EyesColor, but since I haven't chosen a filtering value, they might be retrieved as well.
  • Might not have any property set at all, but since I haven't chosen a value for this property, they are VALID!
  • Match all selected properties as group, not just one property that I've chosen (users who likes basketball but have "red" for their HairColor are INVALID!

I came across with a solution in which I create a virtual table which "completes" non-chosen values by bits. For instance (not actual code):

DECLARE @isMatches bit  
SET @isMatches=0    
if [propertyIsChosen]=1  
{    
 if [userInProperty]=1 SET @isMatches=1   
}  
else SET isMatches=1

I basically CROSS-JOIN [Property] WITH [User]
and LEFT-OUTER-JOIN the rest of the tables to match selections.
I get all users and their match to a property. This is not good enough since I get users who have Brown hair, those who like basketball/football, but not those who matches to both (and of course any other undefined property by me).
This is heavy, but it's the what I got so far to analyze the problem.

I would appreciate any help. I think I'm missing something from my math classes, 10 years ago...

EDIT: db pic: http://i51.tinypic.com/2n1cfwg.png

Was it helpful?

Solution

Data Model

From you comment, I see that you have some other property-related tables that you have not mentioned in your question. I'm gonna assume we can ignore these for now and just concentrate on representing property values, which can be done by a simplified model below:

enter image description here

DDL SQL (MS SQL Server):

CREATE TABLE [USER] (
    USER_ID int NOT NULL,
    NICKNAME nvarchar(50) NOT NULL,
    CONSTRAINT PK_USER PRIMARY KEY CLUSTERED (USER_ID)
);

CREATE TABLE USER_PROPERTY_VALUE(
    USER_ID int NOT NULL,
    PROPERTY_NAME nvarchar(50) NOT NULL,
    PROPERTY_VALUE_NO int NOT NULL,
    PROPERTY_VALUE nvarchar(255) NOT NULL,
    CONSTRAINT PK_USER_PROPERTY_VALUE PRIMARY KEY CLUSTERED (
        USER_ID,
        PROPERTY_NAME,
        PROPERTY_VALUE_NO
    )
);

ALTER TABLE USER_PROPERTY_VALUE ADD CONSTRAINT FK_USER_PROPERTY_VALUE_USER FOREIGN KEY(USER_ID)
REFERENCES [USER] (USER_ID);

A multi-value is represented by several rows in USER_PROPERTY_VALUE, sharing the same PROPERTY_NAME but each having a distinct PROPERTY_VALUE_NO.

The following example data...

Rob:   HairColor={brown}, Hobby={basketball,football}
Bob:   HairColor={brown}, Hobby={basketball}
Steve: Hobby={basketball,football}

...is represented in the database like this:

USER:

USER_ID NICKNAME
1       Rob
2       Bob
3       Steve

USER_PROPERTY_VALUE:

USER_ID PROPERTY_NAME   PROPERTY_VALUE_NO   PROPERTY_VALUE
1       HairColor       1                   brown
1       Hobby           1                   basketball
1       Hobby           2                   football
2       HairColor       1                   brown
2       Hobby           1                   basketball
3       Hobby           1                   basketball
3       Hobby           2                   football

Example Queries

Select users with brown hair color:

SELECT *
FROM [USER]
WHERE
    EXISTS (
        SELECT *
        FROM USER_PROPERTY_VALUE 
        WHERE
            USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
            AND PROPERTY_NAME = 'HairColor'
            AND PROPERTY_VALUE = 'brown'
    )

Result:

USER_ID NICKNAME
1       Rob
2       Bob

Select users whose hobbies include both basketball and football:

SELECT *
FROM [USER]
WHERE
    EXISTS (
        SELECT *
        FROM USER_PROPERTY_VALUE 
        WHERE
            USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
            AND PROPERTY_NAME = 'Hobby'
            AND PROPERTY_VALUE = 'basketball'
    )
    AND EXISTS (
        SELECT *
        FROM USER_PROPERTY_VALUE 
        WHERE
            USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
            AND PROPERTY_NAME = 'Hobby'
            AND PROPERTY_VALUE = 'football'
    )

Result:

USER_ID NICKNAME
1       Rob
3       Steve

Select users whose hair color is brown and hobbies include both basketball and football.

SELECT *
FROM [USER]
WHERE
    EXISTS (
        SELECT *
        FROM USER_PROPERTY_VALUE 
        WHERE
            USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
            AND PROPERTY_NAME = 'HairColor'
            AND PROPERTY_VALUE = 'brown'
    )
    AND EXISTS (
        SELECT *
        FROM USER_PROPERTY_VALUE 
        WHERE
            USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
            AND PROPERTY_NAME = 'Hobby'
            AND PROPERTY_VALUE = 'basketball'
    )
    AND EXISTS (
        SELECT *
        FROM USER_PROPERTY_VALUE 
        WHERE
            USER_PROPERTY_VALUE.USER_ID = [USER].USER_ID
            AND PROPERTY_NAME = 'Hobby'
            AND PROPERTY_VALUE = 'football'
    )

Result:

USER_ID NICKNAME
1       Rob

Etc, etc...

Querying Based on the Contents of Another Table

Let's say you have a table that contains the filter criteria:

CREATE TABLE PROPERTY_FILTER (
    PROPERTY_NAME nvarchar(50) NOT NULL,
    PROPERTY_VALUE nvarchar(255) NOT NULL,
    CONSTRAINT PK_PROPERTY_FILTER PRIMARY KEY (PROPERTY_NAME, PROPERTY_VALUE)
)

The following query will return only users that fulfill the criteria currently contained in that table:

SELECT *
FROM [USER] U
WHERE
    NOT EXISTS (
        SELECT F.PROPERTY_NAME, F.PROPERTY_VALUE
        FROM PROPERTY_FILTER F
        EXCEPT
        SELECT P.PROPERTY_NAME, P.PROPERTY_VALUE
        FROM USER_PROPERTY_VALUE P
        WHERE P.USER_ID = U.USER_ID
    )

In plain English: if there is a filter property that is not also users's property, ignore that user.

BTW, this will not work in concurrent multi-user environment - you'll need to introduce an additional field in the PROPERTY_FILTER table to identify the "session", or use temporary tables (if you don't require persistence).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top