Question

I'm trying to find users with similar set of interests, with the following schema..

USERS - ID name etc

Interests - ID UID PID

where ID is unique ID for Interests, UIS is user ID and PID is a product ID. I have looked at other similar questions at SO, but none of them had an exact answer.

Example- Let's say I'm interested in getting users with similar interest to John, and this is how to two tables look like ...

ID  Name
11  John
12  Mary
13  Scott
14  Tim

ID UID PID
3  12  123
4  12  231
5  12  612
6  13  123
7  13  612
8  14  931
9  14  214
10 11  123
11 11  231
12 11  781
13 11  612

I would like a result with in that order.

I was thinking of doing a set intersection of the user I'm interested in with all other users. It doesn't sound like a very good solution, because it will have to be done everytime a user adds interest or another user is added. Its a small project, and as of now I'll be limiting users to 100. I still think that the above approach will not be efficient at all as it will take 1002 time.

Can someone guide me in the right direction? What are the possible solutions, and which one will be the best with above given constraints. I'm looking at ANN to see if I can use that.

Was it helpful?

Solution

This starts by counting the number of interests that each user has in common with John. The approach is to take all of John's interests, join back to the interests table and aggregate to the the count of common interests. Here is the SQL for that:

select i.uid, COUNT(*) as cnt
from (select i.*
      from interests i join
           users u
           on i.uid = i.id
      where u.name = 'John'
     ) ilist join
     interests i
     on ilist.pid = i.pid and
        ilist.uid <> i.uid  -- forget about John
group by i.uid

But, you actually want the list of products, rather than just the count. So, you have to join back to the interests table:

select i.*
from (select i.uid, COUNT(*) as cnt
      from (select i.*
            from interests i join
                 users u
                 on i.uid = i.id
            where u.name = 'John'
           ) ilist join
           interests i
           on ilist.pid = i.pid and
              ilist.uid <> i.uid  -- forget about John
      group by i.uid
     ) t join
     interests i
     on t.uid = i.uid
group by t.cnt, i.uid 

OTHER TIPS

The following query finds others users with atleast 2 or more similar interests according to the interests of user 11.

SELECT in2.UID FROM users u
INNER JOIN interest in1 ON (in1.UID = u.ID)
INNER JOIN interest in2 ON (in2.PID = in1.PID AND in2.UID <> u.ID)
WHERE u.ID = 11
GROUP BY in2.UID
HAVING COUNT(in2.UID) >= 2
ORDER BY COUNT(in2.UID) DESC

The ORDER BY ensures that users with the most similar interests ends up first. The HAVING COUNT(in2.UID) >= 2) makes sure the users which are found have atleast 2 or more similar interest.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top