Collaborative or structured recommendation?

https://stackoverflow.com/questions/10117263

30-05-2021
|

Question

I am building a Rails app that recommends tutors to students and vise versa. I need to match them based on multiple dimensions, such as their majors (Math, Biology etc.), experience (junior etc.), class (Math 201 etc.), preference (self-described keywords) and ratings.

I checked out some Rails collaborative recommendation engines (recommendable, recommendify) and Mahout. It seems that collaborative recommendation is not the best choice in my case, since I have much more structured data, which allows a more structured query. For example, I can have a recommendation logic for a student like:

if student looks for a Math tutor in Math 201:
  if there's a tutor in Math major offering tutoring in Math 201 then return
  else if there's a tutor in Math major then sort by experience then return
  else if there's a tutor in quantitative major then sort by experience then return
  ...

My questions are:

What are the benefits of a collaborative recommendation algorithm given that my recommendation system will be preference-based?
If it does provide significant benefits, how I can combine it with a preference-based recommendation as mentioned above?
Since my approach will involve querying multiple tables, it might not be efficient. What should I do about this?

Thanks a lot.

Solution

It sounds like your measurement of compatibility could be profitably reformulated as a metric. What you should do is try to interpret your `columns' as being different components of the dimension of your data. The idea is that you ultimately should produce a binary function which returns a measurement of compatibility between students and tutors (and also students/students and tutors/tutors). The motivation for extending this metric to all types of data is that you can then use this idea to reformulate your matching criteria as a nearest-neighbor search:

http://en.wikipedia.org/wiki/Nearest_neighbor_search

There are plenty of data structures and solutions to this problem as it has been very well studied. For example, you could try out the following library which is often used with point cloud data:

http://www.cs.umd.edu/~mount/ANN/

To optimize things a bit, you could also try prefiltering your data by running principal component analysis on your data set. This would let you reduce the dimension of the space in which you do nearest neighbor searches, and usually has the added benefit of reducing some amount of noise.

http://en.wikipedia.org/wiki/Principal_component_analysis

Good luck!

OTHER TIPS

Personally, I think collaborative filtering (cf) would work well for you. Note that a central idea of cf is serendipity. In other words, adding too many constraints might potentially result in a lukewarm recommendations to users. The whole point of cf is to provide exciting and relevant recommendations based on similar users. You need not impose such tight constraints.

If you might decide on implementing a custom cf algorithm, I would recommend reading this article published by Amazon [pdf], which discusses Amazon's recommendation system. Briefly, the algorithm they use is as follows:

for each item I1
    for each customer C who bought I1
        for each I2 bought by a customer
            record purchase C{I1, I2}
    for each item I2
        calculate sim(I1, I2) 
        //this could use your own similarity measure, e.g., cosine based
        //similarity, sim(A, B) = cos(A, B) = (A . B) / (|A| |B|) where A
        //and B are vectors(items, or courses in your case) and the dimensions
        //are customers
return table

Note that the creation of this table would be done offline. The online algorithm would be quick to return recommendations. Apparently, the recommendation quality is excellent.

In any case, if you want to get a better idea about cf in general (e.g., various cf strategies) and why it might be suited to you, go through that article (don't worry, it is very readable). Implementing a simple cf recommender is not difficult. Optimizations can be made later.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow