Question

My query is:

Given a user id, find the appropriate song recommendation for this user based on their ratings compared against other users' ratings.

I want everything to be real-time here. As events come in, the recommendations should be re-weighted and written to a column family that supports a query like:

SELECT recommendation_id FROM cf WHERE user_id=123 AND recommendation_type='song'

So I was thinking of a column family that stores all of a user's ratings (each song is a column), plus a set of recommendations. However, I can't come up with a way to make this work in real time. I want a Storm topology that populates the ratings as well as the possible recommendations.

Another thing that seems messy about this is that it requires a lot of updating in Cassandra. It would be better if it only required inserts, right?

I've been trying to find examples of such a data model and have yet to find one. Any resources others have found would be helpful.

Update: Another way to frame the problem is that I'm trying to find a data structure that supports iterative collaborative filtering. Is this possible?

Solution

I recently saw these slides from Spotify about using ML and Hadoop for predictive analysis with matrix factorization. As slide 11 shows, Cassandra is in the picture, but most of the results are precomputed every night.
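
For contrast with a nightly batch job, here is a rough sketch of what an incremental approach could look like. It is not what the Spotify slides describe: it swaps matrix factorization for a much simpler item-to-item co-occurrence count, which can be updated one rating event at a time using only increments. Everything here is an assumption for illustration: the class name `IncrementalItemCF`, the "liked" threshold of 4, and the in-memory dictionaries standing in for Cassandra tables and a Storm bolt.

```python
from collections import defaultdict


class IncrementalItemCF:
    """Toy in-memory item-to-item co-occurrence recommender (hypothetical)."""

    def __init__(self, like_threshold=4):
        # Assumption: a rating at or above this value counts as a "like".
        self.like_threshold = like_threshold
        # user_id -> set of liked song ids
        self.liked = defaultdict(set)
        # song_id -> other song_id -> number of users who liked both
        self.cooccur = defaultdict(lambda: defaultdict(int))

    def on_rating(self, user_id, song_id, rating):
        """Process one rating event using increments only.

        Limitation of this sketch: a later, lower rating for the same
        song does not undo earlier increments.
        """
        if rating < self.like_threshold or song_id in self.liked[user_id]:
            return
        for other in self.liked[user_id]:
            self.cooccur[song_id][other] += 1
            self.cooccur[other][song_id] += 1
        self.liked[user_id].add(song_id)

    def recommend(self, user_id, k=3):
        """Return up to k unseen songs, ranked by summed co-occurrence."""
        seen = self.liked.get(user_id, set())
        scores = defaultdict(int)
        for song in seen:
            for other, count in self.cooccur.get(song, {}).items():
                if other not in seen:
                    scores[other] += count
        # Highest score first; ties broken by song id for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [song for song, _ in ranked[:k]]


cf = IncrementalItemCF()
events = [(1, "a", 5), (1, "b", 5), (2, "a", 5), (2, "c", 4),
          (3, "a", 5), (3, "b", 4), (4, "a", 5)]
for user_id, song_id, rating in events:
    cf.on_rating(user_id, song_id, rating)
print(cf.recommend(4))  # ['b', 'c']
```

Because each event only increments pair counts, the same idea might map onto counter-style writes rather than read-modify-write updates, at the cost of recommendations that are cruder than a factorization model.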

OTHER TIPS

You might want to use CQL collections (sets, maps, and lists). Have a look at this blog post by the DataStax community:

http://www.datastax.com/dev/blog/cql3_collections
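
To make the shape concrete, here is a small Python model of one row per user, with a map of ratings and a set of recommendations. It only mirrors the idea of CQL `map` and `set` columns; the names `UserRow`, `rate`, and `add_recommendation` are hypothetical, and nothing here talks to Cassandra.

```python
from dataclasses import dataclass, field


@dataclass
class UserRow:
    """One row per user, loosely mirroring CQL collection columns."""
    user_id: int
    ratings: dict = field(default_factory=dict)   # stands in for a map of song id -> rating
    song_recs: set = field(default_factory=set)   # stands in for a set of song ids


def rate(row, song_id, rating):
    """Record a rating by writing a single map key."""
    row.ratings[song_id] = rating
    # A song the user has already rated should no longer be recommended.
    row.song_recs.discard(song_id)


def add_recommendation(row, song_id):
    """Add a song to the recommendation set unless it is already rated."""
    if song_id not in row.ratings:
        row.song_recs.add(song_id)


row = UserRow(user_id=123)
add_recommendation(row, "song-1")
add_recommendation(row, "song-2")
rate(row, "song-1", 5)
print(sorted(row.song_recs))  # ['song-2']
```

The appeal of this layout is that writing one map entry or adding one set element in CQL does not require reading the whole collection first, so rating events can be applied as plain writes.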

Licensed under: CC-BY-SA with attribution