I recently saw these slides from Spotify about using ML and Hadoop for predictive analysis with matrix factorization. As you'll see on slide 11, Cassandra is in the picture, but most of the results are precomputed every night.
Real-time recommendation engine data models in Cassandra
23-06-2022
Question
My query is:
Given a user id, find the appropriate song recommendation for this user based on their ratings compared against other users' ratings.
I want everything to be real time here. So, as events come in, the recommendations should be weighted appropriately, and a column family should be maintained that supports a query like:
SELECT recommendation_id FROM cf WHERE user_id=123 AND recommendation_type='song'
So I was thinking of something like a column family that stores all of a user's ratings (each song is a column), plus a set of recommendations. However, I can't come up with a way to make this work in real time. I want a Storm topology that populates the ratings as well as the possible recommendations.
Another thing that seems messy about this is that it requires a lot of updating in Cassandra. It would be better if it were only creating, right?
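To make the layout above concrete, here is a minimal in-memory sketch in Python. Each dict stands in for a column family; the names `RecommendationStore`, `on_rating_event`, and `liked_by_others` are hypothetical, and the scorer is a deliberately naive placeholder (songs that other users rated 4 or higher), not a real collaborative filter. Note that every event overwrites the user's recommendation row, which is exactly the update-heavy pattern in question, and that a user's recommendations are only refreshed by that user's own events.

```python
from collections import Counter, defaultdict


def liked_by_others(user_id, ratings_by_user):
    """Placeholder scorer: rank unseen songs by how many other users rated them >= 4."""
    seen = ratings_by_user.get(user_id, {})
    counts = Counter()
    for other_id, rated in ratings_by_user.items():
        if other_id == user_id:
            continue
        for song_id, rating in rated.items():
            if rating >= 4 and song_id not in seen:
                counts[song_id] += 1
    return [song_id for song_id, _ in counts.most_common()]


class RecommendationStore:
    """Two dicts standing in for two column families (an assumption of this sketch)."""

    def __init__(self, scorer=liked_by_others, top_n=10):
        self.ratings_by_user = defaultdict(dict)  # user_id -> {song_id: rating}
        self.recommendations = {}  # (user_id, recommendation_type) -> [recommendation_id]
        self.scorer = scorer
        self.top_n = top_n

    def on_rating_event(self, user_id, song_id, rating):
        """What a stream-processing step would do for one incoming rating."""
        self.ratings_by_user[user_id][song_id] = rating
        ranked = self.scorer(user_id, self.ratings_by_user)
        # Overwrite the whole row for this user: an update, not a pure create.
        self.recommendations[(user_id, "song")] = ranked[: self.top_n]

    def select(self, user_id, recommendation_type):
        """Mirrors: SELECT recommendation_id FROM cf WHERE user_id=? AND recommendation_type=?"""
        return self.recommendations.get((user_id, recommendation_type), [])


store = RecommendationStore()
store.on_rating_event(2, "song_a", 5)
store.on_rating_event(2, "song_b", 5)
store.on_rating_event(123, "song_a", 4)
print(store.select(123, "song"))
```

The read path is a single key lookup, which is what the query needs; all of the cost sits in the write path, where each event triggers a rescoring.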
I've been trying to find examples of such a data model, and have yet to find one. Any resources others have found would be helpful.
Update: Another way to frame the problem is that I'm trying to find a data structure that supports iterative collaborative filtering. Is this possible?
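One family of algorithms that fits this framing is weighted Slope One, which is named here as an illustration rather than as anything the thread settled on. It only needs two running totals per pair of songs (a sum of rating differences and a count of co-raters), so each new rating can be folded in without recomputing anything from scratch. The sketch below keeps those totals in Python dicts; mapping them to Cassandra tables or counters is an assumption left open, and the class and method names are hypothetical.

```python
from collections import defaultdict


class SlopeOne:
    """In-memory weighted Slope One; each dict stands in for a table."""

    def __init__(self):
        self.ratings = defaultdict(dict)    # user_id -> {song_id: rating}
        self.diff_sum = defaultdict(float)  # (song_a, song_b) -> sum of (r_a - r_b)
        self.pair_count = defaultdict(int)  # (song_a, song_b) -> number of co-raters

    def add_rating(self, user_id, song_id, rating):
        """Fold one new rating event into the pairwise totals."""
        user_ratings = self.ratings[user_id]
        if song_id in user_ratings:
            # Re-rating would need the old value subtracted first; omitted here.
            raise ValueError("re-rating is not handled in this sketch")
        for other_id, other_rating in user_ratings.items():
            self.diff_sum[(song_id, other_id)] += rating - other_rating
            self.diff_sum[(other_id, song_id)] += other_rating - rating
            self.pair_count[(song_id, other_id)] += 1
            self.pair_count[(other_id, song_id)] += 1
        user_ratings[song_id] = rating

    def predict(self, user_id, song_id):
        """Predicted rating for a song, or None if no co-rating data exists."""
        numerator = 0.0
        denominator = 0
        for other_id, other_rating in self.ratings.get(user_id, {}).items():
            n = self.pair_count.get((song_id, other_id), 0)
            if n == 0:
                continue
            average_diff = self.diff_sum[(song_id, other_id)] / n
            numerator += (average_diff + other_rating) * n
            denominator += n
        return numerator / denominator if denominator else None


model = SlopeOne()
for user, song, rating in [
    ("john", "A", 5), ("john", "B", 3), ("john", "C", 2),
    ("mark", "A", 3), ("mark", "B", 4),
    ("lucy", "B", 2), ("lucy", "C", 5),
]:
    model.add_rating(user, song, rating)
print(model.predict("lucy", "A"))
```

Each event touches one pair entry per song the user has already rated, so the write cost grows with the length of that user's history rather than with the size of the catalogue.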
Solution
Other suggestions
You might want to use CQL collections, including sets, maps, and lists. Have a look at this blog post by the DataStax community: