Question

My query is:

Given a user id, find the appropriate song recommendation for this user based on their ratings compared against other users' ratings.

I want everything to be real time here. So, as events come in, weight the recommendations appropriately and maintain a column family that supports a query like

SELECT recommendation_id FROM cf WHERE user_id=123 AND recommendation_type='song'

I was thinking of something like a column family that stores all of a user's ratings (each song is a column), plus a set of recommendations. However, I can't come up with a way to make this work in real time. I want a Storm topology that populates the ratings as well as the possible recommendations.

Another thing that seems messy about this is that it requires a lot of updating in Cassandra. It would be better if it only required inserts, right?

I've been trying to find examples of such a data model, and have yet to find one. Any resources others have found would be helpful.

Update: Another way to frame the problem is that I'm trying to find a data structure that supports iterative collaborative filtering. Is this possible?

Solution

I recently saw these slides from Spotify about using ML and Hadoop for predictive analysis with matrix factorization. As slide 11 shows, Cassandra is in the picture, but most of the results are precomputed every night.

Other tips

You might want to use CQL collections, including sets, maps, and lists. Have a look at this blog post from the DataStax community:

http://www.datastax.com/dev/blog/cql3_collections
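
To make the idea a bit more concrete, here is a minimal in-memory sketch in Python of how a per-user map of ratings plus incrementally maintained counts could support the kind of iterative collaborative filtering the question asks about. Everything in it is an assumption for illustration: the class name `IncrementalRecommender`, the `like_threshold` cutoff, and the simple "users who liked X also liked Y" co-occurrence scoring are hypothetical and not taken from the blog post or the Spotify slides. The dictionaries merely stand in for column families; in Cassandra the co-occurrence counts might map onto counter columns, but that mapping is not shown here.

```python
from collections import defaultdict


class IncrementalRecommender:
    """Hypothetical in-memory stand-in for two column families:
    per-user ratings and song-to-song co-occurrence counts."""

    def __init__(self, like_threshold=4):
        # Assumption: a rating at or above this value counts as a "like".
        self.like_threshold = like_threshold
        # user_id -> {song_id: rating}, similar to a map<song, rating> per user
        self.ratings = defaultdict(dict)
        # song_id -> {other_song_id: number of users who liked both}
        self.co_liked = defaultdict(lambda: defaultdict(int))

    def _liked(self, user_id):
        """Return the set of songs this user currently likes."""
        return {
            song
            for song, rating in self.ratings[user_id].items()
            if rating >= self.like_threshold
        }

    def add_rating(self, user_id, song_id, rating):
        """Process one rating event and adjust only the affected counts."""
        before = self._liked(user_id)
        self.ratings[user_id][song_id] = rating
        after = self._liked(user_id)

        if song_id in after and song_id not in before:
            delta = 1   # song became liked
        elif song_id in before and song_id not in after:
            delta = -1  # song was downgraded below the threshold
        else:
            return      # like status unchanged, nothing to update

        # Only pairs involving the rated song change.
        for other in after - {song_id}:
            self.co_liked[song_id][other] += delta
            self.co_liked[other][song_id] += delta

    def recommend(self, user_id, limit=5):
        """Rank unrated songs by how often they were co-liked with the user's likes."""
        rated = self.ratings[user_id]
        scores = defaultdict(int)
        for song in self._liked(user_id):
            for other, count in self.co_liked[song].items():
                if other not in rated and count > 0:
                    scores[other] += count
        # Highest score first; ties broken by song id for a stable order.
        return sorted(scores, key=lambda s: (-scores[s], s))[:limit]
```

Each incoming event touches only the counts for pairs that involve the newly rated song, so the work per event is proportional to the number of songs that user already likes. The trade-off is that this is a much cruder model than matrix factorization, and it still relies on in-place counter updates rather than pure inserts.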

License: CC-BY-SA with attribution
Not affiliated with StackOverflow