Question

I have a database of articles that will be indexed by Lucene and classified both by user-assigned tags and by Mahout. Each article has a score for each tag: users can agree or disagree with a tag, and tags discovered by Mahout are treated the same as user-assigned ones.

I want to infer each user's interests (for example, interest in particular tags) from their profile and interaction history.

How can I store users' interests?
And how can I use those interests to sort or filter search results?

Is my approach possible? Feasible? Scalable?
What kinds of techniques and algorithms can I use? Please suggest!

Solution

This sounds mostly like a search problem, not a recommendation problem. You are primarily sorting and filtering search results, based on tags. As such I think Lucene is generally the tool to deploy, not Mahout. (Although using Mahout classifiers to learn tags is quite right.)

If you really want to imagine this as a recommender problem, I might say your items are the tags. Any time you interact with a tag, for example by viewing a page tagged X, Y and Z, that indicates you are a little more interested in "items" X, Y and Z. The recommender problem here is then to suggest new tags of interest.
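
To make the "tags as items" idea concrete, here is a minimal sketch of how page views could be turned into per-user tag counts. The function name, the `(user_id, article_id)` view format and the sample data are all assumptions for illustration, not anything from Lucene or Mahout.

```python
from collections import Counter, defaultdict


def count_tag_interactions(views, article_tags):
    """Count how often each user interacted with each tag.

    views: iterable of (user_id, article_id) pairs (hypothetical log format).
    article_tags: dict mapping article_id to a list of tags.
    """
    counts = defaultdict(Counter)
    for user_id, article_id in views:
        # Viewing an article counts as one interaction with each of its tags.
        for tag in article_tags.get(article_id, ()):
            counts[user_id][tag] += 1
    return counts


# Made-up sample data.
article_tags = {"a1": ["lucene", "search"], "a2": ["search", "mahout"]}
views = [("alice", "a1"), ("alice", "a2"), ("bob", "a2")]
interest = count_tag_interactions(views, article_tags)
```

The resulting user-to-tag counts are the raw material both for storing a user's interests and for feeding a recommender.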

You could try using a simple count of interactions with a tag as a numeric "rating", though I think that won't give great results in a recommender context. Using the log of the count is better, but still feels wrong. You could also ignore the interaction count and just use the fact that the user and tag have ever interacted, or not: "boolean preferences".
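
The three options above can be sketched as one small function. This is only an illustration: the function name and mode strings are made up, and the log variant uses log(1 + count) rather than a plain log, an assumption made here so that a single interaction does not map to zero.

```python
import math


def to_preference(count, mode="boolean"):
    """Turn a raw user-tag interaction count into a preference value."""
    if mode not in ("count", "log", "boolean"):
        raise ValueError("unknown mode: " + mode)
    if count <= 0:
        return 0.0  # no interaction, no preference
    if mode == "count":
        return float(count)  # raw count used as a "rating"
    if mode == "log":
        return math.log1p(count)  # dampens very heavy users of a tag
    return 1.0  # boolean preference: interacted at all
```

For a count of 3, the modes give 3.0, log(4) and 1.0 respectively.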

The recommender model that best matches this input, as far as I know, is the alternating least squares model you see in ParallelALSFactorizationJob. I don't know if that's usable for you, but it's the algorithm I would investigate if you have the time and inclination. Its input is more like an "interaction strength" than a rating, and it treats it that way, which is what you have here.
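
As a rough illustration of what treating the input as an "interaction strength" can mean, the implicit-feedback formulation of ALS commonly splits each strength into a binary preference and a confidence weight. The sketch below follows that general formulation; it is not taken from Mahout's code, and the function name and the default `alpha` are assumptions.

```python
def implicit_feedback_terms(strength, alpha=40.0):
    """Split an interaction strength into (preference, confidence).

    preference: 1.0 if the user interacted with the tag at all, else 0.0.
    confidence: grows linearly with the strength, so more interactions
    mean more trust in the preference rather than a higher "rating".
    alpha is an illustrative scaling constant that would need tuning.
    """
    preference = 1.0 if strength > 0 else 0.0
    confidence = 1.0 + alpha * strength
    return preference, confidence
```

The factorization then fits the preferences, weighting each squared error by its confidence.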

OTHER TIPS

A lot of the time it's easier to just have users explicitly say which tags they're interested in. This is what Stack Overflow does, for example. You can then boost a search result's score by some amount if it carries a tag the user is interested in.
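
A minimal sketch of that boosting step, done as a re-ranking pass over results that have already been scored. The result format, the function name and the multiplicative boost of 0.5 per matching tag are all assumptions for illustration; with Lucene you would more likely express the boost in the query itself.

```python
def rerank(results, user_tags, boost=0.5):
    """Re-rank search results, favoring tags the user is interested in.

    results: list of (doc_id, base_score, tags) tuples (hypothetical format).
    user_tags: set of tags the user has marked as interesting.
    Returns (doc_id, score) pairs sorted by descending boosted score.
    """
    rescored = []
    for doc_id, base_score, tags in results:
        matches = len(user_tags & set(tags))
        # Each matching tag raises the score by a fixed fraction.
        rescored.append((doc_id, base_score * (1.0 + boost * matches)))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Made-up sample data.
results = [
    ("d1", 1.0, ["java"]),
    ("d2", 0.8, ["lucene", "search"]),
    ("d3", 0.9, []),
]
ranked = rerank(results, {"lucene", "search"})
```

Here `d2` starts with the lowest base score but moves to the top because both of its tags match the user's interests.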

If you want to do something more implicit, Mahout has an FAQ on recommendation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow