Question

I have a very specific recommendation problem.

Suppose I have 3 types of values/entities - item, property, value. There are N items, A properties and B values. Each item has some number of property-value pairs. Example:

Item#1
2374-23783
8455-5783
744-2438

Item#2
5435-23783
8455-54654
544-9778

...

Now, given an "anonymous" item, say, Item#x with 3-4 sample property value pairs like above, I want to get recommendations for a specific property. Example:

Item#x
5435-23783
544-9778
744-2438

8455-?? (get recommendation)

Now, intuition - the recommended value for property 8455 in Item#x may be 54654. You'll see that the properties 5435 and 744 have same values in Item#2 as in Item#x. Therefore, it's more probable that the value for 8455 will be similar to what value 8455 has in Item#2.

Question:

  1. What kind of model do you think would be best for this problem? What approach should I use? Collaborative filtering - but how? Simply dumping all property-value pairs into the dataset and fetching recommendations wouldn't satisfy my needs, obviously.

  2. Can you add any implementation specific details too? Mahout? Myrrix? Machine learning/recommendation libraries?

Was it helpful?

Solution

It doesn't appear you need any machine learning, just retrieval. The most straightforward way is to create a feature vector where each dimension is a property.

Vector position and property:

Position #0, property 2374
Position #1, property 8455
Position #2, property 744
Position #3, property 5435
Position #4, property 544

For each item fill in vector values.

Item #1 is represented as [23783,  5783, 2438,     ?,    ?]
Item #2 is represented as [    ?, 54654,    ?, 23783, 9778]
Item #x is represented as [    ?,     ?, 2438, 23783, 9778] 

Item #x has most common values with Item #2 whose position #1 is 54654. Basically you find the best intersection with an item that has the position value you're interested in. It gets more interesting if you want values for several properties that can only be suggested by several items, but you haven't talked about the nature of the data.

OTHER TIPS

Any machine learning approach will do the job. You can, for instance, use Bayesian networks as it is natural for these conditional item-property-value occurrences.

It is not realistic to add implementation specific details without knowing what is your concerns. What do you care most? Performance, accuracy, or scalability?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top