Myrrix tagging API to represent/weight parent/child Item relationship

https://stackoverflow.com/questions/18964305

29-06-2022
|

Вопрос

I've been using the Tagging API to tag my items in order to allow Item-Item 'similarity' scores to be calculated, so: Item 1 gets tagged with {UK, MALE, 50}, Item 2 with {FRANCE, MALE, 22}, that kind of thing. That's been working fine.

What I'd like to do is represent item-item 'relationships', so if my application says that 1 is a parent of 2 (and just to make things a little more complex, this is multi-level), I'd like to be able to tell Myrrix to pull those two items a little closer together.

My first solution was to add a 'PARENT_[name]' tag to each Item and, for each parent it has, add a 'PARENT_[parentname]' tag, with a lower weight as we go up the hierarchy. That did succeed in pulling parents and children closer.

Unfortunately the overall quality of suggestions seemed to fall a little, and the results seemed increasingly variable, e.g. run the import again, results seem completely random. Is this something that can be fixed at the features / lambda level?

I'm still not really all that clear what 'features' represents, but my suspicion is that by massively increasing the number of possible tags, I need to configure the model very differently...

Решение

That's the right way to think about it. It's overloading the API a fair bit, but still principled.

It may or may not actually help the results. It kind of depends on whether users who like A will also like B because they have a common product family. Maybe for music; unlikely for things you buy once like a toaster.

Variability comes from the random starting point. You will get different models each time. If the difference is significant when you start from scratch, then you are likely getting into over-fitting. It may be that your # of features is too high or lambda too low for the data set.

You should also run an eval to see whether the scores are good at all. If it's scoring poorly, yeah it's a case of parameters that are well off their best values.

The idea is that you need not build a new model from scratch every time though.

Лицензировано под: CC-BY-SA с атрибуция

Не связан с StackOverflow