How to evaluate a suggestion system with relevant order?

https://stackoverflow.com/questions/22956379

30-06-2023

Question

I'm working on a suggestion system. For a given input, the system outputs N suggestions.

We have collected data about what suggestions the users like. Example:

input1 - output11 output12 output13
input2 - output21
input3 - output31 output32
...

We now want to evaluate our system based on this data. The first metric is whether these outputs are present in the suggestions of our system; that's easy.

But now we would like to test how well positioned these outputs are in the suggestions. We would like the given outputs to be close to the first suggestions.

We would like a single score for the system or for each input.

Based on the previous data, here is what a score of 100% would be:

input1 - output11 output12 output13 other other other ...
input2 - output21 other    other    other other other ... 
input3 - output31 output32 other    other other other ...
...

(The order of output11 output12 output13 is not relevant. What is important is that ideally the three of them should be in the first three suggestions).

We could give a score to each position that is held by a suggestion, or count the displacement from the ideal position, but I don't see a good way to do this.

Is there an existing measure that could be used for that?

Solution

You want something called the mean average precision (it's a metric from information retrieval).

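Essentially, for each of the 'real' data points (the outputs users liked) in your suggestion list, you can compute the precision at that position: (# of correct entries at or above that point) / (# of entries at or above that point). Averaging this number over the positions of all your real data points gives the average precision for one input, which is 1 when they all sit at the top of the list in any order. Averaging that over all inputs gives the mean average precision, a single score for the system.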
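A minimal sketch of this computation in Python, assuming each input comes with a ranked list of suggestions (no duplicates) and a set of liked outputs. The function names are illustrative and not taken from any library. Dividing by the number of liked outputs, so that a liked output missing from the suggestions counts as zero, is the usual convention rather than something stated in the answer.

```python
def average_precision(suggestions, liked):
    """Average precision of one ranked suggestion list.

    suggestions: ranked list of outputs, best first (assumed duplicate-free).
    liked: collection of outputs the users liked for this input.
    """
    liked = set(liked)
    if not liked:
        return 0.0  # assumption: inputs with no liked outputs score 0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(suggestions, start=1):
        if item in liked:
            hits += 1
            # Precision at this position: liked items so far / items so far.
            precision_sum += hits / rank
    # Liked outputs that never appear contribute 0 to the average.
    return precision_sum / len(liked)


def mean_average_precision(results):
    """results: list of (suggestions, liked) pairs, one per input."""
    scores = [average_precision(s, l) for s, l in results]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    ideal = (["output11", "output12", "output13", "other"],
             {"output11", "output12", "output13"})
    second_place = (["other", "output21"], {"output21"})
    print(average_precision(*ideal))         # 1.0
    print(average_precision(*second_place))  # 0.5
    print(mean_average_precision([ideal, second_place]))  # 0.75
```

With the ideal ordering from the question the score is 1.0, and it drops as liked outputs move further down the list.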

Licensed under: CC-BY-SA with attribution
