Question

I have searched on Stack Exchange and found a couple of topics like this and this, but they are not quite relevant to my problem (or at least I don't know how to make them relevant).

Anyway, say I have two sets of prediction results, as shown by df1 and df2.

import pandas as pd

y_truth = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y_predicted_rank1 = [6, 1, 7, 2, 8, 3, 9, 4, 10, 5]
y_predicted_rank2 = [4, 1, 7, 2, 8, 3, 9, 6, 10, 5]
df1 = pd.DataFrame({'tag': y_truth, 'predicted_rank': y_predicted_rank1}).sort_values('predicted_rank')
df2 = pd.DataFrame({'tag': y_truth, 'predicted_rank': y_predicted_rank2}).sort_values('predicted_rank')

print(df1)

#   tag predicted_rank
#1  1   1
#3  1   2
#5  1   3
#7  1   4
#9  1   5
#0  0   6
#2  0   7
#4  0   8
#6  0   9
#8  0   10


print(df2)
#   tag predicted_rank
#1  1   1
#3  1   2
#5  1   3
#0  0   4
#9  1   5
#7  1   6
#2  0   7
#4  0   8
#6  0   9
#8  0   10

By looking at them, I know df1 is better than df2, since in df2, a negative sample (zero) was predicted to have rank #4. So my question is, what metric can be used here so that I can (mathematically) tell df1 is better than df2?


Solution

For comparing two rankings, Spearman's rank correlation is a good measure. It's probably worth a try, but since your ground truth appears to be binary, I would think that top-N accuracy (or some variant of it) would be more appropriate (advantage: easy to interpret). You could also consider using the Area Under the Curve (AUC), using the predicted rank as a variable threshold (less intuitive to interpret, but it doesn't require choosing any top N).
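As a rough sketch of the last two suggestions, assuming scikit-learn is available: the top_n_accuracy helper below is just an illustrative name, and roc_auc_score expects higher scores for the positive class, so the ranks are negated (rank 1 = most confident positive).

from sklearn.metrics import roc_auc_score

y_truth = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y_predicted_rank1 = [6, 1, 7, 2, 8, 3, 9, 4, 10, 5]
y_predicted_rank2 = [4, 1, 7, 2, 8, 3, 9, 6, 10, 5]

def top_n_accuracy(y_true, ranks, n=5):
    # Fraction of the n best-ranked items that are actually positive.
    top_n = sorted(range(len(ranks)), key=lambda i: ranks[i])[:n]
    return sum(y_true[i] for i in top_n) / n

print(top_n_accuracy(y_truth, y_predicted_rank1))  # 1.0
print(top_n_accuracy(y_truth, y_predicted_rank2))  # 0.8

# AUC over all rank thresholds, with negated ranks as scores.
print(roc_auc_score(y_truth, [-r for r in y_predicted_rank1]))  # 1.0
print(roc_auc_score(y_truth, [-r for r in y_predicted_rank2]))  # 0.92

On your example data, both metrics would put df1 ahead of df2: top-5 accuracy 1.0 vs 0.8, AUC 1.0 vs 0.92.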

Licensed under: CC-BY-SA with attribution