Question

I'm reading "Collaborative Filtering for Implicit Feedback Datasets". On page 6 the authors detail their evaluation strategy, a mean Expected Percentile Ranking defined by the following formula:

$$\overline{\text{rank}} = \frac{\sum_{u,i} r^t_{ui} \text{rank}_{ui}}{\sum_{u,i} r^t_{ui}}$$

This is the same formula that Datacamp defines as the appropriate error metric for implicit recommendation engines, except they call it "Rank Ordering Error Metric". I'm implementing the system in Spark, so I've defined a test dataset to try things out:

from pyspark.sql import Window
from pyspark.sql.functions import col, desc, percent_rank

test_df = spark.createDataFrame(
  [
    ("A", "Fish", 1, 1),
    ("A", "Dogs", 2, 2),
    ("A", "Cats", 3, 3),
    ("A", "Elephants", 4, 4),
    ("B", "Fish", 1, 1),
    ("B", "Dogs", 2, 2),
    ("B", "Cats", 3, 3),
    ("B", "Elephants", 4, 4)
  ], ["Customer", "Item", "ImplicitRating", "PredictedRating"]
)

# Percentile rank of each item within its customer's list, best prediction first
rankWindow = Window.partitionBy("Customer").orderBy(desc("PredictedRating"))
test_df\
  .withColumn("RankUI", percent_rank().over(rankWindow))\
  .withColumn("RankUIxRating", col("RankUI") * col("ImplicitRating"))\
  .show()

and the output is:

+--------+---------+--------------+---------------+------------------+------------------+
|Customer|     Item|ImplicitRating|PredictedRating|            RankUI|     RankUIxRating|
+--------+---------+--------------+---------------+------------------+------------------+
|       B|Elephants|             4|              4|               0.0|               0.0|
|       B|     Cats|             3|              3|0.3333333333333333|               1.0|
|       B|     Dogs|             2|              2|0.6666666666666666|1.3333333333333333|
|       B|     Fish|             1|              1|               1.0|               1.0|
|       A|Elephants|             4|              4|               0.0|               0.0|
|       A|     Cats|             3|              3|0.3333333333333333|               1.0|
|       A|     Dogs|             2|              2|0.6666666666666666|1.3333333333333333|
|       A|     Fish|             1|              1|               1.0|               1.0|
+--------+---------+--------------+---------------+------------------+------------------+

I'm effectively modelling a perfect prediction here by setting PredictedRating to match ImplicitRating. My problem is that plugging those values into the formula above gives me...

$$\overline{\text{rank}} = \frac{\sum_{u,i} r^t_{ui} \text{rank}_{ui}}{\sum_{u,i} r^t_{ui}} = \frac{0.0 + 1.0 + 1.\overline{3} + 1.0 + 0.0 + 1.0 + 1.\overline{3} + 1.0}{4 + 3 + 2 + 1 + 4 + 3 + 2 + 1} = \frac{6.\overline{6}}{20} = 0.\overline{3}$$
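For completeness, a direct aggregation over the same DataFrame reproduces that number. This is just a sketch reusing test_df and rankWindow from above; the names metric_df, weighted_rank, rating_mass and MeanExpectedPercentileRank are my own:

from pyspark.sql.functions import sum as spark_sum

# Numerator: sum of percentile rank x implicit rating; denominator: sum of the ratings
metric_df = test_df\
  .withColumn("RankUI", percent_rank().over(rankWindow))\
  .agg(
    spark_sum(col("RankUI") * col("ImplicitRating")).alias("weighted_rank"),
    spark_sum("ImplicitRating").alias("rating_mass")
  )\
  .withColumn("MeanExpectedPercentileRank", col("weighted_rank") / col("rating_mass"))

metric_df.show()  # MeanExpectedPercentileRank comes out at ~0.333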

Given that the paper is explicit that lower values of $\overline{\text{rank}}$ are better, and that the authors report values as low as ~8%, I'm confused about how that squares with the result of this experiment.

What am I doing wrong?

Solution

I found a video by Datacamp called "Evaluating Implicit Ratings Models", which explains how to evaluate recommendation engines that use implicit ratings. Although they refer to the metric by a different name, the Rank Ordering Error Metric, the formula they give is identical to the one defined in the paper I linked in my question. Following along with the video and plugging in the values from their examples, I get the same results they do, so I am computing the metric correctly. The result just isn't as intuitive as I expected: with only four items per customer, even a perfect ordering yields ~0.33, because the second, third and fourth items still contribute percentile ranks of 1/3, 2/3 and 1, weighted by their ratings; only the top-ranked item contributes 0.
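To convince myself that lower really does mean better, here is a minimal plain-Python sketch of the same formula (the expected_percentile_rank function and the toy lists are my own, mirroring the test data from the question), comparing the perfect ordering against the fully reversed one:

def expected_percentile_rank(users):
    """users: list of per-customer lists of (implicit_rating, predicted_rating)."""
    numerator, denominator = 0.0, 0.0
    for items in users:
        n = len(items)
        # percent_rank of position i within a customer's list, best prediction first
        ordered = sorted(items, key=lambda x: x[1], reverse=True)
        numerator += sum(r * i / (n - 1) for i, (r, _) in enumerate(ordered))
        denominator += sum(r for r, _ in items)
    return numerator / denominator

perfect = [[(4, 4), (3, 3), (2, 2), (1, 1)]] * 2   # both customers, predictions match ratings
reverse = [[(4, 1), (3, 2), (2, 3), (1, 4)]] * 2   # predictions in the opposite order

print(expected_percentile_rank(perfect))  # 0.333..., matching the calculation in the question
print(expected_percentile_rank(reverse))  # 0.666...

The perfect ordering bottoms out at ~0.33 for this data, while the reversed ordering pushes the metric up to ~0.67, which is consistent with the paper's "lower is better" reading.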
