Question

I have developed a content-based recommendation system and it is working fine. The input is a set of documents = {d1, d2, d3, ..., dn}, and the output is the top N most similar documents for a given document, e.g. output = {d10, d11, d1, d8, ...}. I eyeballed the results and found them satisfactory. My question is: how do I measure the performance or accuracy of the system?

I did some research and found that recall, precision, and F1-score are used to evaluate recommendation systems that predict user ratings. For this, we need to know the original ratings; the system then predicts the ratings, and we can build the confusion matrix and compute the aforementioned metrics. However, in my case I don't predict anything. Instead, I compute the cosine similarity scores, sort them in descending order, and pick the top N.
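To make the setup concrete, the ranking step looks roughly like the sketch below. It is only an illustration: the names `cosine` and `top_n_similar` and the toy vectors are hypothetical, and the real document vectors are assumed to be computed elsewhere.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def top_n_similar(query_id, vectors, n):
    """Return the n (doc_id, score) pairs most similar to the query document."""
    query = vectors[query_id]
    scored = [
        (doc_id, cosine(query, vec))
        for doc_id, vec in vectors.items()
        if doc_id != query_id  # do not recommend the document itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending score
    return scored[:n]


# Toy vectors, made up for illustration only.
vectors = {
    "d1": [1.0, 0.0, 1.0],
    "d2": [1.0, 0.0, 0.9],
    "d3": [0.0, 1.0, 0.0],
    "d4": [0.5, 0.5, 0.5],
}
top2 = top_n_similar("d1", vectors, 2)
print([doc_id for doc_id, _ in top2])
```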

In this use case, how do I evaluate the system?

Thanks

Solution

There's some confusion about different kinds of output and their corresponding evaluation:

  • One can treat the top N results as predicted positives and every result ranked below N as a predicted negative. In this case binary classification evaluation measures apply: precision, recall, and F1-score are the standard ones.
  • One can consider the ratings/scores assigned to the full set of results. In this case there are two options:
    • If the numerical results are comparable with the gold standard, e.g. the same kind of rating, then standard regression evaluation measures can be used, for instance RMSE.
    • If not, it is still possible to compare the order of the results. Spearman rank correlation is a common evaluation measure in this case.
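As a rough illustration of the first option, the sketch below computes precision, recall, and F1 for a top-N list. It assumes you have, for each query document, a gold-standard set of documents judged relevant; the function name `precision_recall_f1_at_n` and the example IDs are hypothetical.

```python
def precision_recall_f1_at_n(ranked, relevant, n):
    """Treat the first n ranked documents as predicted positives.

    ranked:   list of document IDs, best match first (system output)
    relevant: set of document IDs judged relevant (gold standard, assumed given)
    """
    top_n = ranked[:n]
    hits = sum(1 for doc in top_n if doc in relevant)  # true positives
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: system ranking vs. a hand-labelled relevant set.
ranked = ["d10", "d11", "d1", "d8", "d3"]
relevant = {"d10", "d1", "d7"}
p, r, f1 = precision_recall_f1_at_n(ranked, relevant, 4)
print(p, r, f1)
```

Here two of the top four results are relevant, so precision is 2/4 and recall is 2/3. In practice these values would be averaged over many query documents.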

It seems that in your case you could use either the classification or the ranking evaluation measures. Of course, any of these evaluation methods requires gold standard results to compare the predictions against, i.e. for each query document, a reference set or ordering of the documents that should be returned.
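For the ranking option, a minimal sketch of Spearman rank correlation between the system's ordering and a gold-standard ordering could look like this. It assumes both lists contain exactly the same documents with no ties, which is what makes the simple formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)) valid; the function name `spearman_rho` and the example orderings are hypothetical. If SciPy is available, `scipy.stats.spearmanr` computes the same coefficient and also handles ties.

```python
def spearman_rho(system_order, gold_order):
    """Spearman rank correlation for two orderings of the same items (no ties)."""
    if len(set(system_order)) != len(system_order):
        raise ValueError("system_order contains duplicates")
    if set(system_order) != set(gold_order) or len(system_order) != len(gold_order):
        raise ValueError("both orderings must contain exactly the same items")
    n = len(system_order)
    if n < 2:
        raise ValueError("need at least two items")
    gold_rank = {doc: rank for rank, doc in enumerate(gold_order)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum(
        (rank - gold_rank[doc]) ** 2 for rank, doc in enumerate(system_order)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up example: the system swaps the 2nd and 3rd gold documents.
system_order = ["d10", "d11", "d1", "d8"]
gold_order = ["d10", "d1", "d11", "d8"]
rho = spearman_rho(system_order, gold_order)
print(rho)
```

A value of 1 means the two orderings are identical and -1 means one is the exact reverse of the other.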

Licensed under: CC-BY-SA with attribution