Question

My goal is to recommend jobs to job seekers based on their skill set.

Currently I'm using an SVM, which outputs a single prediction, e.g. "software engineer at Microsoft". However, consider this: how different are the skill sets of a software engineer at Microsoft and a software engineer at IBM? Probably not very different, and inspecting my data set confirms this. Hence the SVM struggles to discriminate in cases like this, of which there are many in my data set, and my classification accuracy is about 50%.

So I had an idea.

In scikit-learn, once you've trained a model, you can compute the probability that a particular input X belongs to each class.

So for each input X in my test set, I took the top 3 most likely classifications. Then I tested whether the correct label was among those top 3 predictions; if it was, I considered the prediction correct. With this criterion, the classification accuracy increased to over 80%.
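Roughly, what I'm doing looks like the sketch below (with a stand-in dataset; my real data and model are different):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# probability=True is required for predict_proba on an SVC
clf = SVC(probability=True).fit(X_train, y_train)

proba = clf.predict_proba(X_test)          # shape: (n_samples, n_classes)
top3 = np.argsort(proba, axis=1)[:, -3:]   # column indices of the 3 most likely classes
top3_labels = clf.classes_[top3]           # map column indices back to class labels

# count a prediction as correct if the true label is anywhere in the top 3
hits = np.any(top3_labels == y_test.reshape(-1, 1), axis=1)
print(f"Top-3 accuracy: {hits.mean():.2%}")
```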

So my question is: is this a valid approach to measuring classification accuracy? If it is, then does it have a name?

In my mind it is valid for my intended application, which is to recommend a selection of jobs relevant to a job seeker's skill set.


Cross-posted from CS SE: https://cs.stackexchange.com/questions/117695/classification-accuracy-based-on-top-3-most-likely-classifications

I'm interested to know what perspective data scientists have on this.


Solution

Yes, it's common to let a prediction consist of multiple answers (typically the top N most relevant ones) and to use a performance measure based on that.

Currently you're treating this as a classification problem, but logically it is more like a recommendation or information-retrieval problem (like results from a search engine). Usually for this kind of problem the gold answer would also consist of a list of several items, but apparently your dataset contains a single answer for every instance.
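If you want to go further down the ranking route, a measure such as mean reciprocal rank (MRR) from information retrieval credits the gold label according to its position in the ranked list, not just its presence in the top N. A minimal sketch (the class names and probabilities below are made up purely for illustration):

```python
import numpy as np

def mean_reciprocal_rank(proba, y_true, classes):
    """Average of 1/rank of the gold label in each ranked prediction list."""
    order = np.argsort(proba, axis=1)[:, ::-1]   # classes from most to least probable
    ranked = classes[order]
    # 1-based position of the gold label in each ranked list
    ranks = np.argmax(ranked == np.asarray(y_true).reshape(-1, 1), axis=1) + 1
    return np.mean(1.0 / ranks)

classes = np.array(["swe_microsoft", "swe_ibm", "data_scientist"])
proba = np.array([[0.45, 0.40, 0.15],    # gold "swe_ibm" ranked 2nd -> 1/2
                  [0.10, 0.30, 0.60]])   # gold "data_scientist" ranked 1st -> 1/1
print(mean_reciprocal_rank(proba, ["swe_ibm", "data_scientist"], classes))  # 0.75
```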


Answer to comment: a couple of papers using top-N performance measures (note: this is just a quick selection based on the keyword "information retrieval").

The CLEF series of shared tasks has produced many datasets and evaluation measures over the years; it's probably a good source of resources and papers... if you have a bit of time to explore it ;)

OTHER TIPS

It is common and is called "top-k accuracy" or "top-n accuracy". You can find a description in these posts (a short scikit-learn sketch follows the links):

What is the definition of Top-n accuracy?

Evaluation & Calculate Top-N Accuracy: Top 1 and Top 5
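Note also that scikit-learn itself ships this metric (since version 0.24) as `sklearn.metrics.top_k_accuracy_score`; it takes the class scores (e.g. the output of `predict_proba`) rather than hard predictions. A quick sketch, assuming a fitted classifier `clf` and held-out `X_test`, `y_test`:

```python
from sklearn.metrics import top_k_accuracy_score  # scikit-learn >= 0.24

# scores can come from predict_proba (or decision_function)
acc3 = top_k_accuracy_score(y_test, clf.predict_proba(X_test), k=3)
print(f"Top-3 accuracy: {acc3:.2%}")
```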
