How much data do I need for recommender system?

https://stackoverflow.com/questions/18198197

24-06-2022
|

Question

I have to develop a personality/job suitability online test for an HR department. Basically, users will answer questions, on a scale of 0-10 for example, and after say 50 questions, I want to translate that to a rating in 5 different personality/ job suitability characteristics.

I don't have any real data to start with, so first, is it even worth it to use a recommendation engine like MyMediaLite (github). How many samples will I need to train it to a decent performance?

I previously built a training course recommender, by simply doing and hand-weighted sum where each question increased the weight of several courses that were related to that question. It was an expert system, built like a feed-forward neural network, where I personally tuned all the weights based on my knowledge of the questions and the courses' content.

I would like to this time around use a recommender system, but I'm wondering how many times I would have to take the 50 question test, and then assign the results manually. would 100 examples do? that could be possible. 1000 would be too long. How can I know ahead of time?

Solution

Though useless, I want to say this is not possible to give a definite number. You should focus on learning curve when adding new samples.

You can process the samples by hand and engine on parallel, and compare the result given by both. Once the measurement e.g. recall and precision of the result given by engine achieve your expectation, then you get enough samples.

Hope this helpful!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow