Question

We're trying to implement a semantic searching algorithm to give suggested categories based on a user's search terms.

At the moment we have implemented the Naive Bayes probabilistic algorithm to return the probabilities of each category in our data and then return the highest one.

However, due to its naivety it sometimes gets the results wrong.

Without going into neural networks and other ridiculously complex stuff, is there an alternative that we can look into?


Solution

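Naive Bayes (NB) is not much different from Logistic Regression: for the discrete NB variants typically used on text, both end up as linear classifiers over the features, with NB fitting a generative model and Logistic Regression fitting the decision boundary directly. From experience, Logistic Regression outperforms NB in terms of predictive performance most of the time.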
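To make the comparison concrete, here is a minimal sketch of multinomial Logistic Regression over bag-of-words features, written in plain Python so it runs without any libraries. The toy search terms, the category names, and the hyperparameters (epochs, lr, l2) are all made up for illustration, and the intercept term is left out to keep it short. Treat it as a sketch under those assumptions, not a tuned implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (search terms, category) pairs.
TRAIN = [
    ("cheap flights to paris", "travel"),
    ("hotel deals in rome", "travel"),
    ("book train tickets", "travel"),
    ("wireless headphones review", "electronics"),
    ("best laptop for students", "electronics"),
    ("smartphone battery life", "electronics"),
    ("chicken curry recipe", "food"),
    ("easy pasta dinner ideas", "food"),
    ("vegan dessert recipe", "food"),
]


def featurize(text):
    """Bag-of-words token counts (no intercept term in this sketch)."""
    return Counter(text.lower().split())


def predict_proba(weights, text):
    """Softmax over the per-category linear scores."""
    feats = featurize(text)
    scores = {
        label: sum(w.get(tok, 0.0) * n for tok, n in feats.items())
        for label, w in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def train_logreg(examples, epochs=300, lr=0.1, l2=0.01):
    """Per-example gradient descent on the cross-entropy loss."""
    labels = sorted({label for _, label in examples})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for text, gold in examples:
            feats = featurize(text)
            probs = predict_proba(weights, text)
            for label in labels:
                # Gradient of the loss w.r.t. the score of this category.
                err = probs[label] - (1.0 if label == gold else 0.0)
                w = weights[label]
                for tok, n in feats.items():
                    # L2 shrinkage is applied only to tokens in this example.
                    w[tok] -= lr * (err * n + l2 * w[tok])
    return weights


def predict(weights, text):
    probs = predict_proba(weights, text)
    return max(probs, key=probs.get)


model = train_logreg(TRAIN)
print(predict(model, "pasta recipe"))
print(predict_proba(model, "laptop battery"))
```

Unlike NB, the weights here are fitted jointly, so correlated words do not get counted twice in the same way. In practice you would likely reach for a library implementation (scikit-learn's LogisticRegression, for example) rather than hand-rolled training.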

Also, if you have enough data and none of it is missing, then you will most likely find that the predictive performance of NB is pretty much the same as that of more complicated methods, such as Bayesian Networks (BNs), which do not make the 'naive' independence assumption between covariates.

If you want to relax the independence assumption without having to dive fully into the realm of BNs, you can try the Tree Augmented Naive Bayes algorithm first.
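As a rough illustration of what Tree Augmented Naive Bayes adds, the sketch below covers only its structure-learning step: score every pair of features by their conditional mutual information given the class, then grow a maximum spanning tree so that each feature gets at most one other feature as a parent (in addition to the class). The function names and the tiny discrete dataset are hypothetical, and estimating the conditional probability tables afterwards is not shown.

```python
import math
from collections import Counter


def conditional_mutual_information(rows, labels, i, j):
    """Empirical I(X_i; X_j | C) from counts, in nats."""
    n = len(rows)
    n_c = Counter(labels)
    n_ic = Counter((r[i], c) for r, c in zip(rows, labels))
    n_jc = Counter((r[j], c) for r, c in zip(rows, labels))
    n_ijc = Counter((r[i], r[j], c) for r, c in zip(rows, labels))
    total = 0.0
    for (xi, xj, c), count in n_ijc.items():
        ratio = count * n_c[c] / (n_ic[(xi, c)] * n_jc[(xj, c)])
        total += (count / n) * math.log(ratio)
    return total


def tan_structure(rows, labels, root=0):
    """Prim-style maximum spanning tree over the features.

    Returns {feature_index: parent_feature_index}; the root has parent None.
    Every feature additionally has the class as a parent, as in plain NB.
    """
    n_feats = len(rows[0])
    parents = {root: None}
    while len(parents) < n_feats:
        candidates = [
            (conditional_mutual_information(rows, labels, i, j), i, j)
            for i in parents
            for j in range(n_feats)
            if j not in parents
        ]
        _, parent, child = max(candidates)
        parents[child] = parent
    return parents


# Hypothetical data: feature 1 always copies feature 0, while feature 2
# varies independently of both within each class.
ROWS = [(x0, x0, x2) for _ in ("a", "b") for x0 in (0, 1) for x2 in (0, 1)]
LABELS = [c for c in ("a", "b") for _ in range(4)]

tree = tan_structure(ROWS, LABELS)
print(tree)
```

On this data the strongly dependent pair (features 0 and 1) ends up linked in the tree, which is exactly the kind of dependence that plain NB ignores.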

OTHER TIPS

If you don't consider linear SVM to be ridiculously complex stuff, you could try that. Linear SVMs are known to perform very well on text classification tasks like this one.
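For a sense of how little machinery a linear SVM needs, here is a minimal one-vs-rest sketch trained by subgradient descent on the L2-regularized hinge loss, again in plain Python. The toy data, the category names, and the hyperparameters (epochs, lr, lam) are assumptions for illustration, and there is no intercept term.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (search terms, category) pairs.
TRAIN = [
    ("cheap flights to paris", "travel"),
    ("hotel deals in rome", "travel"),
    ("book train tickets", "travel"),
    ("wireless headphones review", "electronics"),
    ("best laptop for students", "electronics"),
    ("smartphone battery life", "electronics"),
    ("chicken curry recipe", "food"),
    ("easy pasta dinner ideas", "food"),
    ("vegan dessert recipe", "food"),
]


def featurize(text):
    """Bag-of-words token counts (no intercept term in this sketch)."""
    return Counter(text.lower().split())


def train_linear_svm(examples, epochs=100, lr=0.1, lam=0.01):
    """One binary hinge-loss classifier per category (one-vs-rest)."""
    labels = sorted({label for _, label in examples})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for text, gold in examples:
            feats = featurize(text)
            for label in labels:
                y = 1.0 if label == gold else -1.0
                w = weights[label]
                margin = y * sum(w[tok] * n for tok, n in feats.items())
                for tok, n in feats.items():
                    # Subgradient: regularizer always, hinge term only
                    # when the example is inside the margin.
                    grad = lam * w[tok]
                    if margin < 1.0:
                        grad -= y * n
                    w[tok] -= lr * grad
    return weights


def predict(weights, text):
    """Pick the category whose classifier gives the highest score."""
    feats = featurize(text)
    scores = {
        label: sum(w.get(tok, 0.0) * n for tok, n in feats.items())
        for label, w in weights.items()
    }
    return max(scores, key=scores.get)


model = train_linear_svm(TRAIN)
print(predict(model, "pasta recipe"))
```

Note that the scores are margins, not probabilities, so if you need ranked suggestions with confidence values you would have to calibrate them separately. A library implementation (scikit-learn's LinearSVC, for example) is the more practical route for real data.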

Licensed under: CC-BY-SA with attribution