Question

I'm trying to create a data extraction algorithm for group-buying sites so that I can build a deal aggregator. First I need an algorithm that will extract the title, price, discount, image, and coordinates.

I have a solution for the image, discount, and coordinates, but for title and category recognition I need to create a naive Bayes algorithm. What is the best language to do this in: PHP, Python, JavaScript, or Node.js?

What do I need to create an algorithm?

A model with examples? For instance, if I give it 100 titles and then give it all the web content from some sites, can the script recognize which sentence is a title?

So I don't need to classify a single word. I need to classify a whole sentence, and that sentence is sometimes in an <h1> or <h2> and sometimes in some other element.

Solution

I seriously cannot understand much of your post, but since naive Bayes is something very commonly requested here on SO, I created a simple piece of code which can be used without any additional library (like NLTK) in Python (and is also way faster than NLTK for training). You can find it here.
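
To give an idea of what a library-free naive Bayes classifier can look like, here is a minimal sketch. It is not the code linked above; the class name NaiveBayes, the tokenize helper, the "title"/"other" labels, and the tiny training set are all made up for illustration. It assumes a multinomial model over lowercase word tokens with add-one (Laplace) smoothing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training examples
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()                       # every token seen during training

    def train(self, text, label):
        self.doc_counts[label] += 1
        for token in tokenize(text):
            self.word_counts[label][token] += 1
            self.vocab.add(token)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # Work in log space so many small probabilities do not underflow.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: a few deal titles and a few non-title snippets.
clf = NaiveBayes()
for text in ["50% off spa weekend for two",
             "Sushi dinner for two with drinks",
             "Three-night hotel stay in Prague"]:
    clf.train(text, "title")
for text in ["Log in to your account",
             "Subscribe to our newsletter",
             "Terms and conditions apply"]:
    clf.train(text, "other")

print(clf.classify("Dinner for two at Italian restaurant"))  # title
print(clf.classify("Log in to subscribe"))                   # other
```

In practice you would train on far more than six examples, and you could add the enclosing tag name (for example "h1") as an extra token, since the question notes that titles often sit in heading elements.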

OTHER TIPS

If you don't have any experience with AI algorithms and you want to add an algorithm that can learn, I suggest you use the Google Prediction API:

https://developers.google.com/prediction/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow