Question

I have a machine learning problem. I am given a long list of domains and have to figure out which are e-commerce websites and which are personal websites. It is a difficult problem because I do not have any training data to work with. I have come up with a couple of ideas:

  1. Go through a couple hundred of these websites manually, label each as business or personal, and build a training set this way (long and boring!).

  2. Crawl these websites and search for some keywords, e.g. "Buy Now", "Price", "Credit Card", etc.
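The keyword idea in item 2 could be sketched roughly as below. This is only a minimal illustration, not a tested classifier: the function names, the `min_hits` threshold, and the extra phrases "add to cart" and "checkout" are assumptions, and fetching the pages is left out (the functions take HTML that has already been downloaded).

```python
import re

# Seed phrases: the first three come from the question; the last two are
# assumed additions for illustration only.
ECOMMERCE_KEYWORDS = ["buy now", "price", "credit card", "add to cart", "checkout"]

def strip_tags(html):
    """Crudely drop HTML tags, collapse whitespace, and lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).lower()

def keyword_hits(html, keywords=ECOMMERCE_KEYWORDS):
    """Return the keywords that appear as substrings of the page text."""
    text = strip_tags(html)
    return [kw for kw in keywords if kw in text]

def looks_like_ecommerce(html, min_hits=2):
    """Flag a page when at least `min_hits` distinct keywords appear (assumed threshold)."""
    return len(keyword_hits(html)) >= min_hits

shop = "<html><body><h1>Widgets</h1><p>Price: $5</p><a>Buy   Now</a></body></html>"
blog = "<p>My holiday photos and thoughts</p>"
print(keyword_hits(shop), looks_like_ecommerce(shop))
print(keyword_hits(blog), looks_like_ecommerce(blog))
```

Substring matching is deliberately naive (for example, "price" also matches "priceless"), so the threshold would need tuning against a small hand-labelled sample.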

Does anybody have any other approaches?

Thanks

Solution

You could adaptively modify your keyword set: as you crawl, any word that correlates highly with the existing keywords can be added to the list.

Peter

P.S. I would have added this as a comment, but I don't have enough reputation points...
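One way to read the adaptive keyword suggestion is sketched below. It is an assumption-laden illustration, not the answerer's exact method: "correlates" is interpreted here as the phi coefficient (Pearson correlation of two binary indicators) between "page contains the word" and "page contains any current keyword". The names `expand_keywords`, `min_corr`, and `min_pages` are hypothetical, and only single-word keywords are handled; phrases such as "Buy Now" would need n-grams.

```python
import math
import re

def tokenize(text):
    """Return the set of lowercase alphabetic words on a page."""
    return set(re.findall(r"[a-z]+", text.lower()))

def phi(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 contingency table; 0.0 when undefined."""
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

def expand_keywords(pages, seeds, min_corr=0.5, min_pages=2):
    """Return {word: score} for words that co-occur strongly with the seed keywords."""
    docs = [tokenize(p) for p in pages]
    seeds = set(seeds)
    has_seed = [bool(d & seeds) for d in docs]
    vocab = set().union(*docs) - seeds
    added = {}
    for word in vocab:
        n11 = n10 = n01 = n00 = 0
        for doc, seeded in zip(docs, has_seed):
            present = word in doc
            if present and seeded:
                n11 += 1
            elif present:
                n10 += 1
            elif seeded:
                n01 += 1
            else:
                n00 += 1
        # Skip rare words: a single page is not enough evidence.
        if n11 + n10 >= min_pages:
            score = phi(n11, n10, n01, n00)
            if score >= min_corr:
                added[word] = score
    return added

pages = [
    "Price list, add to cart and checkout",
    "Low price! Add items to your cart today",
    "Checkout our cart deals",
    "My family holiday photos",
    "Thoughts about my garden and my cat",
    "Photos of my cat",
]
new_words = expand_keywords(pages, seeds={"price", "checkout"})
print(sorted(new_words))
```

Running the expansion repeatedly, feeding the new words back in as seeds, would make the list grow as the crawl proceeds; a stricter `min_corr` helps keep it from drifting toward generic words.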

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow