Question

I have a machine learning problem: given a long list of domains, I need to determine which are e-commerce websites and which are personal websites. The difficulty is that I have no training data to work with. I have come up with a couple of ideas:

  1. Go through a couple hundred of these websites manually, label each one as business or personal, and build a training set that way (long and boring!).

  2. Crawl these websites and search for keywords such as "Buy Now", "Price", and "Credit Card".
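
To make idea 2 concrete, here is a minimal sketch of keyword-based labeling. It assumes the page text has already been fetched and stripped of HTML. The names `ECOMMERCE_KEYWORDS`, `keyword_score`, and `label_page`, the extra phrases "add to cart" and "checkout", and the 0.4 threshold are all illustrative assumptions that would need tuning on real pages.

```python
import re

# Seed phrases: the first three come from the idea above,
# the last two are assumed additions for illustration.
ECOMMERCE_KEYWORDS = ["buy now", "price", "credit card", "add to cart", "checkout"]


def keyword_score(page_text, keywords=ECOMMERCE_KEYWORDS):
    """Return the fraction of keywords that occur in the page text."""
    # Lowercase and collapse whitespace so multi-word phrases match reliably.
    text = re.sub(r"\s+", " ", page_text.lower())
    hits = sum(1 for kw in keywords if kw in text)
    return hits / len(keywords)


def label_page(page_text, threshold=0.4):
    """Label a page 'ecommerce' if enough keywords are present (assumed threshold)."""
    return "ecommerce" if keyword_score(page_text) >= threshold else "personal"
```

Note that this uses plain substring matching, so "price" would also match "priceless". Word-boundary matching would be stricter.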

Does anybody have any other approaches?

Thanks

Answer

You could adaptively expand your keyword set: as you crawl, any word that correlates highly with the existing keywords can be added to the list.

Peter

P.S. I would have added this as a comment, but I don't have enough reputation points.
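
One rough way to sketch this adaptive expansion is to count, for each word, how often the pages containing it also contain a seed keyword. This is a simple co-occurrence ratio standing in for "correlation", not a proper statistical measure. The function names and the `min_pages` and `min_ratio` defaults are assumptions, and the sketch handles single-word seeds only.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase alphabetic words on a page."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand_keywords(pages, seeds, min_pages=2, min_ratio=0.8):
    """Return words that mostly appear on pages already containing a seed keyword."""
    seeds = set(seeds)
    total = Counter()      # number of pages containing each word
    with_seed = Counter()  # number of those pages that also contain a seed
    for text in pages:
        words = tokenize(text)
        has_seed = bool(words & seeds)
        for word in words - seeds:
            total[word] += 1
            if has_seed:
                with_seed[word] += 1
    # Keep words seen on enough pages whose co-occurrence ratio is high.
    return sorted(
        word for word, n in total.items()
        if n >= min_pages and with_seed[word] / n >= min_ratio
    )
```

Words returned by `expand_keywords` could be added to the seed set and the process repeated on the next batch of crawled pages. If most crawled pages already contain a seed, very common words will also pass the ratio test, so comparing against the base rate or filtering stop words would be a sensible refinement.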

License: CC-BY-SA with attribution
Not affiliated with StackOverflow