Question

Hi, I have a conceptual question about a system I'm developing to classify emails. I have a large set (>100k) of messages that are known not to be spam, and a large set of unclassified messages. Is it possible to use some method (perhaps Bayesian) to detect spam without having a data set of labeled spam? Or do I absolutely need labeled spam examples?


Solution

Yes, you can do that, although the results will most likely be worse than with a supervised method trained on both classes. The general problem is often referred to as anomaly detection (also called novelty detection or one-class classification). The idea is to build a model of your known data, here the non-spam messages, and for each new instance decide whether it plausibly comes from that model or not. There are many methods for this, and choosing the right one is difficult. You can start studying here.
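To make the "model your data, then test new instances against it" idea concrete, here is a minimal sketch in plain Python. It is only one of many possible approaches, not a recommended spam filter: it assumes a unigram word model with add-one smoothing fitted on non-spam text, and flags a message when its average per-word log-probability falls below a cutoff taken from non-spam scores. The names `HamModel`, `pick_threshold`, `is_anomalous`, and the toy messages are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class HamModel:
    """Unigram model of non-spam text with add-one (Laplace) smoothing."""

    def __init__(self, ham_messages):
        self.counts = Counter()
        for msg in ham_messages:
            self.counts.update(tokenize(msg))
        self.total = sum(self.counts.values())
        # The extra +1 reserves probability mass for words never seen in ham.
        self.vocab_size = len(self.counts) + 1

    def score(self, message):
        """Average per-token log-probability; lower means less ham-like."""
        tokens = tokenize(message)
        if not tokens:
            # Simplification: a message with no tokens is never flagged.
            return 0.0
        denom = self.total + self.vocab_size
        log_prob = sum(math.log((self.counts[t] + 1) / denom) for t in tokens)
        return log_prob / len(tokens)


def pick_threshold(model, held_out_ham, false_alarm_rate=0.05):
    """Pick a cutoff so roughly `false_alarm_rate` of held-out ham is flagged."""
    scores = sorted(model.score(m) for m in held_out_ham)
    index = min(int(false_alarm_rate * len(scores)), len(scores) - 1)
    return scores[index]


def is_anomalous(model, message, threshold):
    """Flag a message whose score is strictly below the cutoff."""
    return model.score(message) < threshold


# Toy data, invented for this example.
ham = [
    "meeting moved to friday please confirm",
    "please review the attached report before the meeting",
    "lunch on friday works for me",
    "the report is attached please confirm receipt",
]
model = HamModel(ham)
# Shortcut for the toy case: the training set doubles as the held-out set.
# With real data, keep a separate slice of non-spam for choosing the cutoff.
threshold = pick_threshold(model, ham, false_alarm_rate=0.0)

suspect = "win free prize click now"
print(is_anomalous(model, suspect, threshold))
print(is_anomalous(model, ham[2], threshold))
```

The threshold is the tuning knob: raising `false_alarm_rate` catches more unusual messages at the cost of flagging more legitimate ones. With >100k non-spam messages, a richer model (for example TF-IDF features fed to a one-class method) would likely do better than this unigram sketch, but the structure stays the same: fit on non-spam only, score everything else against it.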

License: CC-BY-SA with attribution
Not affiliated with StackOverflow