Question

Hi, I have a conceptual question about a system I'm developing to classify emails. I have a large set (>100k) of messages that are not spam and a large set of unclassified messages. Is it possible to use a method (perhaps Bayesian) to detect spam without having a data set of spam? Do I absolutely need labeled spam examples?

Solution

Yes, you can do that, although the results will most likely be worse than with a supervised method. The general problem is usually called anomaly detection (when, as here, you only have examples of the normal class, it is also called novelty detection or one-class classification). The idea is to build a model of your normal (non-spam) data and, for each new message, decide whether it is likely to have come from that model. There are many methods for doing this, and choosing the right one is difficult. You can start studying here.
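As a minimal sketch, assuming plain-text message bodies and a simple bag-of-words representation, the code below models non-spam mail with an add-one-smoothed unigram distribution (the per-class word model that multinomial naive Bayes uses, but fitted to the single non-spam class). It flags a message when its average per-token log-likelihood falls below a cutoff chosen so that roughly 1% of known non-spam would be flagged. The names `HamModel`, `fit_threshold` and `is_anomalous`, the tokenizer, and the 1% quantile are hypothetical choices for illustration, not an established API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase alphanumeric words.
    return re.findall(r"[a-z0-9']+", text.lower())


class HamModel:
    """Hypothetical one-class model: add-one-smoothed unigram model of non-spam mail."""

    def __init__(self, ham_messages):
        self.counts = Counter()
        for msg in ham_messages:
            self.counts.update(tokenize(msg))
        self.total = sum(self.counts.values())
        # +1 reserves probability mass for a single "unseen word" bucket.
        self.vocab_size = len(self.counts) + 1

    def score(self, message):
        """Average log-probability per token; lower means less like known ham."""
        tokens = tokenize(message)
        if not tokens:
            return 0.0
        denom = self.total + self.vocab_size
        # Counter returns 0 for unseen words, so they get probability 1/denom.
        logp = sum(math.log((self.counts[t] + 1) / denom) for t in tokens)
        return logp / len(tokens)


def fit_threshold(model, held_out_ham, quantile=0.01):
    # Hypothetical helper: pick the cutoff so about `quantile` of known ham is flagged.
    scores = sorted(model.score(m) for m in held_out_ham)
    idx = min(int(quantile * len(scores)), len(scores) - 1)
    return scores[idx]


def is_anomalous(model, message, threshold):
    # Anything scoring below the cutoff is treated as "not from the ham model".
    return model.score(message) < threshold


if __name__ == "__main__":
    ham = [
        "meeting moved to thursday afternoon",
        "please review the attached report",
        "lunch on friday with the team",
        "the report is ready for review",
    ]
    model = HamModel(ham)
    threshold = fit_threshold(model, ham)  # toy demo reuses the training mail
    print(is_anomalous(model, "cheap viagra click now winner", threshold))  # True
    print(is_anomalous(model, "the report is ready", threshold))  # False
```

Because the model flags anything unusual, not only spam, legitimate messages about new topics will also score low; that is one reason a one-class approach tends to do worse than a classifier trained on labeled spam. In practice you would choose the cutoff on held-out non-spam mail rather than on the training set, as the toy demo does.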

Licensed under: CC-BY-SA with attribution