Why is Local Outlier Factor classified as Unsupervised if it requires training data with no outliers?

datascience.stackexchange https://datascience.stackexchange.com/questions/45054

  •  01-11-2019
Question

In Scikit-Learn, the Local Outlier Factor (LOF) algorithm is defined as an unsupervised anomaly detection method.

So I don't understand why this algorithm requires pre-filtered training data. Perhaps "training data" here simply means "data to start with"? But the example code provided by scikit-learn clearly shows training data that explicitly contains NO anomalies. Does that mean this model would NOT work if the training data contains anomalies? And more importantly, how do I find anomalies in the training data using this algorithm?

Here is the example: https://scikit-learn.org/stable/auto_examples/neighbors/plot_lof_novelty_detection.html#sphx-glr-auto-examples-neighbors-plot-lof-novelty-detection-py

I changed the sample code to include anomaly data in the training data set, and the model still found a decision boundary that looked correct. So am I just getting confused by the way the documentation and sample code are written? Or does this model really need a clean training data set?
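For context, here is a minimal sketch of the two modes I am comparing in scikit-learn (the synthetic data is my own illustration, not from the linked example): the default `novelty=False` mode scores the training set itself via `fit_predict`, while `novelty=True` fits on data assumed clean and then scores new points with `predict`.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X_inliers = 0.3 * rng.randn(100, 2)                      # dense cluster near the origin
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))   # scattered anomalies
X = np.concatenate([X_inliers, X_outliers])

# Outlier detection (default, novelty=False): the model scores the very data
# it was fit on, so anomalies ARE allowed in the training set and come back as -1.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)  # 1 = inlier, -1 = outlier

# Novelty detection (novelty=True): fit on data assumed clean, then score
# previously unseen points; this is the mode the linked example uses.
lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof_novelty.fit(X_inliers)
new_labels = lof_novelty.predict(X_outliers)  # 1 = inlier, -1 = novelty
```

In the first mode, `labels` directly answers "which training points are anomalies"; the second mode instead assumes the fit data is clean and only judges new observations.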

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange