Question

I am working with a large amount of data that contains outliers. The code works well with most of my datasets but fails on a few.

This sample data:

set.seed(100)
m=rnorm(200)
m[1]=100   # insert outliers
m[2]=50

My code is:

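library(outliers)
# TRUE at the position(s) that outlier() flags, FALSE elsewhere
lg <- outlier(m, logical = TRUE)
for (i in seq_along(lg)) {
  if (lg[i]) {
    m[i] <- NA   # replace the flagged value with NA
  }
}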

This replaces outliers with NA. In this case, 100 is removed but 50 is not. The same thing happens with my dataset, and I cannot figure out why. Any help would be appreciated.

Thank you for reading.


Solution

Here I expand my comment above into an answer.

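In your example, m[40], m[90] and m[67] all equal 150, so they are ties. outlier() flags only the value farthest from the mean (together with any values tied with it), so all three are flagged at once. If you instead try m[40] = 150; m[90] = 200; m[67] = 250, I think you will find that only m[67] is identified as an outlier. For the same reason, 50 is not flagged while the more extreme 100 is present. Maybe ask on the sister statistics site, Cross Validated, for the best definition of an outlier for your data set. Then maybe somebody here can help you write the R code for that definition.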
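A minimal base-R sketch of that tie behaviour, assuming the rule is "flag the value(s) with the largest absolute distance from the mean". flag_farthest() is a hypothetical helper written only for illustration; it is not the outliers package's outlier() and may differ from it in edge cases.

```r
# Hypothetical helper (not from the outliers package): TRUE for every value
# whose absolute distance from the mean equals the largest such distance
flag_farthest <- function(x) {
  d <- abs(x - mean(x))
  d == max(d)
}

set.seed(100)
clean <- rnorm(200)

# Tied extremes: all three positions share the largest distance
m_tied <- clean
m_tied[c(40, 90, 67)] <- 150
which(flag_farthest(m_tied))      # 40 67 90

# Distinct extremes: only the single most extreme value is flagged
m_distinct <- clean
m_distinct[40] <- 150
m_distinct[90] <- 200
m_distinct[67] <- 250
which(flag_farthest(m_distinct))  # 67
```

Under the same rule, the question's data (m[1]=100, m[2]=50) gets only position 1 flagged, because 50 is closer to the mean than 100.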

Below is R code for a simple definition of an outlier: an outlier is any observation with a value > 50. I do not recommend that you use that definition. In fact, please do not. I use it here only for illustration. The code below replaces all outliers with NA.

set.seed(100)
m <- rnorm(200)
m[10] <- 100
m[40] <- 150
m[90] <- 200
m[67] <- 250
m

# Illustrative definition only: any value above 50 is an outlier
is_outlier <- m > 50
is_outlier

m[is_outlier] <- NA
m

Other tips

It depends on your definition of an outlier, and there are many.

The outlier() function from the outliers package defines an outlier as the object(s) with the largest difference from the mean. This is a rather weak definition, as it forces the number of outliers to be 1 (unless there are ties).

Try this data set:

0 .1 .1 .1 -.1 -.1 -.1

Under that definition, every value except the 0 is tied for the largest difference from the mean, so all but the 0 should be removed!

Now change this to:

0 .1 .1 .1 -.1 -.1 -.100000001

Now only one element will be removed, even though the difference is minuscule and hardly significant.
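A quick numerical check of both data sets, again assuming the "largest difference from the mean" rule: the sketch computes each value's absolute distance from the mean and lists the positions that attain the maximum. Whether the first set produces exact ties can depend on floating-point rounding in mean(), so no exact output is claimed for it.

```r
# The two data sets from this answer
x1 <- c(0, .1, .1, .1, -.1, -.1, -.1)
x2 <- c(0, .1, .1, .1, -.1, -.1, -.100000001)

# Absolute distance of each value from the mean
d1 <- abs(x1 - mean(x1))
d2 <- abs(x2 - mean(x2))

# Positions sharing the largest distance, i.e. what the rule would flag
which(d1 == max(d1))  # several tied positions
which(d2 == max(d2))  # 7: the tiny perturbation breaks the tie
```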

Maybe try a more sophisticated outlier detection method.
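One commonly used alternative, swapped in here purely as an illustration (it is not proposed in this answer), is Tukey's fences: flag every value more than k × IQR below the first quartile or above the third quartile. Unlike the largest-difference rule, it can flag several outliers at once. flag_tukey() is a hypothetical helper, and k = 1.5 is the conventional but arbitrary multiplier.

```r
# Hypothetical helper: Tukey's fences. Values below Q1 - k*IQR or above
# Q3 + k*IQR are flagged; k = 1.5 is the conventional choice
flag_tukey <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

set.seed(100)
m <- rnorm(200)
m[1] <- 100
m[2] <- 50

flags <- flag_tukey(m)
# 100 and 50 are both flagged (possibly along with a few extreme normal draws)
m[flags] <- NA
```

With the question's data, both inserted values fall outside the fences. A few genuine normal draws may also be flagged; that is a property of the rule, so whether it suits your data is still a statistical judgment.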

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow