Question

I have a data frame with 1488 observations and 400 variables. I am trying to take the log of every value in the table and then remove the outliers using rm.outlier from the outliers package. The only problem is that I get this error:

Error in data.frame(V1 = c(-0.886056647693163, -0.677780705266081, -1.15490195998574,  : arguments imply differing number of rows: 1487, 1480, 1481, 1475, 1479, 1478, 1483, 1485, 1484, 1477, 1482, 1469

This is my code:

datalog <- matrix(0,nrow(data),ncol(data))
datalog[,] <- apply(data,2,log10)
datalog[datalog==-Inf] <- 0
datalog <- as.data.frame(datalog, stringsAsFactors=F)

testNoOutliers <- rm.outlier(datalog, fill = FALSE, 
                         median = FALSE, opposite = FALSE)

My data: https://skydrive.live.com/redir?resid=CEC7696F3B5BFBC6!341&authkey=!APiwy6qasD3-yGo

Thanks for any help

Solution 2

You get this error because a different number of outliers is removed from each column, so the resulting columns have different lengths and cannot be combined into one data frame.
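The mechanism can be reproduced on toy data without the package. This is only a sketch: drop_farthest is a hypothetical stand-in that approximates the removal step by dropping every value tied for the largest distance from the column mean, and the numbers are made up. It shows that tied extremes shorten one column more than the other, after which building a data frame fails in the same way.

```r
# Two made-up columns of equal length (stand-ins for columns of datalog).
a <- c(1, 2, 3, 4, 100)    # one extreme value
b <- c(10, 10, 1, 1, 1)    # the extreme value occurs twice

# Hypothetical helper (not part of the outliers package): drop every value
# that is tied for the largest distance from the mean.
drop_farthest <- function(x) {
  d <- abs(x - mean(x))
  x[d != max(d)]
}

cols <- lapply(list(V1 = a, V2 = b), drop_farthest)
lens <- lengths(cols)      # the two columns no longer have the same length

# Combining columns of unequal length raises the "differing number of rows" error.
res <- tryCatch(as.data.frame(cols), error = function(e) conditionMessage(e))
print(lens)
print(res)
```

Any fix therefore has to keep the column lengths equal, either by replacing values in place or by dropping whole rows.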

If you want to replace outliers with NA, one solution would be

out.rem <- function(x) {
  # outlier() returns the most extreme value of x; set every match to NA
  x[which(x == outlier(x))] <- NA
  x
}

apply(datalog,2,out.rem)

To remove entire rows that contain an outlier value, you could add one more line to @agstudy's solution:

ll <- apply(datalog,2,function(x) which(x == outlier(x)))
new.datalog <- datalog[-unique(unlist(ll)),]

Other tips

You get the error because the number of outliers is not the same for every variable.

To correct it you have 2 options:

  1. Set the option fill = TRUE: the outlier is replaced by the mean instead of being removed.

  2. Remove the outliers yourself:

      # get a list of outlier index for each variable
      ll <- apply(datalog,2,function(x) which(x == outlier(x)))
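As a rough sketch of option 1, the call below uses the same rm.outlier arguments as the question but with fill = TRUE. The data frame toy is randomly generated for illustration only, and the call is guarded so the snippet still runs when the outliers package is not installed.

```r
# Made-up data standing in for datalog: 10 rows, 4 numeric columns.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(40), nrow = 10))

if (requireNamespace("outliers", quietly = TRUE)) {
  # fill = TRUE replaces the outlier in each column instead of dropping it,
  # so every column keeps its length and the result is a valid data frame.
  filled <- outliers::rm.outlier(toy, fill = TRUE,
                                 median = FALSE, opposite = FALSE)
  same_shape <- identical(dim(filled), dim(toy))
  print(same_shape)
}
```

Because nothing is dropped, the result has the same dimensions as the input, which is what avoids the error.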
    
Licensed under: CC-BY-SA with attribution