Question

In the dataset I am working on, there are 24 variables, and all of them contain the outlier value 99. I need to remove the 99s from all of these variables. Is there a quick way to do this? I can do it one variable at a time using:

education <- subset(ex1, ex1$education<99)

ex1 is my dataset. Do I need to use a data.frame to do this?


Solution 3

Are you really talking about outliers, or about a flag value of 99 that you want to remove? The latter would simply be:

ex1[ex1 == 99] <- NA
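
As a minimal sketch of that replacement, assuming a small made-up data frame (the column names and values below are hypothetical, not the asker's data):

```r
# Hypothetical toy data; 99 is the flag value for "missing"
ex1 <- data.frame(
  education = c(12, 99, 16, 14),
  income    = c(3, 2, 99, 4)
)

# ex1 == 99 is a logical matrix, so this replaces every 99 in every column at once
ex1[ex1 == 99] <- NA

# Count the missing values per column
colSums(is.na(ex1))

# Summary functions can then be told to skip the NAs
mean(ex1$education, na.rm = TRUE)
```

Once the flag is an NA, functions such as mean() will not silently treat 99 as a real measurement.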

Other tips

I definitely suggest using a data.frame. If you want to remove all rows that contain a 99, you can do:

ex1 <- data.frame(
  a = sample(90:99,100, replace=TRUE),
  b = sample(90:99,100, replace=TRUE),
  c = sample(90:99,100, replace=TRUE),
  d = sample(90:99,100, replace=TRUE),
  e = sample(90:99,100, replace=TRUE),
  f = sample(90:99,100, replace=TRUE)
)

print(nrow(ex1))

ex1 <- ex1[complete.cases(sapply(ex1, function(val) ifelse(val == 99, NA, val))),]

print(nrow(ex1))

(The print() calls are just there to show that the number of rows changes.)

Otherwise, use @infominer's suggestion (which was just edited to offer a simpler, alternative version of the removal).
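
For the row-removal case, the same filter can probably also be written without the intermediate NA step. A sketch on a hypothetical toy data frame (the names has_flag and ex1_clean are made up for illustration):

```r
# Hypothetical toy data; rows 2 and 3 each contain a 99
ex1 <- data.frame(
  a = c(90, 99, 95, 91),
  b = c(92, 93, 99, 94)
)

# ex1 == 99 is a logical matrix; rowSums counts the 99s in each row
has_flag <- rowSums(ex1 == 99) > 0

# Keep only the rows with no 99 in any column
ex1_clean <- ex1[!has_flag, ]

print(nrow(ex1_clean))
```

This assumes the data contains no NAs already; if it does, rowSums would need na.rm = TRUE.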

Try this

#assuming ex1 is a data.frame
#if you want to remove the 99s completely
ex.wo.outliers <-sapply(ex1, function(x) subset(x, x!=99))
#if you want to keep the 99s as NAs
ex.withsub <- sapply(ex1, function(x) ifelse(x == 99, NA, x))

The first removes the 99s from each variable separately; the second replaces the 99s in every variable with NA.

I recommend the second, as this will preserve the dimensions of your data.frame. The first will result in different lengths for each variable, in case you have a row with some 99s and some valid values.
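
To make the difference concrete, here is a small sketch on hypothetical data where one row mixes a valid value with a 99. It uses lapply() with ex1[] for the NA version, which is a swapped-in variant of the sapply() call above, chosen because sapply() alone returns a matrix rather than a data.frame:

```r
# Hypothetical toy data: row 1 has a valid value in a and a 99 in b
ex1 <- data.frame(
  a = c(90, 99, 95),
  b = c(99, 99, 94)
)

# Dropping 99s column by column: the lengths differ, so the result is a list
dropped <- sapply(ex1, function(x) x[x != 99])
lengths(dropped)

# Replacing 99s with NA keeps every column the same length.
# Assigning into ex1[] keeps the result a data.frame.
ex1[] <- lapply(ex1, function(x) ifelse(x == 99, NA, x))
dim(ex1)
```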

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow