Question

I am new to R (with experience in MATLAB). I'm still exploring the syntax and learning to think in an R way.

I have some data (3000 x 3000) in a data.frame, and the following code seems to perform very slowly.

data[is.na(data)] = 0 

I would like to get some comments on this from some experienced users. Thanks.


Solution

Assuming you are just trying to replace NA with 0 and that your data is all numeric, use a matrix.

x <- matrix(runif(9e+06, 0, 100), ncol = 3000)
x[x <= 55 & x >= 54] <- NA
table(is.na(x))
#   FALSE    TRUE 
# 8910086   89914
x[is.na(x)] <- 0
table(is.na(x))
#   FALSE 
# 9000000

edit: the above conversion takes less than 1 second as a matrix, and still only about 3 seconds as a data.frame (not including the table commands, which are the slowest part). What sort of time do you mean when you say very slow?
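If the data must stay a data.frame, a common trick is to replace the NAs column by column with lapply, assigning back through df[] so the data.frame class is preserved. This is a minimal sketch on a small hypothetical data.frame, not a benchmark of the original 3000 x 3000 case:

```r
# Small example data.frame with one NA planted in it
set.seed(1)
df <- as.data.frame(matrix(runif(9, 0, 100), ncol = 3))
df[2, 2] <- NA

# Replace NAs in each column; df[] keeps df a data.frame
df[] <- lapply(df, function(col) {
  col[is.na(col)] <- 0
  col
})

sum(is.na(df))  # 0
```

This avoids the data.frame indexing overhead of data[is.na(data)] <- 0 because each column is modified as a plain vector.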

edit: response to Simon

system.time(is.na(x) <- 0)
#  user  system elapsed 
# 0.024   0.044   0.068   <- yes, faster
table(is.na(x))
#   FALSE    TRUE 
# 8909945   90055         <- but doesn't change x
system.time(x[is.na(x)] <- 0)
#  user  system elapsed 
# 0.252   0.032   0.287   <- slower
table(is.na(x))
#   FALSE 
# 9000000                 <- changes x
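The reason is.na(x) <- 0 is fast but leaves x unchanged is that the replacement form is.na<- treats its right-hand side as an index of elements to set to NA, not as a replacement value, and index 0 selects nothing. A tiny sketch of that behavior:

```r
x <- c(1, 2, 3)

# The right-hand side is an INDEX: element 2 becomes NA
is.na(x) <- 2
x  # 1 NA 3

# Index 0 selects no elements, so this is a no-op
is.na(x) <- 0
x  # still 1 NA 3
```

So the two expressions are not interchangeable: only x[is.na(x)] <- 0 actually replaces the NAs.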
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow