Question

I'm pretty new to R so the answer might be obvious, but so far I have only found answers to similar problems that don't match, or which I can't translate to mine.

The Requirement: I have two vectors of the same length which contain numeric values as well as NA-values which might look like:

[1] 12  8 11  9 NA NA NA

[1] NA  7 NA 10 NA 11  9

What I need now is two vectors that only contain those values that are not NA in both original vectors, so in this case the result should look like this:

[1] 8 9

[1] 7 10

I was thinking about simply going through the vectors in a loop, but the dataset is quite large so I would appreciate a faster solution to that... I hope someone can help me on that...

Solution

You are looking for complete.cases, but you should put your vectors in a data.frame first.

dat <- data.frame(x=c(12 ,8, 11, 9, NA, NA, NA),
                  y=c(NA ,7, NA, 10, NA, 11, 9))

dat[complete.cases(dat), ]
#   x  y
# 2 8  7
# 4 9 10
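If you would rather keep two separate vectors, as the question asks, complete.cases also accepts the vectors directly. The following is only a sketch on the question's example data; the names keep, x_clean and y_clean are made up for illustration.

```r
# Same example vectors as in the question
x <- c(12, 8, 11, 9, NA, NA, NA)
y <- c(NA, 7, NA, 10, NA, 11, 9)

# TRUE at every position where all supplied vectors are non-missing
keep <- complete.cases(x, y)

x_clean <- x[keep]  # 8 9
y_clean <- y[keep]  # 7 10
```

Computing the logical mask once and reusing it for both vectors keeps the two results aligned position by position.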

Other tips

Try this:

# example vectors
a <- c(12, 8, 11, 9, NA, NA, NA)
b <- c(NA, 7, NA, 10, NA, 11, 9)

# keep only the positions where both vectors are non-NA
keep <- !is.na(a) & !is.na(b)
a[keep]
# [1] 8 9
b[keep]
# [1]  7 10

Something plus NA in R is generally NA. So, using that piece of information, you can simply do:

cbind(a, b)[!is.na(a + b), ]
#      a  b
# [1,] 8  7
# [2,] 9 10
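The word "generally" matters here. As a small illustration with made-up values (not the question's data), adding Inf and -Inf gives NaN, which is.na also reports as missing, so the sum-based mask can drop a position where neither input is NA. The explicit mask does not have this problem.

```r
# Hypothetical vectors chosen to show the edge case
a <- c(1, Inf, NA)
b <- c(2, -Inf, 3)

# Inf + -Inf is NaN, and is.na(NaN) is TRUE
sum_mask <- !is.na(a + b)

# Checks each vector separately, so Inf and -Inf count as present
explicit_mask <- !is.na(a) & !is.na(b)

sum_mask       # TRUE FALSE FALSE
explicit_mask  # TRUE  TRUE FALSE
```

For ordinary finite numeric data the two masks agree, so this only matters if infinite values (or integer sums large enough to overflow) can occur in your vectors.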

More generally, you could write a function like the following to easily accept any number of vectors:

myFun <- function(...) {
  myList <- list(...)
  Names <- sapply(substitute(list(...)), deparse)[-1]
  # drop = FALSE keeps a matrix even when only one row survives
  out <- do.call(cbind, myList)[!is.na(Reduce("+", myList)), , drop = FALSE]
  colnames(out) <- Names
  out
}

With that function, the usage would be:

myFun(a, b)
#      a  b
# [1,] 8  7
# [2,] 9 10

In my timings, this is by far the fastest option here, but that's only important if you are able to detect differences down to the microseconds or if your vector lengths are in the millions, so I won't bother posting benchmarks.
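If you want to check the timing on your own machine, a rough sketch using only base R could look like the following. The vector length, the seed and the names a_big, b_big, k1, k2 and k3 are assumptions made for illustration, and the elapsed times will vary between runs and systems.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
n <- 1e6     # assumed size; adjust to resemble your data

# Random small integers with some NA values mixed in
a_big <- sample(c(1:10, NA), n, replace = TRUE)
b_big <- sample(c(1:10, NA), n, replace = TRUE)

# Time each way of building the "both present" mask
t_cases <- system.time(k1 <- complete.cases(a_big, b_big))
t_isna  <- system.time(k2 <- !is.na(a_big) & !is.na(b_big))
t_sum   <- system.time(k3 <- !is.na(a_big + b_big))

# The three masks should agree on this data (small integers, no Inf)
stopifnot(identical(k1, k2), identical(k2, k3))

# Elapsed seconds per approach
c(complete.cases = t_cases[["elapsed"]],
  is.na          = t_isna[["elapsed"]],
  sum            = t_sum[["elapsed"]])
```

A single system.time call is coarse, so repeat the measurement a few times before drawing conclusions.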

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow