Pergunta

i want to get the index of a column with the highest value. However, I don't know how to handle missing values to make the correct calculation. NAs should be omitted (=ignored during summing up) and not converted to "0".

x=rep(NA,3); y=c(NA,0,-1); z=c(0, rep(NA,2))
data=cbind(x,y,z)

     x  y  z
[1,] NA NA  0
[2,] NA  0 NA
[3,] NA -1 NA

I want to get the index of a column with the highest value. In the example above it's [,3]. However the functions

   which.max(colSums(!is.na(data)))

or

apply(data,2,sum, na.rm=T)

don't generate the expected output.

Any help appreciated. Thx.

Foi útil?

Solução

You can determine the column index of the column whose sum is greatest among the columns with non missing values in this way:

dataAvailIdx <- which(apply(data,2,function(x) any(!is.na(x))))
dataAvailIdx[which.max(colSums(data[,dataAvailIdx],na.rm=TRUE))]
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top