Question

i want to get the index of a column with the highest value. However, I don't know how to handle missing values to make the correct calculation. NAs should be omitted (=ignored during summing up) and not converted to "0".

x=rep(NA,3); y=c(NA,0,-1); z=c(0, rep(NA,2))
data=cbind(x,y,z)

     x  y  z
[1,] NA NA  0
[2,] NA  0 NA
[3,] NA -1 NA

I want to get the index of a column with the highest value. In the example above it's [,3]. However the functions

   which.max(colSums(!is.na(data)))

or

apply(data,2,sum, na.rm=T)

don't generate the expected output.

Any help appreciated. Thx.

Était-ce utile?

La solution

You can determine the column index of the column whose sum is greatest among the columns with non missing values in this way:

dataAvailIdx <- which(apply(data,2,function(x) any(!is.na(x))))
dataAvailIdx[which.max(colSums(data[,dataAvailIdx],na.rm=TRUE))]
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top