I want to get the index of the column with the highest sum. However, I don't know how to handle missing values so that the calculation is correct. NAs should be omitted (i.e. ignored when summing) and not converted to 0.

x <- rep(NA, 3); y <- c(NA, 0, -1); z <- c(0, rep(NA, 2))
data <- cbind(x, y, z)

     x  y  z
[1,] NA NA  0
[2,] NA  0 NA
[3,] NA -1 NA

In the example above, the expected result is column 3, i.e. [,3]: column x is entirely NA, y sums to -1, and z sums to 0. However, the calls

   which.max(colSums(!is.na(data)))

or

   apply(data, 2, sum, na.rm=TRUE)

don't generate the expected output.

Any help is appreciated. Thanks.


Solution

You can find the index of the column with the greatest sum, considering only columns that contain at least one non-missing value, like this:

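# indices of the columns that contain at least one non-NA value
dataAvailIdx <- which(apply(data, 2, function(x) any(!is.na(x))))
# among those, pick the largest NA-ignoring sum; drop=FALSE keeps a matrix even if only one column remains
dataAvailIdx[which.max(colSums(data[, dataAvailIdx, drop=FALSE], na.rm=TRUE))]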
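
To see why plain NA-ignoring sums are not enough, and as a possible alternative, here is a minimal sketch on the data from the question. It relies on which.max() discarding NA values; the variable names naive, sums and best are only illustrative and not part of the original answer.

```r
# Example data from the question
x <- rep(NA, 3); y <- c(NA, 0, -1); z <- c(0, rep(NA, 2))
data <- cbind(x, y, z)

# Naive sums: the all-NA column x gets a sum of 0, which ties with z,
# and which.max() returns the first maximum it finds
naive <- colSums(data, na.rm = TRUE)
which.max(naive)   # returns 1 (column x), not the wanted 3

# Alternative sketch: mark the sums of all-NA columns as NA,
# because which.max() skips NA entries
sums <- colSums(data, na.rm = TRUE)
sums[colSums(!is.na(data)) == 0] <- NA
best <- which.max(sums)
best               # returns 3 (column z)
```

If every column is entirely NA, which.max() returns a zero-length result, so that case may need separate handling.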
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow