質問

i want to get the index of a column with the highest value. However, I don't know how to handle missing values to make the correct calculation. NAs should be omitted (=ignored during summing up) and not converted to "0".

x=rep(NA,3); y=c(NA,0,-1); z=c(0, rep(NA,2))
data=cbind(x,y,z)

     x  y  z
[1,] NA NA  0
[2,] NA  0 NA
[3,] NA -1 NA

I want to get the index of a column with the highest value. In the example above it's [,3]. However the functions

   which.max(colSums(!is.na(data)))

or

apply(data,2,sum, na.rm=T)

don't generate the expected output.

Any help appreciated. Thx.

役に立ちましたか?

解決

You can determine the column index of the column whose sum is greatest among the columns with non missing values in this way:

dataAvailIdx <- which(apply(data,2,function(x) any(!is.na(x))))
dataAvailIdx[which.max(colSums(data[,dataAvailIdx],na.rm=TRUE))]
ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top