Summing across columns with missing values in a data.frame

https://stackoverflow.com/questions/21886859

13-10-2022

Question

I want to get the index of the column with the highest sum. However, I don't know how to handle missing values so that the calculation comes out right. NAs should be omitted (i.e. ignored while summing) and not converted to "0".

x=rep(NA,3); y=c(NA,0,-1); z=c(0, rep(NA,2))
data=cbind(x,y,z)

     x  y  z
[1,] NA NA  0
[2,] NA  0 NA
[3,] NA -1 NA

I want to get the index of the column with the highest sum. In the example above it's [,3]. However, the functions

which.max(colSums(!is.na(data)))

apply(data,2,sum, na.rm=TRUE)

don't generate the expected output.

Any help appreciated. Thx.

Solution

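You can find the index of the column with the greatest sum, considering only the columns that contain at least one non-missing value, like this:

# indices of the columns that have at least one non-NA value
dataAvailIdx <- which(apply(data,2,function(x) any(!is.na(x))))
# drop=FALSE keeps a matrix even when only one column qualifies
dataAvailIdx[which.max(colSums(data[,dataAvailIdx,drop=FALSE],na.rm=TRUE))]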
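
For context, the second attempt in the question goes wrong because sum(..., na.rm=TRUE) on a column that is entirely NA returns 0, so column x ties with z and which.max picks the first one. Here is a minimal sketch of the same idea wrapped in a function. The name which_max_colsum is made up for illustration and is not part of the original answer. Instead of subsetting the matrix, it sets the sums of all-NA columns back to NA and relies on which.max discarding missing values.

```r
# Hypothetical helper (name chosen for illustration only).
# Returns the index of the column with the largest NA-ignoring sum,
# skipping columns that contain no data at all.
which_max_colsum <- function(m) {
  sums <- colSums(m, na.rm = TRUE)        # an all-NA column sums to 0 here
  sums[colSums(!is.na(m)) == 0] <- NA     # mark all-NA columns as missing
  which.max(sums)                         # which.max discards NA entries
}

# Example data from the question
x <- rep(NA, 3); y <- c(NA, 0, -1); z <- c(0, rep(NA, 2))
data <- cbind(x, y, z)

which_max_colsum(data)   # named index: z, position 3
```

If every column is entirely NA, this sketch returns integer(0), so the caller may want to check the length of the result.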

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow