Question
I am trying to find the percentage of NAs in each column as well as in the whole data frame.
The first method, which I have commented out, gives me zero, and the second method, which is not commented out, gives me a matrix. I'm not sure what I am missing. Any hint is truly appreciated!
cp.2006<-read.csv(file="cp2006.csv",head=TRUE)
#countNAs <- function(x) {
# sum(is.na(x))
#}
#total=0
#for (i in col(cp.2006)) {
# total=countNAs(i)+total
#}
#print(total)
count<-apply(cp.2006, 1, function(x) sum(is.na(x)))
dims<-dim(cp.2006)
num<-dims[1]*dims[2]
NApercentage<-(count/num) * 100
print(NApercentage)
Solution
x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))
For the whole dataframe:
sum(is.na(x))/prod(dim(x))
Or
mean(is.na(x))
For columns:
apply(x, 2, function(col) sum(is.na(col)) / length(col))
Or
colMeans(is.na(x))
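To connect this back to the two attempts in the question, here is a minimal sketch using the same toy data frame. It assumes the goal is a percentage (0 to 100) rather than a proportion; the variable names `row_counts`, `total_pct` and `col_pct` are only illustrative.

```r
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# col() returns a matrix of column indices, not the data itself,
# so looping over it never encounters an NA and the count stays at 0.
index_na_count <- sum(is.na(col(x)))

# MARGIN = 1 applies the function to each row, so this yields one
# NA count per row instead of a single number.
row_counts <- apply(x, 1, function(r) sum(is.na(r)))

# Summing the per-row counts gives one overall percentage.
total_pct <- sum(row_counts) / prod(dim(x)) * 100

# Per-column percentages: colMeans() averages the logical NA mask by column.
col_pct <- colMeans(is.na(x)) * 100

print(total_pct)
print(col_pct)
```

In other words, the commented-out loop was counting NAs in index values, and the `apply()` call was working by row; switching to `MARGIN = 2` or `colMeans()` gives the per-column figures.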
OTHER TIPS
For newer versions of dplyr, which no longer support funs():
x %>% summarise_all(list(name = ~ sum(is.na(.)) / length(.)))
You could also use dplyr::summarize_all for the column-wise proportions.
x %>% summarize_all(funs(sum(is.na(.)) / length(.)))
Which will give:
     x   y
1 0.25 0.5
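As an alternative to the `summarise_all` variants above, here is a sketch using `across()`, which is available in dplyr 1.0.0 and later. This is a swapped-in technique rather than what the answers above use, and the name `na_share` is only illustrative.

```r
library(dplyr)

x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# Proportion of NA values in every column; mean() of a logical vector
# is the share of TRUE values.
na_share <- x %>%
  summarise(across(everything(), ~ mean(is.na(.x))))

print(na_share)
```

On the example data this should produce the same one-row result as above, with the original column names `x` and `y` kept.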
If you are interested in the percentage of complete cases instead, you can use complete.cases().
Using the same example as above:
x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))
Output:
   x  y
1  1 NA
2  2 NA
3 NA  4
4  3  5
Finding complete cases:
complete.cases(x)
Output:
[1] FALSE FALSE FALSE TRUE
Proportion of complete cases (multiply by 100 for a percentage):
mean(complete.cases(x))
Output:
[1] 0.25
That means 25% of the rows in the data provided are complete, i.e. only the fourth row is complete; all the others contain NA values.
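As a quick cross-check of that figure, here is a small base R sketch. It assumes a row counts as complete when it has no NA in any column, and the names `complete_flag`, `by_row_sums` and `share_complete` are only illustrative.

```r
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# TRUE for rows without any NA
complete_flag <- complete.cases(x)

# The same flag derived by counting NAs per row
by_row_sums <- rowSums(is.na(x)) == 0

# Share of complete rows, computed from the rows na.omit() keeps
share_complete <- nrow(na.omit(x)) / nrow(x)

print(complete_flag)
print(share_complete)
```

All three approaches should agree on this data; `complete.cases()` is usually the most direct.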
Cheers!
You can try this:
colMeans(is.na.data.frame(dataframe_name))
Try this, which gives the percentage of NAs per column rounded to two decimals:
sapply(data, function(y) round(mean(is.na(y)) * 100, 2))