Question

I want to count the number of valid entries (numeric values) in an R data frame for each particular field grouped by another field, in this example 'Company'.

Company <- rep(c('A','B','C'),3)
Dates <- c(rep('20130131',3),rep('20130228',3),rep('20130331',3))
F1 <- c(1:3,NA,NA,6:9)
F2 <- c(21:27,NA,NA)
F3 <- c(31:39)
F4 <- c(NA,NA,NA,44:48,NA)
df <- data.frame(Company, Dates, F1, F2, F3, F4)

The resulting data.frame

> df

  Company    Dates F1 F2 F3 F4
1       A 20130131  1 21 31 NA
2       B 20130131  2 22 32 NA
3       C 20130131  3 23 33 NA
4       A 20130228 NA 24 34 44
5       B 20130228 NA 25 35 45
6       C 20130228  6 26 36 46
7       A 20130331  7 27 37 47
8       B 20130331  8 NA 38 48
9       C 20130331  9 NA 39 NA

I would then like to know, for each 'Company' and each field, how many dates have valid (numeric) values. The output would look something like

Company F1 F2 F3 F4
      A  2  3  3  2
      B  2  2  3  2
      C  3  2  3  1

The data frames that I'm dealing with will have roughly 500 companies, 365 dates and 45 fields, so doing things manually isn't really an option.

Any ideas would be greatly appreciated.

Thanks,


Solution

A base R solution and an alternative data.table solution:

aggregate(cbind(F1,F2,F3,F4) ~ Company, data=df, function(x) sum(!is.na(x)), na.action=na.pass)
  Company F1 F2 F3 F4
1       A  2  3  3  2
2       B  2  2  3  2
3       C  3  2  3  1

data.table:

library(data.table)
dt <- as.data.table(df)
# Count the non-NA values in each listed column, per Company
dt[,lapply(.SD,function(x) sum(!is.na(x))),by=Company,.SDcols=c("F1","F2","F3","F4")]

   Company F1 F2 F3 F4
1:       A  2  3  3  2
2:       B  2  2  3  2
3:       C  3  2  3  1
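With around 45 fields, typing every column name becomes tedious. A minimal base R sketch, assuming every column other than Company and Dates is a field to count, uses the `.` shorthand on the left-hand side of the formula. The helper name `count_valid` is only illustrative.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Company = rep(c("A", "B", "C"), 3),
  Dates   = c(rep("20130131", 3), rep("20130228", 3), rep("20130331", 3)),
  F1 = c(1:3, NA, NA, 6:9),
  F2 = c(21:27, NA, NA),
  F3 = 31:39,
  F4 = c(NA, NA, NA, 44:48, NA)
)

# Illustrative helper: number of non-NA entries in a vector.
count_valid <- function(x) sum(!is.na(x))

# Assumption: all columns except Company and Dates are fields to count.
# Dates is dropped first so that "." expands to the field columns only.
counts <- aggregate(
  . ~ Company,
  data = df[setdiff(names(df), "Dates")],
  FUN = count_valid,
  na.action = na.pass  # keep rows containing NA so they can be counted
)
print(counts)
```

Without `na.action = na.pass`, the formula interface would drop any row containing an NA before counting, which would understate the counts.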

Other tips

Since this is a question about tabulation, here's an option with table after stacking the "F" columns.

x <- cbind(Company = df$Company, 
           stack(df[setdiff(names(df), c("Company", "Dates"))]))
table(x[complete.cases(x), c(1, 3)])
#        ind
# Company F1 F2 F3 F4
#       A  2  3  3  2
#       B  2  2  3  2
#       C  3  2  3  1

## OR
## table(x[complete.cases(x), c("Company", "ind")])
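The same counts can probably also be obtained without stacking, by turning the fields into a 0/1 matrix of "is valid" flags and summing the rows per company with `rowsum`. This is only a sketch and assumes, as above, that everything except Company and Dates is a field. The names `fields`, `valid` and `counts` are illustrative.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Company = rep(c("A", "B", "C"), 3),
  Dates   = c(rep("20130131", 3), rep("20130228", 3), rep("20130331", 3)),
  F1 = c(1:3, NA, NA, 6:9),
  F2 = c(21:27, NA, NA),
  F3 = 31:39,
  F4 = c(NA, NA, NA, 44:48, NA)
)

# Assumption: all columns except Company and Dates are fields to count.
fields <- setdiff(names(df), c("Company", "Dates"))

# is.na() on a data frame returns a logical matrix; negate it and add 0L
# to get an integer matrix with 1 for a valid entry and 0 for NA.
valid <- !is.na(df[fields]) + 0L

# Sum the flags within each company; the result is a matrix with one row
# per company and one column per field.
counts <- rowsum(valid, group = df$Company)
print(counts)
```

The result is a matrix rather than a data frame, with the company names as row names. `as.data.frame(counts)` converts it if a data frame is needed.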
Licensed under: CC-BY-SA with attribution