Question

I want to count the number of valid entries (numeric values) in an R data frame for each particular field grouped by another field, in this example 'Company'.

Company <- rep(c('A','B','C'),3)
Dates <- c(rep('20130131',3),rep('20130228',3),rep('20130331',3))
F1 <- c(1:3,NA,NA,6:9)
F2 <- c(21:27,NA,NA)
F3 <- c(31:39)
F4 <- c(NA,NA,NA,44:48,NA)
df <- data.frame(Company, Dates, F1, F2, F3, F4)

The resulting data.frame

> df

  Company    Dates F1 F2 F3 F4
1       A 20130131  1 21 31 NA
2       B 20130131  2 22 32 NA
3       C 20130131  3 23 33 NA
4       A 20130228 NA 24 34 44
5       B 20130228 NA 25 35 45
6       C 20130228  6 26 36 46
7       A 20130331  7 27 37 47
8       B 20130331  8 NA 38 48
9       C 20130331  9 NA 39 NA

I would then like to know, for each 'Company' and each field, how many dates have valid (non-NA numeric) values. The output would look something like

Company F1 F2 F3 F4
      A  2  3  3  2
      B  2  2  3  2
      C  3  2  3  1

The data frames that I'm dealing with will have roughly 500 companies, 365 dates and 45 fields, so doing things manually isn't really an option.

Any ideas would be greatly appreciated.

Thanks,

Solution

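A base R solution using aggregate, and an alternative data.table solution. In the aggregate call, na.action=na.pass is needed because the formula interface otherwise drops rows containing NA before counting: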

aggregate(cbind(F1,F2,F3,F4) ~ Company, data=df, function(x) sum(!is.na(x)), na.action=na.pass)
  Company F1 F2 F3 F4
1       A  2  3  3  2
2       B  2  2  3  2
3       C  3  2  3  1

data.table:

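library(data.table)  # provides as.data.table and the dt[...] syntax
dt <- as.data.table(df)
# Count non-NA values per column within each Company
dt[, lapply(.SD, function(x) sum(!is.na(x))), by = Company, .SDcols = c("F1", "F2", "F3", "F4")]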

   Company F1 F2 F3 F4
1:       A  2  3  3  2
2:       B  2  2  3  2
3:       C  3  2  3  1
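
Both calls above list F1 to F4 by name, which would not scale to the 45 fields mentioned in the question. A minimal sketch of one way around that, assuming every column other than Company and Dates is a field to count (the names value_cols and counts are illustrative, not from the original answer):

```r
# Rebuild the example data from the question.
df <- data.frame(
  Company = rep(c("A", "B", "C"), 3),
  Dates   = c(rep("20130131", 3), rep("20130228", 3), rep("20130331", 3)),
  F1 = c(1:3, NA, NA, 6:9),
  F2 = c(21:27, NA, NA),
  F3 = 31:39,
  F4 = c(NA, NA, NA, 44:48, NA)
)

# Assumption: every column except the grouping and date columns is a field.
value_cols <- setdiff(names(df), c("Company", "Dates"))

# The data.frame method of aggregate hands NA values through to FUN,
# so no na.action argument is needed with this interface.
counts <- aggregate(df[value_cols],
                    by = list(Company = df$Company),
                    FUN = function(x) sum(!is.na(x)))
print(counts)
```

The same character vector should also work in the data.table call by passing .SDcols = value_cols instead of the hard-coded names.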

Other suggestions

Since this is a question about tabulation, here's an option with table after stacking the "F" columns.

x <- cbind(Company = df$Company, 
           stack(df[setdiff(names(df), c("Company", "Dates"))]))
table(x[complete.cases(x), c(1, 3)])
#        ind
# Company F1 F2 F3 F4
#       A  2  3  3  2
#       B  2  2  3  2
#       C  3  2  3  1

## OR
## table(x[complete.cases(x), c("Company", "ind")])
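
A further base R variant, offered only as a sketch and not part of the answers above: build a 0/1 mask of non-NA cells and sum it by group with rowsum. It assumes, as before, that every column except Company and Dates should be counted; valid and counts are illustrative names, and no timing comparison was done here.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Company = rep(c("A", "B", "C"), 3),
  Dates   = c(rep("20130131", 3), rep("20130228", 3), rep("20130331", 3)),
  F1 = c(1:3, NA, NA, 6:9),
  F2 = c(21:27, NA, NA),
  F3 = 31:39,
  F4 = c(NA, NA, NA, 44:48, NA)
)

# Assumption: every column except the grouping and date columns is a field.
value_cols <- setdiff(names(df), c("Company", "Dates"))

# is.na on a data frame returns a logical matrix; unary + turns it into 0/1.
valid <- +(!is.na(df[value_cols]))

# rowsum adds up the rows of the matrix within each group,
# giving one row per Company and one column per field.
counts <- rowsum(valid, group = df$Company)
print(counts)
```

The result is a matrix with the companies as row names rather than a data frame with a Company column, so it may need as.data.frame or similar before further processing.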
Licensed under: CC-BY-SA with attribution