If I understand you correctly, you can do something like the code below. Use dcast to get the frequencies of each var across each factor, then use rowSums() to add them up into absolute frequencies for each var across all factors. You can then use prop.table to work out the relative frequency of each var across the factors. Note that I made a slight change to your example data so you can follow what is happening at each stage: I added an extra 'bbb' value for factor b when log == TRUE. Try this:
#Data frame (note 2 values for 'bbb' for factor 'b' when log == TRUE)
dtf <- data.frame(
  factor = c("a", "a", "b", "c", "b", "b"),
  var    = c("aaa", "bbb", "aaa", "aaa", "bbb", "bbb"),
  log    = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
)
dtf
#   factor var   log
# 1      a aaa  TRUE
# 2      a bbb FALSE
# 3      b aaa  TRUE
# 4      c aaa  TRUE
# 5      b bbb  TRUE
# 6      b bbb  TRUE
library(reshape2)
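# Find frequency of each var across each factor using dcast.
# Only rows with log == TRUE are kept, so summing the log column counts them.
# value.var is named explicitly so dcast does not have to guess it.
mydat <- dcast( dtf[ dtf$log == TRUE , ] , var ~ factor , value.var = "log" , fun.aggregate = sum )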
#   var a b c
# 1 aaa 1 1 1
# 2 bbb 0 2 0
# Use rowSums to find absolute frequency of each var across all groups
mydat$counts <- rowSums( mydat[,-1] )
# Order by decreasing frequency and keep at most the first 10 rows
head( mydat[ order( mydat$counts , decreasing = TRUE ) , ] , 10 )
#   var a b c counts
# 1 aaa 1 1 1      3
# 2 bbb 0 2 0      2
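# Relative proportions for each var across the factors.
# -c(1, ncol(mydat)) drops the first column (var) and the last column (counts),
# leaving only the per-factor frequencies; margin 1 makes each row sum to 1.
data.frame( var = mydat$var , round( prop.table( as.matrix( mydat[ , -c(1, ncol(mydat)) ] ) , 1 ) , 2 ) )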
#   var    a    b    c
# 1 aaa 0.33 0.33 0.33
# 2 bbb 0.00 1.00 0.00
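
If you would rather not load reshape2, the same three steps can probably be done in base R with table(), rowSums() and prop.table(). This is only a sketch of an alternative, not the approach used above; the names keep, tab, counts and props are mine, and it assumes the same dtf as in the example.

```r
# Same example data as above
dtf <- data.frame(
  factor = c("a", "a", "b", "c", "b", "b"),
  var    = c("aaa", "bbb", "aaa", "aaa", "bbb", "bbb"),
  log    = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Keep only the rows where log is TRUE
keep <- dtf[ dtf$log , ]

# Cross-tabulate var (rows) against factor (columns) to get the frequencies
tab <- table( keep$var , keep$factor )

# Absolute frequency of each var across all factors
counts <- rowSums( tab )

# Order rows by decreasing frequency and keep at most the first 10
head( tab[ order( counts , decreasing = TRUE ) , , drop = FALSE ] , 10 )

# Relative frequency of each var across the factors (each row sums to 1)
props <- round( prop.table( tab , 1 ) , 2 )
props
```

Here tab is a table rather than a data frame, so wrap it in as.data.frame.matrix( tab ) if you need a data frame with one column per factor.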