Frequency Count for All Possible Bins

https://stackoverflow.com/questions/21538207

r
frequency

06-10-2022
|

Domanda

I have a data frame. I would like a frequency table created that shows the bin frequency by "Group". If there is a bin with 0 entities, I want it to show that there are 0 entities in that bin.

If I use the table() function, I get a frequency count of all bins in my data frame, but not by "Group". It also does not tell me that, for example, I do not have any rows within Group 1 Bin 3. I also looked into tabulate() but that doesn't seem to be exactly what I need either. Somehow I need to tell it what the set of possible bins actually are.

Here is some example code.

    df = as.data.frame(rbind(c(1,1.2), c(1,1.4), c(1,2.1), c(1,2.5), c(1,2.7), c(1,4.1), c(2,1.6), c(2,4.5), c(2,4.3), c(2,4.8), c(2,4.9)))
    colnames(df) = c("Group", "Value")
    df.in = split(df, df$Group)

    FindBin = function(df){
      maxbin = max(ceiling(df$Value),na.rm=TRUE)+1 #what is the maximum bin value. 
       bin = seq(from=0, to=maxbin, by=1) #Specify your bins: 0 to the maximum value by increments of 1
       df$bin_index = findInterval(df$Value, bin, all.inside = TRUE) #Determine which bin the value is in 
      return(df)
    }

    df.out = lapply(names(df.in), function(x) FindBin(df.in[[x]]))
    df.out2 = do.call(rbind.data.frame, df.out) #Row bind the list of dataframes to one dataframe

The output of the df.out2 looks like this:

        Group Value bin_index
    1      1   1.2         2
    2      1   1.4         2
    3      1   2.1         3
    4      1   2.5         3
    5      1   2.7         3
    6      1   4.1         5
    7      2   1.6         2
    8      2   4.5         5
    9      2   4.3         5
    10     2   4.8         5
    11     2   4.9         5

In addition to the output above, I'd like a summary output of my results that looks something like this:

    Group     Bin     Freq
    1         1       0
    1         2       2
    1         3       3
    1         4       0
    1         5       1
    2         1       0
    2         2       1
    2         3       0
    2         4       0
    2         5       4

Any ideas?

Soluzione

Doesn't table do what you want for your first question:

df$bin_index <- factor(df$bin_index, levels=1:5)
table(df[, c("Group", "bin_index")])
#       bin_index
# Group 1 2 3 4 5
#     1 0 2 3 0 1
#     2 0 1 0 0 4

It shows the 0 entry for bin 3, group 2 (I presume that's what you meant, there are rows for bin 3 in group 1). Also, by setting the factor levels I was also able to get bin_index 1 to show up. For your second question, just use melt:

library(reshape2)
melt(table(df[, c("Group", "bin_index")]))
#    Group bin_index value
# 1      1         1     0
# 2      2         1     0
# 3      1         2     2
# 4      2         2     1
# 5      1         3     3
# 6      2         3     0
# 7      1         4     0
# 8      2         4     0
# 9      1         5     1
# 10     2         5     4

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow