How to create contigency tables (crosstabs) in R for a subset of columns with categorical data?

StackOverflow https://stackoverflow.com/questions/3518010

  •  29-09-2019
  •  | 
  •  

Question

I have a table whose header looks like this (I've simplified it):

id, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10

where each row, except for id, is a categorical variable. Let's name the categories A, B, C, D, E.

I would like to create a contingency table for some of the columns, such as below (for brevity, I have not put sample numbers in the cells). Getting the total column/row would be great, but not mandatory, I can calculate it myself later.

      a1  a2  a3  a4 Total
    ----------------------
    A|
    B|
    C|
    D|
    E|
Total|

Thus, the question is how to create a crosstab based on multiple columns in R? The examples I've seen with table() and xtabs() use a column only. In my case, the columns are adjacent, so one crosstab would summarize columns a1..a4, another a5..a7 and so on. I hope there is an elegant way to do this.

I'm a programmer, but a newbie in R.

Thank you in advance.

Was it helpful?

Solution

Your data is poorly formatted for this purpose. Here's one approach to appropriately reshaping the data with the reshape package.

library(reshape)
data.m <- melt(data, id = "id")

To compute a table for all levels, with margins, you could use

cast(data.m, value ~ variable, margins = T)

For a subset, take the relevant subset of data.m.

OTHER TIPS

Here's how to do it using base R commands. You don't need the for loop if every column has the same factor levels, but the loop would be a good fail-safe.

> set.seed(21)
> df <- data.frame(
+   id=1:20,
+   a1=sample(letters[1:4],20,TRUE),
+   a2=sample(letters[1:5],20,TRUE),
+   a3=sample(letters[2:5],20,TRUE),
+   a4=sample(letters[1:5],20,TRUE),
+   a5=sample(letters[1:5],20,TRUE),
+   a6=sample(letters[1:5],20,TRUE) )
> 
> for(i in 2:NCOL(df)) {
+   levels(df[,i]) <- list(a="a",b="b",c="c",d="d",e="e")
+ }
> 
> addmargins(mapply(table,df[,-1]))
    a1 a2 a3 a4 a5 a6 Sum
a    6  2  0  2  5  3  18
b    3  3  7  2  1  3  19
c    5  3  1  6  5  3  23
d    6  8  6  1  5  3  29
e    0  4  6  9  4  8  31
Sum 20 20 20 20 20 20 120
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top