Question

I am trying to calculate the proportion of correct responses for each participant as a function of three factors (group, sound and language). My data frame looks like this:

participant group   sound   lang    resp 
advf03      adv     a       in      1
advf03      adv     a       sp      0
advf03      adv     a       in      1
advf03      adv     a       sp      0
advf03      adv     a       in      0
advf03      adv     a       sp      1
advf03      adv     a       sp      0
advf03      adv     a       in      1
advf03      adv     a       in      0
advf03      adv     a       in      1
begf03      beg     a       in      1
begf03      beg     a       in      1
begf03      beg     a       sp      0

"Group" has 3 levels: adv, int, and beg. "Sound" has 3 levels: a, e, i. "Lang" has 2 levels: in, sp. A "1" implies a correct response and a "0" implies an incorrect response. I would like to have a proportion (i.e. percent correct) of the "1"'s for each participant as a new column in a new data frame. An example of the type of information I would like to have: Participant advf03 got 53% correct for "a" in "sp".

Here are 50 observations from my data:

structure(list(sound = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("a", 
"e", "i"), class = "factor"), resp = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), participant = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("2advf03", "2advf05", "2advm04", "2advm06", "2begf01", 
"2begf02", "2begf04", "2begf05", "2begm03", "2advf01", "2intf01", 
"2intf03", "2intf04", "2intf06", "2advm05"), class = "factor"), 
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("adv", 
"beg", "int"), class = "factor"), lang = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("in", "sp"), class = "factor")), .Names = c("sound", 
"resp", "participant", "group", "lang"), row.names = c(10L, 31L, 
36L, 43L, 47L, 49L, 52L, 59L, 61L, 65L, 66L, 68L, 71L, 79L, 97L, 
99L, 106L, 125L, 133L, 138L, 147L, 149L, 162L, 165L, 174L, 175L, 
33L, 37L, 112L, 136L, 154L, 186L, 11L, 50L, 89L, 92L, 104L, 105L, 
123L, 126L, 129L, 143L, 153L, 173L, 177L, 187L, 188L, 191L, 7L, 
12L), class = "data.frame")

This is what I have so far:

# get counts of subsets of factors
df <- as.data.frame(table(df))

# new column that gives the proportion of responses
df$prop <- df$Freq / 32

But this does not seem to give me the correct proportions. I know that I need to reduce the data so that I don't have so many observations (i.e. 1 value for each sound for each language for each participant), but I don't know the correct steps to do that.

Solution

If I understand your question correctly, you would like to know the proportion of 1s by participant, sound, and language.

Because the proportion of 1s in a vector with only 0s and 1s is just the mean, this should work:

aggregate(resp ~ participant + group + sound + lang, data = df, FUN = mean)

The output of that with your 50 observations is:

  participant group sound lang      resp
1     2advf03   adv     a   in 0.1875000
2     2advf03   adv     a   sp 0.1111111
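
To make the mean-as-proportion idea easy to check by hand, here is a minimal sketch on a small made-up data frame. The data frame `toy`, its values, and the column names `prop` and `percent` are assumptions for illustration only; they are not taken from your data. It groups by all four variables, including `sound`, and stores the result in a new data frame:

```r
# Toy data (hypothetical values, not the asker's real data)
toy <- data.frame(
  participant = rep(c("p1", "p2"), each = 6),
  group       = rep(c("adv", "beg"), each = 6),
  sound       = rep(c("a", "a", "a", "e", "e", "e"), times = 2),
  lang        = rep(c("in", "sp", "sp"), times = 4),
  resp        = c(1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0)
)

# The mean of a 0/1 vector is the share of 1s: three 1s out of four values
stopifnot(mean(c(1, 0, 1, 1)) == 3 / 4)

# One row per participant x group x sound x lang combination present in the data
props <- aggregate(resp ~ participant + group + sound + lang,
                   data = toy, FUN = mean)

# Rename the aggregated column and add a percent version (illustrative names)
names(props)[names(props) == "resp"] <- "prop"
props$percent <- 100 * props$prop

# p1 answered 0 and 1 for sound "a" in "sp", so the proportion is 0.5
props[props$participant == "p1" & props$sound == "a" & props$lang == "sp", ]
```

Because `mean` divides by the number of trials actually present in each cell, this also handles unequal cell sizes. In your 50 observations there are 32 "in" trials but only 18 "sp" trials, so dividing every count by a fixed 32 would understate the "sp" proportion.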
Licensed under: CC-BY-SA with attribution