How would you do this tricky subsetting in R?

https://stackoverflow.com/questions/22462011

16-06-2023

Question

This question might have a simple answer, but I can't figure out how to do it. I'd like to subset a data frame in a manner that is not straightforward.

I have a data frame with 4 columns that express the results of an experiment. The first column has the subject number, the second the item number, the third the type of measure that was taken and the fourth the recorded reading time. I would like to replace the 0 values in the Value column but only in specific conditions. To make this concrete, here is the data frame:

Subject = c(rep("S1",6), rep("S2",6))       #two subjects
Item    = rep(c(rep("I1",3),rep("I2",3)),2) #two items for each subject
Measure = rep(c("ff","fp","tt"),4)          #three different measures for each item
Value   = c(0,33,21,2,45,66,78,4,3,0,25,67) #reading times
df      = data.frame(Subject,Item,Measure,Value)
df

   Subject Item Measure Value
1       S1   I1      ff     0
2       S1   I1      fp    33
3       S1   I1      tt    21
4       S1   I2      ff     2
5       S1   I2      fp    45
6       S1   I2      tt    66
7       S2   I1      ff    78
8       S2   I1      fp     4
9       S2   I1      tt     3
10      S2   I2      ff     0
11      S2   I2      fp    25
12      S2   I2      tt    67

This is the tricky part! I want to find every Subject/Item combination where ff is 0 and, for those combinations only, overwrite the Value of the first-fixation and first-pass measures (ff and fp) with NA, giving a data frame like the one below. The rest of the data frame should remain unchanged. How would you achieve this in a simple manner? Any suggestions will be much appreciated!

Subject = c(rep("S1",6), rep("S2",6))
Item    = rep(c(rep("I1",3),rep("I2",3)),2)
Measure = rep(c("ff","fp","tt"),4)
Value   = c(NA,NA,21,2,45,66,78,4,3,NA,NA,67) #unquoted NA keeps the column numeric
dfnew   = data.frame(Subject,Item,Measure,Value)
dfnew
    Subject Item Measure Value
1       S1   I1      ff    NA
2       S1   I1      fp    NA
3       S1   I1      tt    21
4       S1   I2      ff     2
5       S1   I2      fp    45
6       S1   I2      tt    66
7       S2   I1      ff    78
8       S2   I1      fp     4
9       S2   I1      tt     3
10      S2   I2      ff    NA
11      S2   I2      fp    NA
12      S2   I2      tt    67

Solution

This feels like something that would be much easier in wide format than in long format. How about something like this? You could re-sort the final data frame if order is important.

library(reshape2)
d2 <- dcast(Subject + Item ~ Measure, data=df, value.var="Value")
d2
##   Subject Item ff fp tt
## 1      S1   I1  0 33 21
## 2      S1   I2  2 45 66
## 3      S2   I1 78  4  3
## 4      S2   I2  0 25 67
k <- d2$ff==0
d2$ff[k] <- d2$fp[k] <- NA
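melt(d2, id.vars=c("Subject","Item"), variable.name="Measure", value.name="Value")
##    Subject Item Measure Value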
## 1       S1   I1       ff    NA
## 2       S1   I2       ff     2
## 3       S2   I1       ff    78
## 4       S2   I2       ff    NA
## 5       S1   I1       fp    NA
## 6       S1   I2       fp    45
## 7       S2   I1       fp     4
## 8       S2   I2       fp    NA
## 9       S1   I1       tt    21
## 10      S1   I2       tt    66
## 11      S2   I1       tt     3
## 12      S2   I2       tt    67
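
If the original row order matters, the long result can be re-sorted afterwards. The following is only a sketch of one way to do that, not part of the original answer: it assumes the column names are passed to melt explicitly, and the name `long` is made up for illustration.

```r
library(reshape2)

# Rebuild the example data from the question
df <- data.frame(
  Subject = c(rep("S1", 6), rep("S2", 6)),
  Item    = rep(c(rep("I1", 3), rep("I2", 3)), 2),
  Measure = rep(c("ff", "fp", "tt"), 4),
  Value   = c(0, 33, 21, 2, 45, 66, 78, 4, 3, 0, 25, 67)
)

# Wide format, blank out ff and fp where ff is 0, then back to long
d2 <- dcast(df, Subject + Item ~ Measure, value.var = "Value")
k <- d2$ff == 0
d2$ff[k] <- d2$fp[k] <- NA
long <- melt(d2, id.vars = c("Subject", "Item"),
             variable.name = "Measure", value.name = "Value")

# Restore the Subject / Item / Measure ordering of the original data frame
long <- long[order(long$Subject, long$Item, long$Measure), ]
rownames(long) <- NULL  # renumber the rows 1..12
long
```

After melt, Measure is a factor whose levels follow the wide column order (ff, fp, tt), so sorting on it reproduces the order of the measures in the question.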

OTHER TIPS

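Maybe you can use which. Note that this relies on each fp row coming directly after its ff row, as in the example data:

# rows where the ff measure is 0
idx <- with(df, which(Measure=="ff" & Value==0))
df[idx, "Value"] <- NA
# the rows right after them, kept only if they are fp rows
idx <- Filter(function(i) df[i, "Measure"]=="fp", idx+1)
df[idx, "Value"] <- NA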
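
If the rows cannot be assumed to be in ff, fp, tt order, a grouped flag avoids the idx+1 step. This is a hedged sketch in base R rather than anything from the answers above: `ff_zero` is a made-up name, and `ave()` is used here to mark every row whose Subject/Item group contains an ff reading of 0.

```r
# Rebuild the example data from the question
df <- data.frame(
  Subject = c(rep("S1", 6), rep("S2", 6)),
  Item    = rep(c(rep("I1", 3), rep("I2", 3)), 2),
  Measure = rep(c("ff", "fp", "tt"), 4),
  Value   = c(0, 33, 21, 2, 45, 66, 78, 4, 3, 0, 25, 67)
)

# TRUE for every row whose Subject/Item group has an ff value of 0
ff_zero <- ave(df$Measure == "ff" & df$Value == 0,
               df$Subject, df$Item, FUN = any)

# Within those groups, overwrite only the ff and fp readings
df$Value[ff_zero & df$Measure %in% c("ff", "fp")] <- NA
df
```

Because the flag is computed per group, the result does not depend on how the rows are sorted, and the data frame keeps its original row order.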

Licensed under: CC-BY-SA with attribution
