Create new column in dataframe that begins adding values from a column starting from a row specified by a third column

https://stackoverflow.com/questions/22488407

16-06-2023

Problem

I am trying to determine the latency of a variable in a data.frame that contains several positions where relevant data is listed between a series of 'start' and 'stop' markers.

To accomplish this, I need to create a new column that begins counting at 0 at the start of every trial and tallies up the elapsed time in milliseconds until either the stop of the trial or the start of the next trial (whichever is easier; I assume the latter).

I have this:

df <- data.frame(c(0, 32, 32, 32, 32, 31, 31, 29),  c(0, 32, 64, 96, 128, 159, 190, 219), c("Start", NA_character_, NA_character_, "Stop", NA_character_, NA_character_, "Start", NA_character_))
colnames(df) <- c('Delta', 'TimeMs', 'Marker')

And I want to make this:

df <- data.frame(c(0, 32, 32, 32, 32, 31, 31, 29),  c(0, 32, 64, 96, 128, 159, 190, 219), c("Start", NA_character_, NA_character_, "Stop", NA_character_, NA_character_, "Start", NA_character_), c(0, 32, 64, 96, 128, 159, 0, 29))
colnames(df) <- c('Delta', 'TimeMs', 'Marker', 'Latency')

Obviously, I would make a new column filled with automatically generated NAs:

df$Latency <- NA

Then I thought I would put a 0 in the new column wherever a Start marker is located:

df$Latency [which(df$Marker == 'Start')] <- 0

From there I am stuck. I thought I could use the which command somehow, but my rudimentary R skills have led me to believe this method is oversimplified and thus incorrect.

Thanks for the help in advance, and please ask if you need clarification!

edit: fixed example, title

edit2: fixed example

edit3: used real NA_character_

Solution

This seems to work

df <- data.frame(Delta=c(0, 32, 32, 32, 32, 31, 31, 29),  
                 TimeMS=c(0, 32, 64, 96, 128, 159, 190, 219), 
                 Marker=c("Start", "NA", "NA", "Stop", "NA", "NA", "Start", "NA"))

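# Trial id: goes up by 1 at each "Start" row (1 for rows 1:6, 2 for rows 7:8)
df$group   <- cumsum(df$Marker=="Start" & !is.na(df$Marker))
# Per trial: elapsed time since the trial's first row, flattened to one vector
df$Latency <- unlist(aggregate(TimeMS~group,df,function(x)cumsum(c(0,diff(x))))$TimeMS)
# Drop the helper column
df[,"group"] <- NULL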
df
#   Delta TimeMS Marker Latency
# 1     0      0  Start       0
# 2    32     32     NA      32
# 3    32     64     NA      64
# 4    32     96   Stop      96
# 5    32    128     NA     128
# 6    31    159     NA     159
# 7    31    190  Start       0
# 8    29    219     NA      29

First we add a column, df$group, which increments by 1 every time df$Marker=="Start" (so df$group is 1 for rows 1:6 and 2 for rows 7:8). Then we aggregate TimeMS by group using the diff(...) function. Applied to a vector of length n, diff(...) returns a vector of length n-1 containing the difference between each element and the previous one, so we need to insert a 0 at the beginning of this vector. cumsum(...) then adds those differences back up, which gives the time elapsed since the first row of the group. aggregate(...) returns one vector per group (one for group==1 and one for group==2), so we need to unlist(...) them into a single vector before assigning it to df. The last line just removes df$group.
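
For comparison, here is a minimal sketch of the same grouping idea using base R's ave(...) instead of aggregate(...) and unlist(...). It is an alternative, not the method from the answer above. It assumes Marker holds real NA values (as in the question) rather than "NA" strings, and that the first row is a "Start" row. The standalone group vector is only an illustrative helper.

```r
# Rebuild the example data. Real NA values are used here (an assumption that
# follows the question, not the "NA" strings used in the answer's data).
df <- data.frame(
  Delta  = c(0, 32, 32, 32, 32, 31, 31, 29),
  TimeMS = c(0, 32, 64, 96, 128, 159, 190, 219),
  Marker = c("Start", NA, NA, "Stop", NA, NA, "Start", NA)
)

# Trial id: increases by 1 at every "Start" row; NA markers count as FALSE.
group <- cumsum(!is.na(df$Marker) & df$Marker == "Start")

# Within each trial, subtract the trial's first timestamp.
# For a numeric vector x, cumsum(c(0, diff(x))) equals x - x[1].
df$Latency <- ave(df$TimeMS, group, FUN = function(x) x - x[1])

print(df)
```

Because ave(...) returns a vector of the same length and order as its input, the result can be assigned directly to a new column, with no helper column to add and remove.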

License: CC-BY-SA with attribution

Not affiliated with StackOverflow