Creating averaged time-bins from an existing dataframe

https://stackoverflow.com/questions/22540830

18-06-2023

Question

I have the following data frame, called 'EasyScaled':

str(EasyScaled)
'data.frame':   675045 obs. of  3 variables:
$ Trial           : chr  "1_easy.wav" "1_easy.wav" "1_easy.wav" "1_easy.wav" ...
$ TrialTime       : num  3000 3001 3002 3003 3004 ...
$ PupilBaseCorrect: num  0.784 0.781 0.78 0.778 0.777 ...

The 'TrialTime' numeric variable denotes the time of each data point (3000 = 3000ms, 3001 = 3001 ms, etc.), 'PupilBaseCorrect' is my dependent variable, and the 'Trial' variable refers to the experimental trial.

I would like to create a new object that first divides my data into 3 time-bins (TimeBin1 = 3000-8000 ms, TimeBin2 = 8001-13000 ms, TimeBin3 = 13001-18000 ms) and then calculates an average value for each time-bin within each trial, so that I end up with something like this (the values shown are 'PupilBaseCorrect' means):

 Trial        TimeBin1     TimeBin2     TimeBin3
 1_easy       0.784        0.876        0.767 
 34_easy      0.781        0.872        0.765
 35_easy      0.78         0.871        0.762 
 ...etc       ...etc       ...etc       ....etc

I have tried using cut(), ddply() and some of the suggestions on this blog http://lamages.blogspot.co.uk/2012/01/say-it-in-r-with-by-apply-and-friends.html but haven't been able to find the correct code. I also tried this:

EasyTimeBin <- aggregate(PupilBaseCorrect ~ Trial + TrialTime[3000:8000, 8001:1300,1301:1800], data=EasyScaled, mean)

But I got the following error:

Error in TrialTime[3000:8000, 8001:1300, 1301:1800] : 
incorrect number of dimensions

Any suggestions or advice would be much appreciated.

Solution

Using cut and ddply is the right idea, but here's some vanilla R chicken scratch that will do what you need.

# Generate example data
EasyScaled <- data.frame(
  Trial = paste0(c(sapply(1:3, function(x) rep(x, 9))), "_easy.wav"),
  TrialTime = c(sapply(seq_len(9)-1, function(x) (floor(x/3))*5000 + x%%3 + 3000)),
  PupilBaseCorrect = rnorm(27, 0.78, 0.1)
)

# group means of PupilBaseCorrect by Trial + time bin (1, 2 or 3)
tmp <- tapply(EasyScaled$PupilBaseCorrect,
    paste0(EasyScaled$Trial, ',',
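           # subtract 3001 (not 3000) so that 8000, 13000 and 18000 close
           # bins 1, 2 and 3, as specified in the question
           as.integer((EasyScaled$TrialTime - 3001)/5000)+1), mean)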

# melt & recast the array manually into a dataframe
EasyTimeBin <- do.call(data.frame,
   append(list(row.names = NULL,
               Trial = gsub('.wav,.*','',names(tmp)[3*seq_len(length(tmp)/3)])), 
     structure(lapply(seq_len(3),
         function(x) tmp[3*(seq_len(length(tmp)/3)-1) + x]
       ), .Names = paste0("TimeBin", seq_len(3))
     )
   )
)

print(EasyTimeBin)
#  Trial   TimeBin1  TimeBin2  TimeBin3
# 1 1_easy 0.7471973 0.7850524 0.8939581
# 2 2_easy 0.8096973 0.8390587 0.7757359
# 3 3_easy 0.8151430 0.7855042 0.8081268
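
For comparison, the cut() route mentioned at the top can be sketched as follows. This is only a minimal sketch, not the asker's real data: the small EasyScaled stand-in and its values are made up for illustration, and the break points assume the bin edges given in the question (3000-8000, 8001-13000, 13001-18000).

```r
# Made-up stand-in for EasyScaled: 2 trials, 2 samples per time bin
EasyScaled <- data.frame(
  Trial = rep(c("1_easy.wav", "2_easy.wav"), each = 6),
  TrialTime = rep(c(3000, 8000, 8001, 13000, 13001, 18000), times = 2),
  PupilBaseCorrect = c(0.70, 0.80, 0.60, 0.70, 0.90, 1.00,
                       0.50, 0.70, 0.40, 0.60, 0.30, 0.50)
)

# cut() uses right-closed intervals by default, so these breaks give
# (2999, 8000], (8000, 13000], (13000, 18000]
EasyScaled$TimeBin <- cut(EasyScaled$TrialTime,
                          breaks = c(2999, 8000, 13000, 18000),
                          labels = paste0("TimeBin", 1:3))

# tapply() over two grouping variables returns a Trial x TimeBin matrix of means
bin_means <- tapply(EasyScaled$PupilBaseCorrect,
                    list(EasyScaled$Trial, EasyScaled$TimeBin),
                    mean)

# Turn the matrix into the wide data frame asked for in the question
EasyTimeBin <- data.frame(Trial = sub("\\.wav$", "", rownames(bin_means)),
                          bin_means,
                          row.names = NULL)
print(EasyTimeBin)
```

With the made-up values above, trial 1 averages 0.75, 0.65 and 0.95 across the three bins. If a long format is preferred, aggregate(PupilBaseCorrect ~ Trial + TimeBin, data = EasyScaled, mean) gives one row per trial and bin instead.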

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow