Question

Here is an example of my data:

           id   score
1          82   0.50000
2          82   0.39286
3          82   0.56250
4         328   0.50000
5         328   0.67647
6         328   0.93750
7         328   0.91667

I want to make a column of moving average's of scores for each id.

So I need to somehow group the data by id then apply a MA function to that grouped data and then have the output as another column "MA_score"

I would like my output to look like this:

           id   score    MA_score
1          82   0.50000   NULL
2          82   0.39286   0.xxxx
3          82   0.56250   NULL
4         328   0.50000   NULL
5         328   0.67647   0.yyyy
6         328   0.93750   0.qqqq
7         328   0.91667   NULL
Was it helpful?

Solution

You could use split and rollapply from the zoo package as one of many ways to approach this. Note that in the example below I set the width of the rollapply function to 1 so it just returns each value. For widths greater than one it will take the mean of that number of values.

require(zoo)
sapply( split( df , df$id) , function(x) rollapply( x , width = 1 , align = 'left' , mean) )
#Note that by setting width = 1 we just return the value
$`82`
     id   score
[1,] 82 0.50000
[2,] 82 0.39286
[3,] 82 0.56250

$`328`
      id   score
[1,] 328 0.50000
[2,] 328 0.67647
[3,] 328 0.93750
[4,] 328 0.91667

If we were to set width = 3 you would get:

$`82`
     id   score
[1,] 82 0.48512

$`328`
      id     score
[1,] 328 0.7046567
[2,] 328 0.8435467

Or you could use aggregate in base R:

aggregate(  score ~ id , data = df , function(x) rollapply( x , width = 1 , align = 'left' , mean)  )
   id                              score
1  82          0.50000, 0.39286, 0.56250
2 328 0.50000, 0.67647, 0.93750, 0.91667

There are quite a few ways to do this. I would precisely define your moving average function though, because there are many ways to calculate it (check out for example TTR:::SMA)


Or more straightforward using ave:

within(df, { MA_score <- ave(score, id, FUN=function(x) 
                rollmean(x, k=3, na.pad = TRUE))})

OTHER TIPS

You could split your data by unique ID values, calculate the rolling mean (from 'zoo' package) for each of these unique IDs and append the results to your initial dataframe:

# Required packages
library(zoo)

# Data setup
df <- data.frame(id = c(82, 82, 82, 328, 328, 328, 328), 
                 score = c(0.5, 0.39286, 0.5625, 0.5, 0.67647, 0.9375, 0.91667))

# Split data by unique IDs
df.sp <- split(df, df$id)

# Calculate rolling mean for each unique ID
df.ma <- lapply(seq(df.sp), function(i) {
  rollmean(df.sp[[i]]$score, k = 3, na.pad = TRUE)
})

# Append column 'MA_score' to dataframe
for (i in seq(names(df.sp))) {
  df[which(df$id == names(df.sp)[i]), "MA_score"] <- df.ma[[i]]
}

df
   id   score  MA_score
1  82 0.50000        NA
2  82 0.39286 0.4851200
3  82 0.56250        NA
4 328 0.50000        NA
5 328 0.67647 0.7046567
6 328 0.93750 0.8435467
7 328 0.91667        NA
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top