Question

I would like to calculate the standard deviation of every 4 values down a column from the first to the last observation. I have found lots of answers for moving SD functions, but I simply need a line of code that will calculate the sd() for every 4 data values and write the answers into a new column in the data frame as below:

Example data:

Obs Count
1   56
2   29
3   66
4   62
5   49
6   12
7   65
8   81
9   73
10  66
11  71
12  59

Desired output:

Obs Count SD
1   56    16.68
2   29    16.68
3   66    16.68
4   62    16.68
5   49    29.55
6   12    29.55
7   65    29.55
8   81    29.55
9   73    6.24
10  66    6.24
11  71    6.24
12  59    6.24

I tried the below code, but this is obviously incorrect:

a <- for(i in 1: length(df)) sd(df$Count[i:(i+3)])

This should be a very easy task, but I have not been able to find an answer. I am still learning and any help would be appreciated.


Solution

In base R, you can use the following to create an index of "every 4 rows":

(seq_len(nrow(mydf))-1) %/% 4
# [1] 0 0 0 0 1 1 1 1 2 2 2 2
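
The `- 1` matters here: it shifts the row numbers to start at zero, so that integer division by 4 changes value exactly at rows 5, 9, and so on. A small sketch comparing the two versions (the variable names `rows`, `shifted` and `unshifted` are mine, purely for illustration):

```r
rows <- seq_len(12)

# With the shift: blocks of exactly four rows
shifted <- (rows - 1) %/% 4
shifted
# [1] 0 0 0 0 1 1 1 1 2 2 2 2

# Without the shift: the first block has only three rows and row 12 ends up alone
unshifted <- rows %/% 4
unshifted
# [1] 0 0 0 1 1 1 1 2 2 2 2 3
```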

Using that, you can use ave to get the desired result:

mydf$SD <- ave(mydf$Count, (seq_len(nrow(mydf))-1) %/% 4, FUN = sd)
mydf
#    Obs Count        SD
# 1    1    56 16.680827
# 2    2    29 16.680827
# 3    3    66 16.680827
# 4    4    62 16.680827
# 5    5    49 29.545163
# 6    6    12 29.545163
# 7    7    65 29.545163
# 8    8    81 29.545163
# 9    9    73  6.238322
# 10  10    66  6.238322
# 11  11    71  6.238322
# 12  12    59  6.238322
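
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes `mydf` is simply the example data from the question, rebuilt by hand; `grp` is just a name I chose for the index.

```r
# Rebuild the example data from the question
mydf <- data.frame(
  Obs   = 1:12,
  Count = c(56, 29, 66, 62, 49, 12, 65, 81, 73, 66, 71, 59)
)

# Group index: rows 1-4 -> 0, rows 5-8 -> 1, rows 9-12 -> 2
grp <- (seq_len(nrow(mydf)) - 1) %/% 4

# ave() applies sd() within each group and returns a vector as long as Count
mydf$SD <- ave(mydf$Count, grp, FUN = sd)

# One distinct SD per block, rounded as in the question's desired output
round(unique(mydf$SD), 2)
# [1] 16.68 29.55  6.24
```

Because `ave` always returns a vector of the same length as its input, the result can be assigned straight back into the data frame.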

OTHER TIPS

An alternative is to use rollapply from the zoo package in combination with rep.

> library(zoo)
> N <- 4 # every four values
> SDs <- rollapply(df[,2], width=N, by=N, sd)
> df$SD <- rep(SDs, each=N)
> df
   Obs Count        SD
1    1    56 16.680827
2    2    29 16.680827
3    3    66 16.680827
4    4    62 16.680827
5    5    49 29.545163
6    6    12 29.545163
7    7    65 29.545163
8    8    81 29.545163
9    9    73  6.238322
10  10    66  6.238322
11  11    71  6.238322
12  12    59  6.238322

You might want to do it all in one line:

df$SD <- rep( rollapply(df[,2], width=N, by=N, sd), each=N)

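This looks faster (I didn't test it, though):

# mydf = your data
# Block index with one value per row: 1 1 1 1 2 2 2 2 ...
idxs <- rep(1:nrow(mydf), each = 4, length.out = nrow(mydf))

mydf <- within(mydf, {
  # One SD per block, repeated 4 times to line up with the rows
  SD <- rep(tapply(Count, idxs, sd), each = 4)
})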
print(mydf)
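
One caveat: the `rep(..., each = 4)` step only lines up with the rows when the number of rows is a multiple of 4. If that is not guaranteed, something along these lines may be safer. This is only a sketch; `block_sd` is a hypothetical helper I made up for illustration, not a function from base R or zoo.

```r
# Hypothetical helper: SD of consecutive blocks of size n, expanded so that
# the result has one value per element of x.
block_sd <- function(x, n = 4) {
  grp   <- (seq_along(x) - 1) %/% n    # 0 0 0 0 1 1 1 1 ...
  block <- tapply(x, grp, sd)          # one SD per block, in block order
  as.vector(block)[grp + 1]            # look up each element's block SD
}

# 10 values, so the last block holds only 2 observations
x   <- c(56, 29, 66, 62, 49, 12, 65, 81, 73, 66)
res <- block_sd(x)
length(res) == length(x)
# [1] TRUE
```

Note that a final block with a single observation gets `NA`, because `sd` of one value is `NA`.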
Licensed under: CC-BY-SA with attribution