Using R, how to aggregate by day of week?
Question
I have a df/zoo/xts/whatever that is split by day of week. I'd like to further split this by week for each entry.
An example would be for Friday, there is a list of id's with an associated time for each id. These times could be for any Friday in a year's span. I'd like to make a new df that has each id along with the counts for each week (in sequential order) on that Friday.
It would look something like the following where each w column is a different Friday's count:
id w1 w2 w3 w4
1 id_1 1 2 2 8
2 id_2 3 1 5 2
3 id_3 7 4 10 7
dput:
structure(list(id = c("id_1", "id_2", "id_3"), w1 = c(1, 3, 7
), w2 = c(2, 1, 4), w3 = c(2L, 5L, 10L), w4 = c(8L, 2L, 7L)), .Names = c("id",
"w1", "w2", "w3", "w4"), row.names = c(NA, 3L), class = "data.frame")
This seems like it's ripe for aggregate, but I cant quite get the syntax right. Other things I've tried are below:
# Applies sum to everything, which doesnt make sense in this context
apply.weekly(friday, sum)
# I considered doing something like getting the unique weeks with:
as.numeric(unique(format(friday[,2], "%U")))
# and then generating each week, getting the counts for each user, and then making a new df from this process. But this seems very inefficient.
Edit: output from str(data[1:20,]):
'data.frame': 20 obs. of 2 variables:
$ id : num 1 2 3 4 5 1 2 3 3 2 ...
$ time: POSIXct, format: "2011-04-25 14:00:00" "2011-04-28 20:00:00" "2011-05-03 06:00:00" "2011-05-06 14:00:00" ...
output from dput(data[1:20,]):
structure(list(id = c(1, 2, 3, 4, 5, 1, 2, 3, 3, 2, 1, 4, 3,
2, 1, 4, 3, 2, 1, 7), time = structure(c(1303754400, 1304035200,
1304416800, 1304704800, 1304920800, 1305252000, 1305428400, 1305522000,
1305774000, 1306404000, 1306422000, 1308261600, 1308290400, 1308340800,
1308542400, 1308715200, 1308722400, 1308844800, 1309575600, 1309730400
), class = c("POSIXct", "POSIXt"))), .Names = c("id", "time"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")
Solution
If I am understanding what you want, you need to make additional columns for the day of the week (so you can identify that) and for the week of the year (so that you can end up with separate columns for each). Using the data
that you gave a dput()
for:
data$day.of.week <- format(data$time, "%A")
data$week.of.year <- format(data$time, "%U")
Effectively now you want to reshape the data, so using the reshape2
package (not the only way, but the one I am most familiar with)
library("reshape2")
dcast(data[data$day.of.week=="Friday",], id~week.of.year,
value_var="time", fun.aggregate=length)
In that example, I subsetted the data to just get the Fridays. If you wanted to do all the days, but each day separately, the plyr
package can help with that iterating.
library("plyr")
dlply(data, .(day.of.week), dcast, id~week.of.year,
value_var="time", fun.aggregate=length)
The results of these two are:
> dcast(data[data$day.of.week=="Friday",], id~week.of.year, value_var="time", fun.aggregate=length)
id 18 24 26
1 1 0 0 1
2 2 0 1 0
3 4 1 0 0
> dlply(data, .(day.of.week), dcast, id~week.of.year, value_var="time", fun.aggregate=length)
$Friday
id 18 24 26
1 1 0 0 1
2 2 0 1 0
3 4 1 0 0
$Monday
id 17
1 1 1
$Saturday
id 19
1 2 1
$Sunday
id 19 20 25 27
1 1 0 0 1 0
2 3 0 1 0 0
3 5 1 0 0 0
4 7 0 0 0 1
$Thursday
id 17 19 21 24 25
1 1 0 1 1 0 0
2 2 1 0 1 0 1
3 3 0 0 0 1 0
4 4 0 0 0 1 0
$Tuesday
id 18 25
1 3 1 1
2 4 0 1
$Wednesday
id 20
1 3 1
attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
day.of.week
1 Friday
2 Monday
3 Saturday
4 Sunday
5 Thursday
6 Tuesday
7 Wednesday