Using R, how to aggregate by day of week?

https://stackoverflow.com/questions/8108236

27-02-2021
|

Question

I have a df/zoo/xts/whatever that is split by day of week. I'd like to further split this by week for each entry.

An example would be for Friday, there is a list of id's with an associated time for each id. These times could be for any Friday in a year's span. I'd like to make a new df that has each id along with the counts for each week (in sequential order) on that Friday.

It would look something like the following where each w column is a different Friday's count:

    id w1 w2 w3 w4
1 id_1  1  2  2  8
2 id_2  3  1  5  2
3 id_3  7  4 10  7

dput:

structure(list(id = c("id_1", "id_2", "id_3"), w1 = c(1, 3, 7
), w2 = c(2, 1, 4), w3 = c(2L, 5L, 10L), w4 = c(8L, 2L, 7L)), .Names = c("id", 
"w1", "w2", "w3", "w4"), row.names = c(NA, 3L), class = "data.frame")

This seems like it's ripe for aggregate, but I cant quite get the syntax right. Other things I've tried are below:

# Applies sum to everything, which doesnt make sense in this context
apply.weekly(friday, sum)

# I considered doing something like getting the unique weeks with:
as.numeric(unique(format(friday[,2], "%U")))
# and then generating each week, getting the counts for each user, and then making a new df from this process. But this seems very inefficient.

Edit: output from str(data[1:20,]):

'data.frame':   20 obs. of  2 variables:
 $ id  : num  1 2 3 4 5 1 2 3 3 2 ...
 $ time: POSIXct, format: "2011-04-25 14:00:00" "2011-04-28 20:00:00" "2011-05-03 06:00:00" "2011-05-06 14:00:00" ...

output from dput(data[1:20,]):

structure(list(id = c(1, 2, 3, 4, 5, 1, 2, 3, 3, 2, 1, 4, 3, 
2, 1, 4, 3, 2, 1, 7), time = structure(c(1303754400, 1304035200, 
1304416800, 1304704800, 1304920800, 1305252000, 1305428400, 1305522000, 
1305774000, 1306404000, 1306422000, 1308261600, 1308290400, 1308340800, 
1308542400, 1308715200, 1308722400, 1308844800, 1309575600, 1309730400
), class = c("POSIXct", "POSIXt"))), .Names = c("id", "time"), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")

Solution

If I am understanding what you want, you need to make additional columns for the day of the week (so you can identify that) and for the week of the year (so that you can end up with separate columns for each). Using the data that you gave a dput() for:

data$day.of.week <- format(data$time, "%A")
data$week.of.year <- format(data$time, "%U")

Effectively now you want to reshape the data, so using the reshape2 package (not the only way, but the one I am most familiar with)

library("reshape2")

dcast(data[data$day.of.week=="Friday",], id~week.of.year, 
    value_var="time", fun.aggregate=length)

In that example, I subsetted the data to just get the Fridays. If you wanted to do all the days, but each day separately, the plyr package can help with that iterating.

library("plyr")

dlply(data, .(day.of.week), dcast, id~week.of.year, 
    value_var="time", fun.aggregate=length)

The results of these two are:

> dcast(data[data$day.of.week=="Friday",], id~week.of.year, value_var="time", fun.aggregate=length)
  id 18 24 26
1  1  0  0  1
2  2  0  1  0
3  4  1  0  0

> dlply(data, .(day.of.week), dcast, id~week.of.year, value_var="time", fun.aggregate=length)
$Friday
  id 18 24 26
1  1  0  0  1
2  2  0  1  0
3  4  1  0  0

$Monday
  id 17
1  1  1

$Saturday
  id 19
1  2  1

$Sunday
  id 19 20 25 27
1  1  0  0  1  0
2  3  0  1  0  0
3  5  1  0  0  0
4  7  0  0  0  1

$Thursday
  id 17 19 21 24 25
1  1  0  1  1  0  0
2  2  1  0  1  0  1
3  3  0  0  0  1  0
4  4  0  0  0  1  0

$Tuesday
  id 18 25
1  3  1  1
2  4  0  1

$Wednesday
  id 20
1  3  1

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  day.of.week
1      Friday
2      Monday
3    Saturday
4      Sunday
5    Thursday
6     Tuesday
7   Wednesday

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow