Question

The following aggregate() call, which gives me mean sales by month, works fine:

library(chron)
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))
aggregate(sales~months(as.chron(dates)), mean, data=dat)

...and produces the following output:

months(as.chron(dates))     sales
1                     Jan 1000.0723
2                     Feb  999.1580
3                     Mar  995.3055
4                     Apr 1000.4912
5                     May 1003.9703
6                     Jun  997.1086
7                     Jul  996.5939
8                     Aug  998.5012
9                     Sep 1001.3709

My understanding is that the following cast statement should produce the same output:

cast(dat, months(as.chron(dates)) ~ ., mean, value="sales")

but is instead returning the following error:

Error: Casting formula contains variables not found in molten data: months(as.chron(dates))

I'm likely missing something, but is it possible to use chron's months() call inside a cast statement? The following two statements accomplish the same thing with cast(), but I'm trying to do it in one step and to better understand how cast works.

dat$mont <- months(as.chron(dat$dates))
cast(dat, mont ~ ., mean, value="sales")

Thanks in advance, --JT


Solution

This will work with dcast() from reshape2 (note the argument is value.var rather than value):

library(reshape2)
dcast(dat, months(as.chron(dates)) ~ ., mean, value.var="sales")
##   months(as.chron(dates))        NA
## 1                     Jan 1004.5404
## 2                     Feb 1002.3146
## 3                     Mar  996.0883
## 4                     Apr  994.1707
## 5                     May 1000.4652
## 6                     Jun 1002.8020
## 7                     Jul  996.0357
## 8                     Aug 1001.6754
## 9                     Sep  997.6772
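
If the goal is a single expression with a function whose formula only accepts column names (as the original cast() error suggests), one option is to create the column inline with transform(). A minimal base-R sketch, with aggregate() standing in for cast() and format(dates, "%m") standing in for chron's months() so that it runs without extra packages; both substitutions are assumptions made for illustration:

```r
# Rebuild the question's data (base R only, no chron needed for this sketch).
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))

# transform() adds the derived column inline, so the formula only has to
# mention a real column name ("mont" is the name used in the question).
# format(dates, "%m") yields "01".."12", which is locale-independent.
by_month <- aggregate(sales ~ mont,
                      data = transform(dat, mont = format(dates, "%m")),
                      FUN = mean)
by_month
```

In principle the same transform(dat, ...) expression could be passed as the first argument to cast(), but I have not checked that against the reshape package.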

or you could use plyr

library(plyr)
ddply(dat, .(months = months(as.chron(dates))), summarize, sales = mean(sales))
##  months     sales
## 1   Jan 1004.5404
## 2   Feb 1002.3146
## 3   Mar  996.0883
## 4   Apr  994.1707
## 5   May 1000.4652
## 6   Jun 1002.8020
## 7   Jul  996.0357
## 8   Aug 1001.6754
## 9   Sep  997.6772

or with data.table

library(data.table)
DT <- data.table(dat)
DT[, month := months(as.chron(dates))][, list(sales = mean(sales)), by = month]
##    month     sales
## 1:   Jan 1004.5404
## 2:   Feb 1002.3146
## 3:   Mar  996.0883
## 4:   Apr  994.1707
## 5:   May 1000.4652
## 6:   Jun 1002.8020
## 7:   Jul  996.0357
## 8:   Aug 1001.6754
## 9:   Sep  997.6772

Comment from Matthew Dowle

The := isn't needed, if I understand correctly, as by accepts expressions directly:

DT[, list(sales=mean(sales)), by=months(as.chron(dates))]
##    months     sales
## 1:    Jan 1004.5404
## 2:    Feb 1002.3146
## 3:    Mar  996.0883
## 4:    Apr  994.1707
## 5:    May 1000.4652
## 6:    Jun 1002.8020
## 7:    Jul  996.0357
## 8:    Aug 1001.6754
## 9:    Sep  997.6772
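
A side note on ordering: the results above come out in calendar order (Jan, Feb, ...), which I believe is because chron's months() returns an ordered factor. If you group without chron, month labels that are plain character strings would sort alphabetically instead. A hedged base-R sketch that keeps calendar order by building the factor levels from month.abb (the variable names mon and means are just illustrative):

```r
# Rebuild the question's data.
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))

# as.POSIXlt()$mon is 0-based (0 = January), hence the + 1.
# Using month.abb as the levels fixes the order to Jan..Dec.
mon <- factor(month.abb[as.POSIXlt(dat$dates)$mon + 1], levels = month.abb)

# tapply() returns one mean per factor level; months with no data are NA,
# so drop them to leave only the months actually present.
means <- tapply(dat$sales, mon, mean)
means <- means[!is.na(means)]
means
```

The data only run from January into September, so the result has nine entries, in calendar order.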
Licensed under: CC-BY-SA with attribution