Equivalent to functioning aggregate statement using cast
Question
The following aggregate() call, which gives me mean sales by month, works fine:
library(chron)
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))
aggregate(sales~months(as.chron(dates)), mean, data=dat)
...and produces the following output:
months(as.chron(dates)) sales
1 Jan 1000.0723
2 Feb 999.1580
3 Mar 995.3055
4 Apr 1000.4912
5 May 1003.9703
6 Jun 997.1086
7 Jul 996.5939
8 Aug 998.5012
9 Sep 1001.3709
My understanding is that the following cast statement should produce the same output:
cast(dat, months(as.chron(dates)) ~ ., mean, value="sales")
but is instead returning the following error:
Error: Casting formula contains variables not found in molten data: months(as.chron(dates))
I'm likely missing something, but is it possible to use the chron months() call inside a cast statement? The following two statements accomplish the same thing with cast(), but I'm trying to do it in one step and to better understand how cast works.
dat$mont <- months(as.chron(dat$dates))
cast(dat, mont ~ ., mean, value="sales")
Thanks in advance, --JT
Solution
This works with dcast() from reshape2 (note the value.var argument in place of value):
library(reshape2)
dcast(dat, months(as.chron(dates)) ~ ., mean, value.var="sales")
## months(as.chron(dates)) NA
## 1 Jan 1004.5404
## 2 Feb 1002.3146
## 3 Mar 996.0883
## 4 Apr 994.1707
## 5 May 1000.4652
## 6 Jun 1002.8020
## 7 Jul 996.0357
## 8 Aug 1001.6754
## 9 Sep 997.6772
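A rough way to see the difference, using base R only: the error message suggests that cast() from the original reshape package looks up each formula term as a column name, whereas dcast() appears to evaluate the term as an expression inside the data frame. The sketch below imitates both behaviours. It is not either package's actual code; `term`, `found_by_name`, and `evaluated` are hypothetical names, and `format(dates, "%m")` stands in for `months(as.chron(dates))` so that chron isn't needed.

```r
# Toy data with the same columns as `dat` in the question (smaller, no chron).
dat <- data.frame(sales = c(10, 20, 30, 40),
                  dates = as.Date(c("2010-01-05", "2010-01-20",
                                    "2010-02-03", "2010-02-17")))

# A formula term that is an expression rather than a column name.
term <- quote(format(dates, "%m"))

# Name lookup: the deparsed term is not among the column names, so a
# name-based check (as the cast() error message suggests) rejects it.
found_by_name <- deparse(term) %in% names(dat)

# Expression evaluation: the term is evaluated with the data frame's
# columns in scope, giving one grouping key per row.
evaluated <- eval(term, dat)

print(found_by_name)
print(evaluated)
```

Under that reading, the two-step workaround in the question succeeds because it turns the expression into a real column before cast() ever sees the formula.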
Or you could use plyr:
library(plyr)
ddply(dat, .(months = months(as.chron(dates))), summarize, sales = mean(sales))
## months sales
## 1 Jan 1004.5404
## 2 Feb 1002.3146
## 3 Mar 996.0883
## 4 Apr 994.1707
## 5 May 1000.4652
## 6 Jun 1002.8020
## 7 Jul 996.0357
## 8 Aug 1001.6754
## 9 Sep 997.6772
Or with data.table:
library(data.table)
DT <- data.table(dat)
DT[, month := months(as.chron(dates))][,list(sales = mean(sales)),by = month]
## month sales
## 1: Jan 1004.5404
## 2: Feb 1002.3146
## 3: Mar 996.0883
## 4: Apr 994.1707
## 5: May 1000.4652
## 6: Jun 1002.8020
## 7: Jul 996.0357
## 8: Aug 1001.6754
## 9: Sep 997.6772
Comment from Matthew Dowle
The := isn't needed, iiuc, as by accepts expressions directly:
DT[, list(sales=mean(sales)), by=months(as.chron(dates))]
## months sales
## 1: Jan 1004.5404
## 2: Feb 1002.3146
## 3: Mar 996.0883
## 4: Apr 994.1707
## 5: May 1000.4652
## 6: Jun 1002.8020
## 7: Jul 996.0357
## 8: Aug 1001.6754
## 9: Sep 997.6772
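If chron isn't otherwise needed, the same one-step grouping can be sketched in base R alone. This assumes `dates` is a Date column, as in the question. `month_key` and `monthly` are hypothetical names, and the key is the two-digit month number ("01" to "12") rather than an abbreviated name, so the result doesn't depend on locale.

```r
# Rebuild the question's data using base R only.
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))

# Derive the month key inline with transform() and aggregate in one call.
monthly <- aggregate(sales ~ month_key,
                     data = transform(dat, month_key = format(dates, "%m")),
                     FUN = mean)
print(monthly)
```

The dates run from January into September 2010, so `monthly` should have one row per month in that range.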
Licensed under: CC-BY-SA with attribution