I have some (financial time-series) data in a tall format:
require(data.table)
DT <- data.table(Variable=c(rep("a",times = 3), rep("b", times=3)),
Date=as.Date(c("2014-04-01","2014-04-02","2014-04-03"
,"2014-04-02", "2014-04-03","2014-04-04")),
Value=c(1:3,3:1), key=c("Variable","Date"))
DT
Variable Date Value
1: a 2014-04-01 1
2: a 2014-04-02 2
3: a 2014-04-03 3
4: b 2014-04-02 3
5: b 2014-04-03 2
6: b 2014-04-04 1
I would like to calculate a third variable spread
, where spread = a - b
for each common Date
row (basically a spread between two time-series - a common transformation in the financial domain).
Desired output:
Variable Date Value
1: spread 2014-04-02 -1
2: spread 2014-04-03 1
I know of solution when I dcast.data.table
the data into the wide format (i.e. into table with columns c("Date", "a", "b")
), but due to performance issues on large data, is there an elegant way how to do this directly in the tall format using a)dplyr
and b)data.table
(two-part question)?
Ideally on the dplyr
side, I am looking for something as expressive as mutate(tbl_dt(DT, tall=TRUE), spread=a-b)
. (Disclaimer: I am a complete newbie in dplyr
)
Real-life dataset:
# download 200 stocks from Quandl.com. requires free registration
library(Quandl); library(data.table); library(plyr)
ntickers <- 200 ; auth.token="register_free_to_obtain_token"
code.file <- tempfile()
download.file("https://s3.amazonaws.com/quandl-static-content/quandl-stock-code-list.csv",
destfile=code.file)
tickers <- na.omit(read.csv2(code.file, sep=",", stringsAsFactors=FALSE)[,"Price.Code"])
lst <- na.omit(tickers)[1:ntickers]
names(lst) <- lst
Q <- ldply(lst, Quandl,
type = "raw", end_date="2014-04-08",
sort="asc", auth=auth.token) # might take minutes
DT <- as.data.table(Q)[,Date:=as.IDate(Date)]
setnames(DT, ".id", "Instrument")
setkey(DT, Instrument, Date)
> dim(DT); object.size(DT)
[1] 685512 8
41145752 bytes
> DT
Instrument Date Open High Low Close Volume Adjusted Close
1: GOOG/AMEX_ABI 1981-03-11 NA NA 6.56 6.75 217200 NA
2: GOOG/AMEX_ABI 1981-03-12 NA NA 6.66 6.88 616400 NA
3: GOOG/AMEX_ABI 1981-03-13 NA NA 6.81 6.84 462000 NA
4: GOOG/AMEX_ABI 1981-03-16 NA NA 6.81 7.00 306400 NA
5: GOOG/AMEX_ABI 1981-03-17 NA NA 6.88 6.88 925600 NA
---
685508: YAHOO/TSX_AHX_TO 2014-04-02 0.75 0.75 0.75 0.75 5000 0.75
685509: YAHOO/TSX_AHX_TO 2014-04-03 0.79 0.82 0.75 0.82 25700 0.82
685510: YAHOO/TSX_AHX_TO 2014-04-04 0.81 0.81 0.78 0.80 4500 0.80
685511: YAHOO/TSX_AHX_TO 2014-04-07 0.80 1.05 0.80 0.96 40400 0.96
685512: YAHOO/TSX_AHX_TO 2014-04-08 0.95 0.96 0.90 0.95 21300 0.95