Align dates in R date.table for linear regression

https://stackoverflow.com/questions/23243897

08-07-2023
|

質問

I am having a data.table with returns on n dates for m securities. I would like to do a multiple linear regression in the form of lm(ReturnSec1 ~ ReturnSec2 + ReturnSec3 + ... + ReturnSecM). The problem that I am having is that there might be dates missing for some of the securities and the regression should be on aligned dates. Here is what I have come up with so far:

#The data set
set.seed(1)
dtData <- data.table(SecId = rep(c(1,2,3), each= 4), Date = as.Date(c(1,2,3,5,1,2,4,5,1,2,4,5)), Return = round(rnorm(12),2))

#My solution so far
dtDataAligned <- merge(dtData[SecId == 1,list(Date, Return)], dtData[SecId == 2, list(Date, Return)], by='Date', all=TRUE)
dtDataAligned <- merge(dtDataAligned, dtData[SecId == 3,list(Date, Return)], by='Date',  all=TRUE)
setnames(dtDataAligned, c('Date', 'Sec1', 'Sec2', 'Sec3'))
dtDataAligned[is.na(dtDataAligned)] <- 0

#This is what i want to do
fit <- lm(dtDataAligned[, Sec1] ~ dtDataAligned[, Sec2] + dtDataAligned[, Sec3])

Is there a better (more elegant, possibly faster) way of doing this without having to loop and merge the data.table to perform a regression on the values with aligned dates?

解決

Here is a data.table solution using dcast.data.table, which takes data in the long format (your input) and converts it to the wide format required for the lm call.

lm(`1` ~ ., dcast.data.table(dtData, Date ~ SecId, fill=0))

Here is the output of the dcast call:

         Date     1     2     3
1: 2014-01-02 -0.63  0.33  0.58
2: 2014-01-03  0.18 -0.82 -0.31
3: 2014-01-04 -0.84  0.00  0.00
4: 2014-01-05  0.00  0.49  1.51
5: 2014-01-06  1.60  0.74  0.39

I stole the lm piece from @G.Grothendieck. Note that if you have more than three columns in your real data you will need to specify the value.var parameter for dcast.data.table.

他のヒント

If the question is how to reproduce the output from the code shown in the question in a more compact fashion then try this:

library(zoo)

z <- read.zoo(dtData, split = 1, index = 2)
z0 <- na.fill(z, fill = 0)
lm(`1` ~., z0)

ADDED

Regarding the comment about elegance we could create a magrittr package pipeline out of the above like this:

library(magrittr)

dtData %>% 
    read.zoo(split = 1, index = 2) %>%
    na.fill(fill = 0) %>%
    lm(formula = `1` ~.)

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow