I am having a data.table with returns on n dates for m securities. I would like to do a multiple linear regression in the form of lm(ReturnSec1 ~ ReturnSec2 + ReturnSec3 + ... + ReturnSecM). The problem that I am having is that there might be dates missing for some of the securities and the regression should be on aligned dates. Here is what I have come up with so far:

#The data set
set.seed(1)
dtData <- data.table(SecId = rep(c(1,2,3), each= 4), Date = as.Date(c(1,2,3,5,1,2,4,5,1,2,4,5)), Return = round(rnorm(12),2))

#My solution so far
dtDataAligned <- merge(dtData[SecId == 1,list(Date, Return)], dtData[SecId == 2, list(Date, Return)], by='Date', all=TRUE)
dtDataAligned <- merge(dtDataAligned, dtData[SecId == 3,list(Date, Return)], by='Date',  all=TRUE)
setnames(dtDataAligned, c('Date', 'Sec1', 'Sec2', 'Sec3'))
dtDataAligned[is.na(dtDataAligned)] <- 0

#This is what i want to do
fit <- lm(dtDataAligned[, Sec1] ~ dtDataAligned[, Sec2] + dtDataAligned[, Sec3])

Is there a better (more elegant, possibly faster) way of doing this without having to loop and merge the data.table to perform a regression on the values with aligned dates?

有帮助吗?

解决方案

Here is a data.table solution using dcast.data.table, which takes data in the long format (your input) and converts it to the wide format required for the lm call.

lm(`1` ~ ., dcast.data.table(dtData, Date ~ SecId, fill=0))

Here is the output of the dcast call:

         Date     1     2     3
1: 2014-01-02 -0.63  0.33  0.58
2: 2014-01-03  0.18 -0.82 -0.31
3: 2014-01-04 -0.84  0.00  0.00
4: 2014-01-05  0.00  0.49  1.51
5: 2014-01-06  1.60  0.74  0.39

I stole the lm piece from @G.Grothendieck. Note that if you have more than three columns in your real data you will need to specify the value.var parameter for dcast.data.table.

其他提示

If the question is how to reproduce the output from the code shown in the question in a more compact fashion then try this:

library(zoo)

z <- read.zoo(dtData, split = 1, index = 2)
z0 <- na.fill(z, fill = 0)
lm(`1` ~., z0)

ADDED

Regarding the comment about elegance we could create a magrittr package pipeline out of the above like this:

library(magrittr)

dtData %>% 
    read.zoo(split = 1, index = 2) %>%
    na.fill(fill = 0) %>%
    lm(formula = `1` ~.)
许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top