質問

I am having a data.table with returns on n dates for m securities. I would like to do a multiple linear regression in the form of lm(ReturnSec1 ~ ReturnSec2 + ReturnSec3 + ... + ReturnSecM). The problem that I am having is that there might be dates missing for some of the securities and the regression should be on aligned dates. Here is what I have come up with so far:

#The data set
set.seed(1)
dtData <- data.table(SecId = rep(c(1,2,3), each= 4), Date = as.Date(c(1,2,3,5,1,2,4,5,1,2,4,5)), Return = round(rnorm(12),2))

#My solution so far
dtDataAligned <- merge(dtData[SecId == 1,list(Date, Return)], dtData[SecId == 2, list(Date, Return)], by='Date', all=TRUE)
dtDataAligned <- merge(dtDataAligned, dtData[SecId == 3,list(Date, Return)], by='Date',  all=TRUE)
setnames(dtDataAligned, c('Date', 'Sec1', 'Sec2', 'Sec3'))
dtDataAligned[is.na(dtDataAligned)] <- 0

#This is what i want to do
fit <- lm(dtDataAligned[, Sec1] ~ dtDataAligned[, Sec2] + dtDataAligned[, Sec3])

Is there a better (more elegant, possibly faster) way of doing this without having to loop and merge the data.table to perform a regression on the values with aligned dates?

役に立ちましたか?

解決

Here is a data.table solution using dcast.data.table, which takes data in the long format (your input) and converts it to the wide format required for the lm call.

lm(`1` ~ ., dcast.data.table(dtData, Date ~ SecId, fill=0))

Here is the output of the dcast call:

         Date     1     2     3
1: 2014-01-02 -0.63  0.33  0.58
2: 2014-01-03  0.18 -0.82 -0.31
3: 2014-01-04 -0.84  0.00  0.00
4: 2014-01-05  0.00  0.49  1.51
5: 2014-01-06  1.60  0.74  0.39

I stole the lm piece from @G.Grothendieck. Note that if you have more than three columns in your real data you will need to specify the value.var parameter for dcast.data.table.

他のヒント

If the question is how to reproduce the output from the code shown in the question in a more compact fashion then try this:

library(zoo)

z <- read.zoo(dtData, split = 1, index = 2)
z0 <- na.fill(z, fill = 0)
lm(`1` ~., z0)

ADDED

Regarding the comment about elegance we could create a magrittr package pipeline out of the above like this:

library(magrittr)

dtData %>% 
    read.zoo(split = 1, index = 2) %>%
    na.fill(fill = 0) %>%
    lm(formula = `1` ~.)
ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top