Question

I want to create a correlation matrix for several stocks going back a few years.

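library(quantmod)  # provides getSymbols(), Cl() and (via TTR) ROC()
getSymbols(c("AAPL", "FB", "LNKD"))
close <- cbind(Cl(AAPL), Cl(FB), Cl(LNKD))  # closing prices, merged by date
roc <- ROC(close)  # daily rate of change (log returns by default)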

These companies all went public at different times, so I get:

head(close)

           AAPL.Close FB.Close LNKD.Close
2007-01-03      83.80       NA         NA
2007-01-04      85.66       NA         NA
2007-01-05      85.05       NA         NA
2007-01-08      85.47       NA         NA
2007-01-09      92.57       NA         NA
2007-01-10      97.00       NA         NA

and:

tail(close)
           AAPL.Close FB.Close LNKD.Close
2013-11-04     526.75    48.22     223.72
2013-11-05     525.45    50.11     224.54
2013-11-06     520.92    49.12     220.78
2013-11-07     512.49    47.56     211.47
2013-11-08     520.56    47.53     215.17
2013-11-11     519.05    46.20     211.66

so when I:

cor(roc)

I get:

           AAPL.Close FB.Close LNKD.Close
AAPL.Close          1       NA         NA
FB.Close           NA        1         NA
LNKD.Close         NA       NA          1

In this case am I forced to begin the matrix at the date where all three companies have stock return history?

Here, that is:

head(na.omit(close))

           AAPL.Close FB.Close LNKD.Close
2012-05-18     530.38    38.23      99.02
2012-05-21     561.28    34.03      96.84
2012-05-22     556.97    31.00     101.33
2012-05-23     570.56    32.00     103.56
2012-05-24     565.32    33.03      98.80

Now if I expand this idea to a much larger matrix, such as the S&P 500, I want to get rid of the NAs in the history without dropping whole columns, since that breaks the matrix. Is there a way to clean up the returns data so that I can compare returns in a correlation matrix?

Variants of this question have been asked before without a cogent answer:

Correlation Matrix in "R" returning NA values


Solution 2

This is really ialm's solution, but it sounds like you want:

cor(roc, use = 'pairwise.complete.obs')
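To see what this option does, here is a minimal sketch on synthetic data rather than the real `roc` object. The tickers A, B and C, the seed, and the listing dates are all made up for illustration. With `use = "pairwise.complete.obs"`, each entry is computed from the rows where both columns are observed, so different entries can rest on very different sample sizes. The `n_obs` matrix makes that visible.

```r
# Synthetic daily returns for three hypothetical tickers (not real market data).
set.seed(42)
n <- 500
common <- rnorm(n)                      # shared "market" factor
ret <- cbind(
  A = 0.6 * common + rnorm(n),
  B = 0.6 * common + rnorm(n),
  C = 0.6 * common + rnorm(n)
)
ret[1:300, "B"] <- NA                   # B only has data from row 301 on
ret[1:400, "C"] <- NA                   # C only has data from row 401 on

# Default behaviour: an NA in either column makes that pair's correlation NA.
cor_default <- cor(ret)

# Pairwise: each entry uses only the rows where both columns are observed.
cor_pairwise <- cor(ret, use = "pairwise.complete.obs")

# Number of overlapping observations behind each entry of cor_pairwise.
observed <- ifelse(is.na(ret), 0, 1)
n_obs <- crossprod(observed)

print(round(cor_pairwise, 3))
print(n_obs)
```

Checking `n_obs` alongside the correlations helps judge how much weight each entry deserves, since entries built on a short overlap are noisier.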

Other tips

It only makes sense to compute a correlation matrix over the time frame in which all stocks have returns; otherwise the findings are distorted.
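A minimal sketch of restricting the matrix to the common window, using synthetic data (the tickers A, B and C and the gap lengths are hypothetical). In base R, `na.omit()` drops every row with a missing value, and `cor(..., use = "complete.obs")` should give the same result in one call.

```r
# Synthetic returns for three hypothetical tickers (not real market data).
set.seed(1)
n <- 250
ret <- cbind(A = rnorm(n), B = rnorm(n), C = rnorm(n))
ret[1:100, "B"] <- NA      # B has no history for the first 100 rows
ret[1:150, "C"] <- NA      # C has no history for the first 150 rows

# Keep only the rows where every column has a value (the common window).
common_window <- na.omit(ret)

# Both calls use the same rows, so every entry covers the same period.
cor_common <- cor(common_window)
cor_complete <- cor(ret, use = "complete.obs")

print(nrow(common_window))
print(round(cor_common, 3))
```

The cost of this approach is that the window is set by the stock with the shortest history, so one recent listing shortens the sample for every pair.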

Let's say you have three companies A, B and C, and your time horizon is 2005 to 2009. A and B had their IPOs in 2005, and C had its IPO in 2007.

If you now compute the correlation matrix over the entire 2005 to 2009 horizon, Corr(A,B) would indicate how closely the two move together during both the boom and the bust period. Corr(A,C), however, would only reflect behavior during the bust period.

It is well documented that stock returns are much more strongly correlated during economic downturns; see the paper "Correlation of financial markets in times of crisis". So your correlation matrix would contain distorted values.

In your place, I'd look at a time horizon over which all stocks have return figures. If a few gaps still remain inside it, I'd consider closing them with a linear approximation, na.approx(), or a spline approximation, na.spline() (both part of the zoo package).
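A small sketch of the gap-filling step, assuming the zoo package is installed. The price series and dates below are invented for illustration. The gaps are filled on prices before any returns are computed, which is one possible choice rather than something the answer prescribes.

```r
library(zoo)  # assumed to be installed; provides na.approx() and na.spline()

# Hypothetical closing prices with two interior gaps (not real market data).
dates <- as.Date("2013-11-01") + 0:6
prices <- zoo(c(100, 101, NA, 103, NA, 106, 107), order.by = dates)

# Linear interpolation between the neighbouring observed points.
filled_linear <- na.approx(prices)

# Cubic spline through the observed points.
filled_spline <- na.spline(prices)

print(filled_linear)
print(filled_spline)
```

Interpolated values are estimates, not observed prices, so this is best kept to a handful of isolated gaps as the answer suggests.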

Have a good day.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow