Question

I would like to get the P&L from a weight vector and a price vector.

data$weight[] <- c(NA,NA,1,NA,NA,NA,0,NA,NA,1,NA,NA,NA,0,NA,NA,1,NA,0,NA,NA,NA)

where 1 means buy and 0 means sell.

y <- seq_along(data$weight)  # stand-in price series: 1, 2, ..., n

I have written:

library(zoo)  # provides na.locf()

na_following_zero <- na.locf(c(1,data$weight))[-1]==0 & is.na(data$weight) #Ben Bolker's code
PL <- rep(NA,length(data$weight))
PL[1]=0
for (i in 2:length(data$weight)) {
  if (is.na(data$weight[i]) && i<which.max(data$weight==1))  {PL[i]=PL[i-1]}
  if (data$weight[i] %in% 1) {PL[i]=PL[i-1]}
  if (is.na(data$weight[i]) && i>which.max(data$weight==1) && !na_following_zero[i]) {PL[i]=PL[i-1]+y[i]-y[i-1]}
  if (data$weight[i] %in% 0) {PL[i]=PL[i-1]+y[i]-y[i-1]}
  if (na_following_zero[i]) {PL[i]=PL[i-1]}
}

Expected output:

[1] 0 0 0 1 2 3 4 4 4 4 5 6 7 8 8 8 8 9 10 10 10 10

It gets the job done, but it is incredibly slow. Any ideas on how I could improve it?


Solution

Speed problems like this are common when carrying a for-loop mindset over to R, which is built to handle such problems in a vectorized manner. I think we've all been there.

EDIT: In the comments, the OP pointed out that the weights are actually trade signals, which need to be lagged by one observation before they can be used as concurrent weights. In xts this would be the lag() function, but with raw vectors we have to do a bit of hanky-panky:

wgts <- c(NA,NA,1,NA,NA,NA,0,NA,NA,1,NA,NA,NA,0,NA,NA,1,NA,0,NA,NA,NA)
wgts2 <- c(0, wgts)
wgts2 <- wgts2[1:length(wgts)]
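
The same one-step lag can be written as a small helper. This is only a minimal sketch: `lag_signal` and its `fill` argument are hypothetical names made up for illustration, not part of base R or xts, and the fill value of 0 assumes you are flat before the first signal.

```r
# Hypothetical helper: shift a signal vector one step later,
# padding the first position with `fill` (0 = no position yet).
lag_signal <- function(x, fill = 0) {
  c(fill, head(x, -1))   # drop the last element, prepend the fill value
}

wgts  <- c(NA,NA,1,NA,NA,NA,0,NA,NA,1,NA,NA,NA,0,NA,NA,1,NA,0,NA,NA,NA)
wgts2 <- lag_signal(wgts)
length(wgts2) == length(wgts)   # the lagged vector keeps the original length
```

Padding with 0 rather than NA also means the forward fill in the next step has a starting value to carry.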

An easy way to vectorize your specific question is to treat your weights as a column in the same timeframe as your prices and calculate your PnL accordingly. Using y as your price series, we fill your weights series forward:

library(zoo)  # for na.locf()
y <- data.frame(prices=1:length(wgts), weights=na.locf(wgts2))

With the prices and weights matched up, we can calculate the per-observation returns (net changes) and multiply by weight to get PnL:

y$rtn <- c(0, diff(y$prices))
y$PnL <- y$weights * y$rtn
cumsum(y$PnL)
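
Putting the steps above together, the whole calculation can be wrapped in one function and checked against the expected output from the question. This is a minimal sketch, assuming signals of 1 (buy) and 0 (sell) and a flat position before the first signal. `pnl_from_signals` and `fill_forward` are hypothetical names, and `fill_forward` is a base-R stand-in for zoo's `na.locf` so the block runs without extra packages.

```r
# Base-R stand-in for zoo::na.locf(); assumes x[1] is not NA.
fill_forward <- function(x) {
  ok <- !is.na(x)
  x[ok][cumsum(ok)]   # cumsum(ok) counts the non-NA values seen so far
}

# Hypothetical wrapper: cumulative PnL from a price vector and trade signals.
pnl_from_signals <- function(prices, signals) {
  lagged  <- c(0, head(signals, -1))   # signal at t becomes the position at t+1
  weights <- fill_forward(lagged)      # hold the position until the next signal
  rtn     <- c(0, diff(prices))        # per-observation net change
  cumsum(weights * rtn)
}

wgts <- c(NA,NA,1,NA,NA,NA,0,NA,NA,1,NA,NA,NA,0,NA,NA,1,NA,0,NA,NA,NA)
pnl  <- pnl_from_signals(seq_along(wgts), wgts)

expected <- c(0,0,0,1,2,3,4,4,4,4,5,6,7,8,8,8,8,9,10,10,10,10)
all(pnl == expected)   # TRUE for the example data in the question
```

Everything here is a whole-vector operation, so there is no per-element loop for R to interpret.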

However, note that R has a wealth of great tools for managing financial data, far beyond the capabilities of basic vectors and data.frames. Taken as an approach, the code above is misleading: it answers your question (a faster PnL calculation) but tells you nothing about where the language excels. Instead, take a look at the tools available in xts, quantmod, and PerformanceAnalytics.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow