Here is an R implementation that is 30% faster than yours. It is not as fast as your Rcpp version, but it may give you ideas that, combined with Rcpp, speed things up further. The two main improvements are:

- the `sapply` loop has been replaced by a matrix formulation;
- the matrix multiplication has been replaced by a recursion.
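To illustrate the first point, here is a small sketch. It assumes the `sapply` version scored one candidate column at a time; the data, the running sum `S` and the ensemble size `k` are made up for the example. Because a vector whose length equals `nrow(X)` is recycled down every column of `X`, all candidates can be scored in a single `colSums` call:

```r
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)  # 10 candidate models (simulated)
Y <- rnorm(200)                                       # target (simulated)
S <- rowSums(X[, c(2, 5)])                            # hypothetical running sum after two picks
k <- 3L                                               # ensemble size once one more model is added

# Loop style (assumed shape of an sapply version): score one column at a time
err_loop <- sapply(seq_len(ncol(X)), function(j) {
  sqrt(sum(((S + X[, j]) / k - Y)^2))
})

# Matrix style: S and Y are recycled down every column of X, so one call scores all candidates
err_mat <- sqrt(colSums(((S + X) / k - Y)^2))

all.equal(err_loop, err_mat)
```

Both approaches pick the same column; the matrix version simply avoids calling an R closure once per candidate.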
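```r
library(compiler)  # provides cmpfun()

greedOpt <- cmpfun(function(X, Y, iter = 100L){
  N <- ncol(X)
  weights <- rep(0L, N)   # number of times each column (model) has been picked
  pred <- 0 * X
  sum.weights <- 0L
  while(sum.weights < iter) {
    sum.weights <- sum.weights + 1L
    # average prediction if each candidate column were added to the current ensemble
    pred <- (pred + X) * (1L / sum.weights)
    # root of the summed squared errors for every candidate at once (Y recycled per column)
    errors <- sqrt(colSums((pred - Y) ^ 2L))
    best <- which.min(errors)
    weights[best] <- weights[best] + 1L
    # keep the running *sum* of the selected columns for the next iteration
    pred <- pred[, best] * sum.weights
  }
  return(weights / sum.weights)
})
```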
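The recursion works because `pred[, best] * sum.weights` is the running sum of the selected columns, which is exactly what `X %*% weights` would give. A minimal check, using simulated data and a made-up sequence of picks:

```r
set.seed(42)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)  # simulated candidate predictions
picks <- c(3L, 1L, 3L, 5L)                         # hypothetical sequence of selected columns

# Recursion: update a running sum with each selected column
S <- numeric(nrow(X))
for (j in picks) S <- S + X[, j]
pred_recursive <- S / length(picks)

# Matrix multiplication: weight each column by how often it was picked
counts <- tabulate(picks, nbins = ncol(X))
pred_matmul <- drop(X %*% counts) / length(picks)

all.equal(pred_recursive, pred_matmul)
```

The recursion costs one column addition per iteration instead of a full matrix-vector product.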
Also, I still think you should try switching to the ATLAS BLAS library; you might see significant improvements.