Greedy optimization in R

Question 1

Here is an R implementation that is about 30% faster than yours. It is not as fast as your Rcpp version, but it may give you ideas that, combined with Rcpp, will speed things up further. The two main improvements are:

the sapply loop has been replaced by a matrix formulation
the matrix multiplication has been replaced by a recursion

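library(compiler)  # provides cmpfun()

greedOpt <- cmpfun(function(X, Y, iter = 100L){

  N           <- ncol(X)
  weights     <- rep(0L, N)   # how many times each column has been picked
  pred        <- 0 * X        # running sum of the picked columns (starts at zero)
  sum.weights <- 0L

  while(sum.weights < iter) {

      sum.weights   <- sum.weights + 1L
      # column j = averaged prediction if column j were picked next
      pred          <- (pred + X) * (1L / sum.weights)
      # root of the summed squared errors; same minimum as the RMSE
      errors        <- sqrt(colSums((pred - Y) ^ 2L))
      best          <- which.min(errors)
      weights[best] <- weights[best] + 1L
      # keep only the winner, rescaled back to an unnormalised running sum
      pred          <- pred[, best] * sum.weights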
  }
  return(weights / sum.weights)
})
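
The two changes can be checked in isolation on a toy problem. The sketch below is not part of the function above: the names `counts`, `running`, `naive` and `fast` are illustrative, and the selection counts are made up. It computes the errors for one greedy step twice, once with an sapply loop that multiplies out a fresh weight vector per candidate, and once with the running sum plus a single matrix expression.

```r
set.seed(1)
X <- matrix(runif(20), ncol = 4)   # 5 observations, 4 candidate columns
Y <- rnorm(5)
counts <- c(2, 0, 1, 0)            # hypothetical picks made so far
s <- sum(counts)

# Slow way: one weight vector and one matrix product per candidate column
naive <- sapply(seq_len(ncol(X)), function(j) {
  w <- counts
  w[j] <- w[j] + 1
  pred <- X %*% (w / sum(w))
  sqrt(sum((pred - Y)^2))
})

# Fast way: keep the unnormalised running sum and add every column at once
running  <- as.vector(X %*% counts)   # computed once here; updated by addition in the loop
pred_all <- (running + X) / (s + 1)   # column j = prediction if column j is picked
fast     <- sqrt(colSums((pred_all - Y)^2))

# Both formulations agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(naive, fast)))
```

Inside the loop the running sum never needs `%*%` again: after a column wins, it is simply added to the previous sum, which is what the last line of the while loop does.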

Also, I still think you should try switching to the ATLAS BLAS library. You might see significant improvements.
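
Before switching, it can help to see which BLAS the current session is linked against and to record a baseline timing. This is only a rough sketch: it assumes R 3.4.0 or later (for `La_library()` and the "BLAS" entry of `extSoftVersion()`), and the 500 x 500 test matrix is an arbitrary choice. Since the function above replaces the matrix multiplication with a recursion, a faster BLAS would mostly benefit code that still relies on `%*%` or `crossprod()`.

```r
# Which BLAS/LAPACK is this R session linked against?
# (assumes R >= 3.4.0)
print(extSoftVersion()["BLAS"])
print(La_library())

# Rough timing of a BLAS-bound operation; rerun after switching libraries
set.seed(42)
A <- matrix(runif(500 * 500), ncol = 500)
elapsed <- system.time(AtA <- crossprod(A))[["elapsed"]]
print(elapsed)
```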

Question 2

I took a shot at writing an Rcpp version of this function:

library(Rcpp)
cppFunction('
  NumericVector greedOptC(NumericMatrix X, NumericVector Y, int iter) {
    int nrow = X.nrow(), ncol = X.ncol();
    NumericVector weights(ncol);
    NumericVector newweights(ncol);
    NumericVector errors(nrow);
    double RMSE;
    double bestRMSE;
    int bestCol;

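    for (int i = 0; i < iter; i++) {
      bestRMSE = -1;   // sentinel: no candidate evaluated yet
      bestCol = 0;     // always overwritten by the first candidate
      for (int j = 0; j < ncol; j++) {
        // candidate weights: current counts plus one more pick of column j
        newweights = weights + 0;
        newweights[j] = newweights[j] + 1;
        newweights = newweights/sum(newweights);

        // rebuild the full weighted prediction from scratch
        NumericVector pred(nrow);
        for (int k = 0; k < ncol; k++){
          pred = pred + newweights[k] * X( _, k);
        }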

        errors = Y - pred;
        RMSE = sqrt(mean(errors*errors));

        if (RMSE < bestRMSE || bestRMSE==-1){
          bestRMSE = RMSE;
          bestCol = j;
        }
      }

      weights[bestCol] = weights[bestCol] + 1;
    }

    weights = weights/sum(weights);
    return weights;
  }
')

It's more than twice as fast as the R version:

> set.seed(42)
> X <- matrix(runif(100000*10), ncol=10)
> Y <- rnorm(100000)
> system.time(a <- greedOpt(X, Y, 1000))
   user  system elapsed 
  36.19    6.10   42.40 
> system.time(b <- greedOptC(X, Y, 1000))
   user  system elapsed 
  16.50    1.44   18.04
> all.equal(a,b)
[1] TRUE

Not bad, but I was hoping for a bigger speedup when making the leap from R to Rcpp. This is one of the first Rcpp functions I've ever written, so perhaps further optimization is possible.
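
One possible direction, sketched here under assumptions rather than offered as a tuned implementation, is to carry the recursion idea from the first answer into Rcpp: keep a running sum of the picked columns so that each candidate costs a single pass over the rows, instead of rebuilding the prediction from all columns. The name `greedOptC2` is hypothetical, and the sketch ranks candidates by the sum of squared errors, which has the same minimum as the RMSE. It needs Rcpp and a working compiler, and no timings are claimed for it.

```r
library(Rcpp)
cppFunction('
  NumericVector greedOptC2(NumericMatrix X, NumericVector Y, int iter) {
    int nrow = X.nrow(), ncol = X.ncol();
    NumericVector counts(ncol);    // how often each column was picked
    NumericVector running(nrow);   // unnormalised sum of the picked columns

    for (int i = 0; i < iter; i++) {
      double denom = i + 1.0;      // total weight after this pick
      double bestSSE = R_PosInf;
      int bestCol = 0;
      for (int j = 0; j < ncol; j++) {
        // squared error if column j were picked next
        double sse = 0.0;
        for (int r = 0; r < nrow; r++) {
          double e = (running[r] + X(r, j)) / denom - Y[r];
          sse += e * e;
        }
        if (sse < bestSSE) {
          bestSSE = sse;
          bestCol = j;
        }
      }
      // commit the winner: bump its count and extend the running sum
      counts[bestCol] += 1;
      for (int r = 0; r < nrow; r++) running[r] += X(r, bestCol);
    }
    return counts / static_cast<double>(iter);
  }
')
```

A natural check is to compare `greedOptC2(X, Y, 1000)` against the output of `greedOptC` with `all.equal()`. Floating-point differences could in principle break ties differently, so agreement should be verified rather than assumed.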