Pergunta

Question: Given a vector, I want to know the minimum of a series of cumulative sums, where each cumulative sum is calculated for an increasing starting index of the vector and a fixed ending index (1:5, 2:5, ..., 5:5). Specifically, I am wondering if this can be calculated w/o using a for() loop, and if there is potentially a term for this algorithm/ calculation. I am working in R.

Context: The vector of interest contains a time series of pressure changes. I want to know of the largest (or smallest) net change in pressure across a range of starting points but with a fixed end point.

Details + Example:

#Example R code    
diffP <- c(0, -1,  0,  1,  0,  0,  1,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0, -1,  0,  0)
minNet1 <- min(cumsum(diffP))
minNet1 #over the whole vector, the "biggest net drop" (largest magnitude with negative sign) is -1.
#However, if I started a cumulative sum in the second half of diffP, I would get a net pressure change of -2.
hold <- list()
nDiff <- length(diffP)
for(j in 1:nDiff){
   hold[[j]] <- cumsum(diffP[j:nDiff])
}
answer <- min(unlist(hold)) #this gives the answer that I ultimately want

Hopefully my example above has helped to articulate my question. answer contains the correct answer, but I'd rather do this without a for() loop in R. Is there a better way to do this calculation, or maybe a name I can put to it?

Foi útil?

Solução

This is known as the http://en.wikipedia.org/wiki/Maximum_subarray_problem and is a typical interview question!

Most people --me included-- would solve it using a O(n^2) algorithm but there is in fact a much better algorithm with O(n) complexity. Here is an R implementation of Kadane's algorithm from the link above:

max_subarray <- function(A) {
   max_ending_here <- 0
   max_so_far <- 0
   for (x in A) {
      max_ending_here <- max(0, max_ending_here + x)
      max_so_far <- max(max_so_far, max_ending_here)
   }
   max_so_far
}

Since in your case, you are looking for the minimum sub-array sum, you would have to call it like this:

-max_subarray(-diffP)
[1] -2

(Or you can also rewrite the function above and replace max with min everywhere.)

Note that, yes, the implementation still uses a for loop, but the complexity of the algorithm being O(n) (meaning the number of operations is of the same order as length(diff)), it should be rather quick. Also, it won't consume any memory since it only stores and updates a couple variables.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top