What's the best algorithm for finding the sum of all items in an arbitrary sub array

https://stackoverflow.com/questions/6088503

09-09-2020
|

Question

I have a problem, and an OK-ish solution. I'm hoping there's a better solution out there.

The problem

I have an array with around 200,000 integers. Given two indices, i1 and i2, I need to calculate the sum of all elements between i1 and i2. Each integer in the array is between 1 and 4 inclusive. For example:

a = [1, 3, 2, 4, 3, 2, 4, 1];
subsection_sum(a, 0, 3); // returns 6: (1 + 3 + 2)

This operation will be performed around 200,000 times, so needs to be pretty fast. A simple counter in a for loop is O(n), and far too slow. The array is never modified after construction, so it's OK to have a relatively expensive pre-processing stage.

My best solution so far

This algorithm works in O(log n) time:

First pad the original array with zeros until its length is a power of two. Next, split the array into two equal parts and store the sum of each. Then split the array into quarters and store the sum of each. Then eighths. Continue doing this until the array is split into sections 2 elements long. For the 8-element array above, this takes two steps:

halves = [(a[0] + a[1] + a[2] + a[3]), (a[4] + a[5] + a[6] + a[7])]
quarters = [(a[0] + a[1]), (a[2] + a[3]), (a[4] + a[5]), (a[6] + a[7])]

Then given two indices, it is now possible to work out the subsection_sum in O(log n) time. For example, subsection_sum(a, 2, 7) == quarters[1] + halves[1].

Solution

Introduce an auxiliary array that contains the cumulative sum. That is, element i of the the auxiliary array has the sum of elements 0 through i of the original array. The subarray sum is then just the difference of two elements from the auxiliary array. This will give a result in constant time, O(1).

This depends on an invariant in the subsection_sum function given in the question,:

subsection_sum(a, 0, i2) = subsection_sum(a, 0, i1) + subsection_sum(a, i1, i2)

where I'm assuming i1 <= i2. Rearranging, we have:

subsection_sum(a, i1, i2) = subsection_sum(a, 0, i2) - subsection_sum(a, 0, i1)

Note that the sums on the right-hand side both start from 0. The auxiliary array can be viewed as caching the values for the sums from zero, subsection_sum(a, 0, i), for all i.

OTHER TIPS

If you can afford O(n) additional storage, you can create a lookup table whose ith element is the sum of the elements at indices 0 through i (inclusive) in the input array. In pseudocode:

def computeLookupTable(arr):
    let n = arr.length
    let lookupTable = new Array()

    lookupTable[0] = arr[0]

    for i=1 to n:
        lookupTable[i] = arr[i] + lookupTable[i-1]

    return lookupTable

Then you can use this table to compute the sum of all elements in array between i1 and i2 by taking the difference

lookupTable[i2] - lookupTable[i1]

which takes constant time.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow