Counting number of comparisons and analyzing the efficiency of a particular algorithm using programming and/or mathematics

StackOverflow https://stackoverflow.com/questions/19122201

Question

def three_way_merge(L1,L2,L3):
    L = []
    i1 = 0
    i2 = 0
    i3 = 0
    done1 = False
    done2 = False
    done3 = False
    while not (done1 and done2 and done3):
        if not done1 and (done2 or L1[i1] < L2[i2]) and (done3 or L1[i1] < L3[i3]):
            L.append(L1[i1])
            i1 += 1
            done1 = i1 >= len(L1)
        elif not done2 and (done3 or L2[i2] < L3[i3]):
            L.append(L2[i2])
            i2 += 1
            done2 = i2 >= len(L2)
        else:
            L.append(L3[i3])
            i3 += 1
            done3 = i3 >= len(L3)
    return L

I want to count the worst possible number of comparisons for this algorithm I found, because I have an exam coming up in my algorithms class and I wish to be able to do this kind of analysis. My thought was to write a program that creates many random examples of this "worst case" (which I am guessing is something of the type: L1 = [9,10,11], L2 = [6,7,8], L3 = [3,4,5], where all the lists are sorted but L3 and L2 have strictly smaller values than L1, etc.) and then every time I do any comparison I increment a counter and return the final count, and then try to figure out some kind of pattern in the outputs, but this seems to be an inefficient way to go about this.

Is there a way to count this in a similar fashion to the analysis of the classic merge in merge sort?

Solution

As a general rule, generating random input is not a good way to figure out worst-case running time. For example, quicksort runs in O(n log n) on average, but in the worst case it runs in O(n^2). Even if you generated a huge number of random samples, for moderately large n you would never come anywhere close to that worst case. Instead, try to construct a worst-case input manually.

In this case, it seems that the worst case, assuming that each array has length N, occurs if

L1 = (N,2N,2N+1,...,3N-3,3N)
L2 = (N+1,N+2,...,2N-1,3N-1)
L3 = (1,2,...,N-1,3N-2)
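
For experimenting, the three lists can be generated for any N. This is only a sketch: the helper name worst_case_lists is made up for illustration, and it assumes N >= 2 so that every list has a first and a last element.

```python
def worst_case_lists(n):
    """Build L1, L2, L3 following the pattern above (hypothetical helper)."""
    if n < 2:
        raise ValueError("the construction needs n >= 2")
    # L1 = (N, 2N, 2N+1, ..., 3N-3, 3N)
    l1 = [n] + list(range(2 * n, 3 * n - 2)) + [3 * n]
    # L2 = (N+1, N+2, ..., 2N-1, 3N-1)
    l2 = list(range(n + 1, 2 * n)) + [3 * n - 1]
    # L3 = (1, 2, ..., N-1, 3N-2)
    l3 = list(range(1, n)) + [3 * n - 2]
    return l1, l2, l3


l1, l2, l3 = worst_case_lists(4)
print(l1)
print(l2)
print(l3)
```

Each list comes out sorted with N elements, and together they use every value from 1 to 3N exactly once.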

To see why, trace the execution of the algorithm. First, the first N-1 elements of L3 get added to L. Each of these iterations costs 3 comparisons: two in the if condition and one in the elif. Note that the front of L1 must be smaller than the front of L2 here; otherwise the if short-circuits and skips its second comparison.

Next comes the first element of L1, which is N. It takes two comparisons, one against L2 and one against L3, both in the if condition.

After this come the first N-1 elements of L2, each of which requires two comparisons, one against L1 and one against L3.

Next come the next N-2 elements of L1, again with two comparisons each.

At this point there is only one element left in each list. L3 gets picked first, but with only 2 comparisons: 3N < 3N-1 fails, so the if stops after its first comparison. Then L2 takes one comparison, the last element of L1 takes none, and that's it.

The total is

3(N-1) + 2 + 2(N-1) + 2(N-2) + 2 + 1 = 7N - 4

I think this is the worst case, but you might be able to squeeze one more out of it somewhere. Also, I may have made a mistake, in which case somebody here will probably catch it. The next thing you should do is try to actually prove that this is the worst-case running time.

PS This algorithm is not optimal for merging three lists. Picking the smallest element from the front of the three lists should only require 2 comparisons at most, not 3. If you find that L2<L1 and L1<L3 then it's not necessary to compare L2 and L3, since you already know that L2 is smaller.

On edit: it shouldn't be too hard to bound the worst case. Assuming none of the lists are empty and all values are distinct, the number of comparisons per iteration is:

  • 3 if L3 is smallest and L1 < L2
  • 2 if L3 is smallest and L1 > L2
  • 2 if L2 is smallest
  • 2 if L1 is smallest

That right there gives you an upper bound of 3N + 2N + 2N = 7N, since each list can only be the smallest N times. So completing a proof just requires examining what happens at the end, where the lists become empty.
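
A hand count like this can be checked mechanically. The sketch below is an instrumented copy of the question's three_way_merge. The names three_way_merge_counted and less are not from the original post, and, like the original code, it assumes all three lists are non-empty.

```python
def three_way_merge_counted(L1, L2, L3):
    """Merge three sorted, non-empty lists; return (merged, comparisons)."""
    count = 0

    def less(a, b):
        # Every element comparison goes through here so it gets counted.
        nonlocal count
        count += 1
        return a < b

    L = []
    i1 = i2 = i3 = 0
    done1 = done2 = done3 = False
    while not (done1 and done2 and done3):
        # `and` / `or` short-circuit, so less() runs only where the
        # original code would have evaluated the matching `<`.
        if not done1 and (done2 or less(L1[i1], L2[i2])) and (done3 or less(L1[i1], L3[i3])):
            L.append(L1[i1])
            i1 += 1
            done1 = i1 >= len(L1)
        elif not done2 and (done3 or less(L2[i2], L3[i3])):
            L.append(L2[i2])
            i2 += 1
            done2 = i2 >= len(L2)
        else:
            L.append(L3[i3])
            i3 += 1
            done3 = i3 >= len(L3)
    return L, count


# The construction from the answer, written out for N = 4.
merged, comparisons = three_way_merge_counted(
    [4, 8, 9, 12], [5, 6, 7, 11], [1, 2, 3, 10]
)
print(merged)
print(comparisons)
```

Running this for a few values of N and comparing the printed count with the hand-derived total is a quick way to catch a slip in the trace. It only counts element comparisons, not the checks of the done flags.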

OTHER TIPS

As you said, the worst scenario is for L3 (or L2) to hold numbers that are all smaller than those in L1, because the if condition then fails and the elif has to be evaluated as well, which costs more comparisons.

Inside the first if (counting each check of a boolean value such as done1 or done2 as a comparison of its own, and keeping in mind that logical expressions are usually evaluated lazily), the worst case is for done1 never to become true before the others, which is guaranteed because L1 has bigger values than L2 and L3. Likewise, done2 should not become true before done3, which is guaranteed if L2 has bigger values than L3. Then L1[i1] < L2[i2] is evaluated at every step.

Once L3 is finished, each pass through the if condition performs only 4 checks, because done3 is true and lazy evaluation skips the last comparison. For the same reason, the elif performs only 2 checks.

Once L2 is finished, only 3 checks are performed in the if condition (as done2 and done3 are both true).

So, with this configuration (L1 >> L2 >> L3), and charging the full 3 checks for the while condition on every pass, this algorithm performs at most:

Len(L3) * (3 (while condition) + 5 (if condition) + 3 (elif condition) + 1 (done3 update))
+ Len(L2) * (3 (while condition) + 4 (if condition) + 2 (elif condition) + 1 (done2 update))
+ Len(L1) * (3 (while condition) + 3 (if condition) + 1 (done1 update))

so the final count is

Len(L3) * 12 + Len(L2) * 10 + Len(L1) * 7

The order of growth is the same whatever the ordering of the three arrays: O(Len(L1) + Len(L2) + Len(L3)).

Licensed under: CC-BY-SA with attribution