As it turns out, there does seem to be a bug in your code or a mistake in your analysis—but it doesn't matter. I'll get to it at the end.
If you look at your results, it seems pretty clear that there's a roughly constant factor of about 5x between the two. That implies that the algorithmic complexity of the second isn't any worse than the first; it just has a much higher constant multiplier. You're doing roughly the same number of steps, but each one is much more work.
It's possible that this is just an artifact of your testing such a narrow range, only a single factor of 10. But running your tests over a wider range of values, like this:
import random
import timeit

for i in 100, 1000, 10000, 100000, 1000000, 10000000:
    v = [random.random() for _ in range(i)]
    t1 = timeit.timeit(lambda: maxmin1(v), number=1)
    t2 = timeit.timeit(lambda: maxmin4(v, 0, len(v)), number=1)
    print('{:8}: {:.8f} {:.8f} (x{:.8f})'.format(i, t1, t2, t2/t1))
… you can see that the pattern holds up:
     100: 0.00002003 0.00010014 (x5.00000000)
    1000: 0.00017500 0.00080800 (x4.61716621)
   10000: 0.00172400 0.00821304 (x4.76393307)
  100000: 0.01630187 0.08839488 (x5.42237660)
 1000000: 0.17010999 0.76053309 (x4.47083153)
10000000: 1.77093697 8.32503319 (x4.70092010)
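(Those timings call the two functions from your question. Since I don't have your exact code, here's roughly the shape I'm assuming: maxmin1 as the plain linear scan, maxmin4 as the recursive divide-and-conquer version. Treat these as hypothetical reconstructions, not your code.)

# Hypothetical reconstructions of the question's functions, for reference.
def maxmin1(v):
    # Plain linear scan: two comparisons (and an occasional assignment)
    # per element, about 2(n-1) comparisons in total.
    lo = hi = v[0]
    for x in v[1:]:
        if x < lo:
            lo = x
        if x > hi:
            hi = x
    return lo, hi

def maxmin4(v, start, stop):
    # Divide and conquer: recurse on the two halves and merge the
    # (min, max) pairs.
    if stop - start <= 2:
        return maxmin1(v[start:stop])
    mid = (start + stop) // 2
    lo1, hi1 = maxmin4(v, start, mid)
    lo2, hi2 = maxmin4(v, mid, stop)
    return min(lo1, lo2), max(hi1, hi2)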
So, why the higher constant overhead in the second version? Well, the first version is just doing a simple for loop, two comparisons, and an occasional assignment for each element. The second is calling functions, building and unpacking tuples, doing more comparisons, and so on. That's bound to be slower. If you want to know why it's exactly 5x slower overall (or, really, why each step is 15x slower, if the second version is doing 2n/3 steps to the first's 2n), you'll need to do some profiling, or at least look at the bytecode. But I don't think it's worth it.
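If you ever do decide to dig in, the dis module is the cheapest place to start; running it over the sketches above (or your real functions) makes the extra work visible. (Exact opcode names vary across CPython versions.)

import dis

# maxmin1 compiles down to a tight loop of FOR_ITER, COMPARE_OP, and
# STORE_FAST; maxmin4 adds CALL_FUNCTION, BUILD_TUPLE, UNPACK_SEQUENCE,
# and friends, each of which costs far more than a bare comparison.
dis.dis(maxmin1)
dis.dis(maxmin4)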
The moral of the story is that there's a reason 2(n-1) and 2n/3-2 are both O(n): When you've got two different complexity classes, like O(n) and O(n**2), that will always make a difference for large n; when you've got two algorithms in the same class, the constants in the implementation (the cost of each step) can easily outweigh the constants in the step count.
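To put numbers on it with the figures above: 15x the cost on 2n/3 steps is 15 * (2n/3) = 10n units of work, against 1x on 2(n-1) ≈ 2n units for the simple version, and 10n/2n is exactly the 5x slowdown you measured. (The corrected numbers below, 7.5x on twice as many steps, multiply out to the same 10n.) The algorithm with the smaller step count still loses, by a constant factor, to the one with the cheaper steps.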
Meanwhile, how can we verify the 2n/3-2 analysis? Simple: just add a global counter that you increment once on each call to maxmin4 (there's a sketch of this at the end of the answer). Here are the expected (2n/3-2) and actual call counts:
     100:      65      127
    1000:     665     1023
   10000:    6665    11807
  100000:   66665   131071
 1000000:  666665  1048575
10000000: 6666665 11611391
But that discrepancy (the actual counts are roughly twice the expected ones) just means you're doing about 2/3 as many steps as the simple version instead of about 1/3 as many, so the constant cost of each step is 7.5x rather than 15x. In the end, that doesn't really affect the analysis.
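Here's the kind of instrumentation I mean, again using my hypothetical reconstruction of maxmin4 rather than your real code; only the two counter lines are new:

import random

calls = 0

def maxmin4(v, start, stop):
    global calls
    calls += 1  # count every call, including the recursive ones
    if stop - start <= 2:
        return maxmin1(v[start:stop])
    mid = (start + stop) // 2
    lo1, hi1 = maxmin4(v, start, mid)
    lo2, hi2 = maxmin4(v, mid, stop)
    return min(lo1, lo2), max(hi1, hi2)

for i in 100, 1000, 10000, 100000, 1000000, 10000000:
    calls = 0
    v = [random.random() for _ in range(i)]
    maxmin4(v, 0, len(v))
    print('{:8}: {:7.0f} {:8}'.format(i, 2 * i / 3 - 2, calls))

For what it's worth, that reconstruction reproduces your actual column for every size I checked, which hints that the mismatch is in the 2n/3-2 expectation rather than in your code; but without your real maxmin4, I can't say for sure.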