As it turns out, there does seem to be a bug in your code or a mistake in your analysis—but it doesn't matter. I'll get to it at the end.
If you look at your results, it seems pretty clear that there's a roughly constant factor of about 5x between the two. That implies that the algorithmic complexity of the second isn't any worse than the first; it just has a much higher constant multiplier. You're doing roughly the same number of steps, but each one is much more work.
It's possible that this is just an artifact of your testing such a narrow range, only a single factor of 10. But running your tests over a wider range of values, like this:
import random
import timeit

for i in 100, 1000, 10000, 100000, 1000000, 10000000:
    v = [random.random() for _ in range(i)]
    t1 = timeit.timeit(lambda: maxmin1(v), number=1)
    t2 = timeit.timeit(lambda: maxmin4(v, 0, len(v)), number=1)
    print('{:8}: {:.8f} {:.8f} (x{:.8f})'.format(i, t1, t2, t2/t1))
… you can see that the pattern holds up:
     100: 0.00002003 0.00010014 (x5.00000000)
    1000: 0.00017500 0.00080800 (x4.61716621)
   10000: 0.00172400 0.00821304 (x4.76393307)
  100000: 0.01630187 0.08839488 (x5.42237660)
 1000000: 0.17010999 0.76053309 (x4.47083153)
10000000: 1.77093697 8.32503319 (x4.70092010)
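(Those timings call the two functions from your question. Since I don't have your exact code, here's roughly the shape I'm assuming: maxmin1 as the plain linear scan, maxmin4 as the recursive divide-and-conquer version. Treat these as hypothetical reconstructions, not your code.)

# Hypothetical reconstructions of the question's functions, for reference.
def maxmin1(v):
    # Plain linear scan: two comparisons (and an occasional assignment)
    # per element, about 2(n-1) comparisons in total.
    lo = hi = v[0]
    for x in v[1:]:
        if x < lo:
            lo = x
        if x > hi:
            hi = x
    return lo, hi

def maxmin4(v, start, stop):
    # Divide and conquer: recurse on the two halves and merge the
    # (min, max) pairs.
    if stop - start <= 2:
        return maxmin1(v[start:stop])
    mid = (start + stop) // 2
    lo1, hi1 = maxmin4(v, start, mid)
    lo2, hi2 = maxmin4(v, mid, stop)
    return min(lo1, lo2), max(hi1, hi2)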
So, why the higher constant overhead in the second version? Well, the first version is just doing a simple for loop, two comparisons, and an occasional assignment for each element. The second is calling functions, building and unpacking tuples, doing more comparisons, and so on. That's bound to be slower. If you want to know why it's exactly 5x slower overall (or, really, why each step is 15x slower, if the second version is doing 2n/3 steps to the first's 2n), you'll need to do some profiling, or at least look at the bytecode. But I don't think it's worth it.
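If you ever do decide to dig in, the dis module is the cheapest place to start; running it over the sketches above (or your real functions) makes the extra work visible. (Exact opcode names vary across CPython versions.)

import dis

# maxmin1 compiles down to a tight loop of FOR_ITER, COMPARE_OP, and
# STORE_FAST; maxmin4 adds CALL_FUNCTION, BUILD_TUPLE, UNPACK_SEQUENCE,
# and friends, each of which costs far more than a bare comparison.
dis.dis(maxmin1)
dis.dis(maxmin4)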
The moral of the story is that there's a reason 2(n-1) and 2n/3-2 are both O(n): When you've got two different complexity classes, like O(n) and O(n**2), that will always make a difference for large n; when you've got two algorithms in the same class, the constants in the implementation (the cost of each step) can easily outweigh the constants in the step count.
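To put numbers on it with the figures above: 15x the cost on 2n/3 steps is 15 * (2n/3) = 10n units of work, against 1x on 2(n-1) ≈ 2n units for the simple version, and 10n/2n is exactly the 5x slowdown you measured. (The corrected numbers below, 7.5x on twice as many steps, multiply out to the same 10n.) The algorithm with the smaller step count still loses, by a constant factor, to the one with the cheaper steps.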
Meanwhile, how can we verify the 2n/3-2 analysis? Simple: just add a global counter that you increment once on each call to maxmin4 (there's a sketch of this at the end of the answer). Here are the expected (2n/3-2) and actual call counts:
     100:      65      127
    1000:     665     1023
   10000:    6665    11807
  100000:   66665   131071
 1000000:  666665  1048575
10000000: 6666665 11611391
But that discrepancy (the actual counts are roughly twice the expected ones) just means you're doing about 2/3 as many steps as the simple version instead of about 1/3 as many, so the constant cost of each step is 7.5x rather than 15x. In the end, that doesn't really affect the analysis.
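Here's the kind of instrumentation I mean, again using my hypothetical reconstruction of maxmin4 rather than your real code; only the two counter lines are new:

import random

calls = 0

def maxmin4(v, start, stop):
    global calls
    calls += 1  # count every call, including the recursive ones
    if stop - start <= 2:
        return maxmin1(v[start:stop])
    mid = (start + stop) // 2
    lo1, hi1 = maxmin4(v, start, mid)
    lo2, hi2 = maxmin4(v, mid, stop)
    return min(lo1, lo2), max(hi1, hi2)

for i in 100, 1000, 10000, 100000, 1000000, 10000000:
    calls = 0
    v = [random.random() for _ in range(i)]
    maxmin4(v, 0, len(v))
    print('{:8}: {:7.0f} {:8}'.format(i, 2 * i / 3 - 2, calls))

For what it's worth, that reconstruction reproduces your actual column for every size I checked, which hints that the mismatch is in the 2n/3-2 expectation rather than in your code; but without your real maxmin4, I can't say for sure.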