See this explanation on Brilliant.org. Basically, five is the smallest possible array we can use to maintain linear time. It is also easy to implement a linear sort with an n=5 sized array. Apologies for the laTex:
Why 5?
The median-of-medians divides a list into sublists of length five to
get an optimal running time. Remember, finding the median of small
lists by brute force (sorting) takes a small amount of time, so the
length of the sublists must be fairly small. However, adjusting the
sublist size to three, for example, does change the running time for
the worse.
If the algorithm divided the list into sublists of length three, pp
would be greater than approximately \frac{n}{3} 3 n elements and
it would be smaller than approximately \frac{n}{3} 3 n elements.
This would cause a worst case \frac{2n}{3} 3 2n recursions,
yielding the recurrence T(n) = T\big( \frac{n}{3}\big) +
T\big(\frac{2n}{3}\big) + O(n),T(n)=T( 3 n )+T( 3 2n )+O(n),
which by the master theorem is O(n \log n),O(nlogn), which is slower
than linear time.
In fact, for any recurrence of the form T(n) \leq T(an) + T(bn) +
cnT(n)≤T(an)+T(bn)+cn, if a + b < 1a+b<1, the recurrence will solve to
O(n)O(n), and if a+b > 1a+b>1, the recurrence is usually equal to
\Omega(n \log n)Ω(nlogn). [3]
The median-of-medians algorithm could use a sublist size greater than
5—for example, 7—and maintain a linear running time. However, we need
to keep the sublist size as small as we can so that sorting the
sublists can be done in what is effectively constant time.