Debugging: Shuffle deck of cards in Python/ random

Question 1

Generally speaking, you should never believe that a randomization approach works correctly unless you can rigorously prove it works correctly. And that's often hard.

To get some insight into your problem, let's generalize the problematic function:

import random

def deckmaker(n):
    half = n // 2
    deck = []
    for i in range(n):
        if deck.count(0) == half:
            deck.append(1)
        elif deck.count(1) == half:
            deck.append(0)
        elif random.random() > .5:
            deck.append(0)
        else:
            deck.append(1)
    return deck

And here's a little driver:

from collections import Counter
c = Counter()
for i in range(1000):
    c[tuple(deckmaker(2))] += 1
for t in sorted(c):
    print t, c[t]

Running that:

(0, 1) 495
(1, 0) 505

So the two possibilities are about equally likely. Good! Now try a deck of size 4; just change the relevant line like so:

c[tuple(deckmaker(4))] += 1

Running that:

(0, 0, 1, 1) 236
(0, 1, 0, 1) 127
(0, 1, 1, 0) 133
(1, 0, 0, 1) 135
(1, 0, 1, 0) 130
(1, 1, 0, 0) 239

Oops! You could run a formal chi-squared test if you like, but it's dead obvious by inspection that two permutations (the first and the last) are about twice as likely as the other four. So the output isn't even close to being arguably random.

Why is that? Think about it ;-)

Hint

For a deck of size 2*M, what's the chance that the first M entries are all 0? There are two answers to that:

If all permutations of M zeroes and M ones are equal likely, the chance is 1 in (2*M)-choose-M (the number of ways to pick the positions of the M zeroes).
In the way the function constructs a deck, the chance is 1 in 2**M (0 and 1 are equally likely in each of the first M positions).

Generally speaking, (2*M)-choose-M is very much larger than 2**M, so the function constructs a deck starting with all zeroes far more often than "it should". For a deck of 52 cards (M == 26):

>>> from math import factorial as f
>>> one = f(52) // f(26)**2
>>> two = 2**26
>>> float(one) / two
7389761.998476148

So "starts with 26 zeroes" is over 7 million times more likely than it should be. Cool :-)

Doing it "one at a time"

So is it possible to do this correctly picking a 0 or 1 one at a time? Yup! You just need to use the right probabilities: when there are nzero zeroes remaining to be picked, and nremaining total "cards" remaining to be picked, pick zero with probability nzero / nremaining:

def deckmaker(n=52):
    deck = [None] * n
    nremaining = float(n)
    nzero = nremaining / 2.0
    for i in range(n):
        if random.random() < nzero / nremaining:
            deck[i] = 0
            nzero -= 1.0
        else:
            deck[i] = 1
        nremaining -= 1.0
    return deck

Note that there's no need to count. When nzero becomes 0.0, the if test will never succeed (random() < 0.0 can't happen); and once we pick n/2 ones, nzero == nremaining will be true, and the if test will always succeed (random() < 1.0 is always true). It's cute ;-)

Question 2

I am not sure of that is what you mean, but I think the last cards generated by the first method are disproportionally likely to be the same.

The chance that you get 26 ones or 26 zeros out of 50 or fewer samples seems large (I think you could use a commutative binomial distribution to calculate it). Even a small chance would mean you should bet on the last cards being the same (and therefore the others being the opposite) colour.

What you could do is just add zeros and ones until there are 52 cards, and then at the end change random ones into zeros (or the other way) until there are 26 of each.

EDIT: Let's say the deck has 4 cards. The first two colours would be random:

25% (0, 0)
25% (0, 1)
25% (1, 0)
25% (1, 1)

But for (0, 0) and (1, 1), the last two cards are already determined and will be (1, 1) and (0, 0) respectively. The others need one more number to determine the outcome:

25.0% (0, 0, 1, 1)
12.5% (0, 1, 0) > (0, 1, 0, 1)
12.5% (0, 1, 1) > (0, 1, 1, 0)
12.5% (1, 0, 0) > (1, 0, 0, 1)
12.5% (1, 0, 1) > (1, 0, 1, 0)
25.0% (1, 1, 0, 0)

So as you can see, not all options are equally likely. The difference is smaller for 52 cards of course, but the idea remains the same.