R sample probabilities: Default is equal weight; why does specifying equal weights cause different values to be returned?

Question 1

As suggested by Randy, sample.int uses different routines depending on whether prob is NULL.

In your case, it returns inverse results:

> set.seed(1); sample(c(0,1), size=20, replace=TRUE)
 [1] 0 0 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1
> set.seed(1); sample(c(0,1), size=20, replace=TRUE, prob=c(.5,.5))
 [1] 1 1 0 0 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 0

What's going on?

For the former, we hit line src/main/random.c:546:

 for (int i = 0; i < k; i++) iy[i] = (int)(dn * unif_rand() + 1);

This one is simple. unif_rand() returns a value between 0 and 1 (and will never return 1), and dn is 2 (the number of elements in x). So iy[i] is set to 1 when unif_rand() returns a value < .5 and to 2 when it returns a value >= .5; this index is the value chosen from x.
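The mapping on that line can be mimicked in plain R. This is a minimal sketch, not the R source: the vector u and its values are made up here to stand in for successive unif_rand() results.

```r
# Emulate iy[i] = (int)(dn * unif_rand() + 1) for a two-element x.
x  <- c(0, 1)
dn <- length(x)                    # 2, the number of elements in x
u  <- c(0.10, 0.49, 0.50, 0.99)    # hypothetical stand-ins for unif_rand()
iy <- as.integer(dn * u + 1)       # as.integer() truncates, like the C cast
uniform_pick <- x[iy]
print(iy)
print(uniform_pick)
```

Values of u below .5 select x[1] and values at or above .5 select x[2].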

The latter is a bit more complex. Because prob is specified, do_sample calls the function ProbSampleReplace at src/main/random.c:309. Here, the probabilities are sorted into descending order with the function revsort at src/main/sort.c:248. This uses a heap sort on the probabilities, and with a two-element vector of equal probabilities, it reverses the order.

ProbSampleReplace again calls unif_rand() but this time it maps it to the cumulative probabilities computed after flipping the order of the vector, so if unif_rand() returns a value < 0.5 the second value is returned (1 in your example). This is the code that does the mapping of unif_rand() to the values in x:

/* compute the sample */
for (i = 0; i < nans; i++) {
    rU = unif_rand();
    for (j = 0; j < nm1; j++) {
        if (rU <= p[j])
            break;
    }
    ans[i] = perm[j];
}

So with equal probabilities of two elements, setting the probability explicitly to c(0.5, 0.5) will return the inverse of the same call without setting the probabilities. With more than two elements, it's not going to always reverse them, but it won't return the same order.
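To see the inversion without relying on a particular R version, the lookup loop can be re-enacted in R. This is only a sketch under stated assumptions: perm = c(2, 1) is taken from the explanation above (the two equal probabilities swapped by revsort), and weighted_index and u are names and values invented here for illustration.

```r
# Emulate the lookup loop of ProbSampleReplace for prob = c(0.5, 0.5).
# Assumption: after revsort the permutation is c(2, 1), i.e. swapped.
x    <- c(0, 1)
perm <- c(2L, 1L)
p    <- cumsum(c(0.5, 0.5))        # cumulative probabilities: 0.5, 1.0
nm1  <- length(p) - 1L

weighted_index <- function(rU) {
  j <- 1L                          # 1-based counterpart of the C index j
  # advance until rU falls at or below a cumulative bound
  while (j <= nm1 && rU > p[j]) j <- j + 1L
  perm[j]
}

u <- c(0.10, 0.49, 0.51, 0.99)     # hypothetical stand-ins for unif_rand()
weighted_pick <- x[vapply(u, weighted_index, integer(1))]
uniform_pick  <- x[as.integer(length(x) * u + 1)]   # the prob = NULL mapping
print(rbind(u, uniform_pick, weighted_pick))
```

For each value of u the two rows pick opposite elements of x, which is the flip seen in the two outputs at the top.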

This also explains why Fernando's suggestion works. The values are close enough to .5 not to change the results for this example, yet they are no longer exactly equal, so the heap sort returns them in the original order.

This expression returns the same matrix as your first line of code:

do.call(cbind, lapply(c(1:10),function(X) {set.seed(X); sapply(vn, function(Y) sum(sample(x=c(1,0),size=Y,replace=T, prob=c(0.5,0.5))), simplify=TRUE)}))

Here, the order of the entries in x has been reversed to account for the two-element sort of equal values (which swaps the entries).

Of course this is all academic. In practice, permuting the order of equiprobable entries doesn't matter.

Source files and line numbers above refer to R 3.0.2.
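Readers on newer versions of R may not be able to reproduce the first output shown at the top: as of R 3.6.0 the default sampler for the unweighted case is a rejection-based one rather than the rounding approach described here. The following is a minimal sketch, assuming an R version whose RNGkind() accepts the sample.kind argument (3.6.0 or later), of requesting the older sampler before rerunning the comparison. No output is shown because it was not run for this answer.

```r
# Request the pre-3.6.0 "Rounding" sampler. R warns that this sampler is
# non-uniform, hence suppressWarnings(). Assumes R >= 3.6.0.
old <- suppressWarnings(RNGkind(sample.kind = "Rounding"))

set.seed(1); unweighted <- sample(c(0, 1), size = 20, replace = TRUE)
set.seed(1); weighted   <- sample(c(0, 1), size = 20, replace = TRUE, prob = c(.5, .5))
print(rbind(unweighted, weighted))

# Restore whichever sampler was active before.
suppressWarnings(RNGkind(sample.kind = old[3]))
```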

Question 2

I updated my comment to an answer. sample uses different C routines for uniform sampling and weighted sampling. Even though you are using equal weights, R will call the weighted sampling routine anyway. To see this, consider

> set.seed(1)
> sample.int(100)
  [1]  27  37  57  89  20  86  97  62  58   6  19  16  61  34  67  43  88  83
 [19]  32  63  75  17  51  10  21  29   1  28  81  25  87  42  70  13  55  44
 [37]  78   7  45  26  50  39  46  82  30  65   2  84  59  36  24  85  22  12
 [55]   4   5  14  23  73  79  99  47  18  95  60  77  41  53   3  69  11  71
 [73]  35  31  40  49  76   9  38  64  80  66   8  91  33  92 100  54  98  94
 [91]  52  74  68  72  93  15  56  48  90  96
> set.seed(1)
> sample.int(100, prob = rep(1/100, 100))
  [1]  28  39  60  93  21  91  96  67  63   7  22  18  71  41  79  51  74   1
 [19]  38  78  94  20  64  12  29  40   2  42  87  35  50  61  52  17  84  69
 [37]  81  10  73  44  85  65  80  54  49  82   4  46  75  68  43  90  36  23
 [55]   8  11  30  55  66  34  97  26  47  31  70  24  53  86   6  95  32  89
 [73]  27  33  56  98  88  25  77 100  37  62  19  15  76  13  59   5  14   9
 [91]  45   3  83  99  72  58  48  57  92  16

Note that the two sampled sequences are different.