The speed problem is limited to weighted sampling without replacement. Here's your code again, with the parts unrelated to sample moved outside the loop.
normalized_weights <- w / sum(w)

# No weights
system.time(
  for (r in 1:10) {
    ix <- sample(2e6, size = 2000)
  })

# Weighted, no replacement
system.time(
  for (r in 1:10) {
    ix <- sample(2e6, size = 2000, prob = normalized_weights)
  })

# Weighted, with replacement
system.time(
  for (r in 1:10) {
    ix <- sample(2e6, size = 2000, replace = TRUE, prob = normalized_weights)
  })
The big problem is that when you do weighted sampling without replacement, the weights have to be recalculated after each value is picked. See ?sample:

    If 'replace' is false, these probabilities are applied sequentially, that is the probability of choosing the next item is proportional to the weights amongst the remaining items.
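To see why that's expensive, here is a naive sketch of what the sequential process implies (illustrative only — this is not how sample() is implemented internally, and the function name is made up):

```r
# Sequential weighted sampling without replacement, written out naively:
# after every draw, the weights of the remaining items must be
# renormalized before the next draw.
slow_weighted_sample <- function(n, size, w) {
  available <- seq_len(n)
  out <- integer(size)
  for (i in seq_len(size)) {
    # renormalize over the items that are still available -- this
    # per-draw pass over the remaining weights is the costly part
    p <- w[available] / sum(w[available])
    pick <- available[sample.int(length(available), 1, prob = p)]
    out[i] <- pick
    available <- available[available != pick]
  }
  out
}

set.seed(1)
ix <- slow_weighted_sample(1e4, 100, runif(1e4))
```

Each draw touches all remaining weights, so drawing `size` items from `n` costs on the order of `n * size` work, versus a single pass for the unweighted or with-replacement cases.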
There may be faster solutions than sample (I don't know how well it's been optimized), but weighted sampling without replacement is a fundamentally more computationally intensive task than unweighted or weighted-with-replacement sampling.
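One commonly used faster approach — offered as a sketch, not something I've benchmarked against your data — is the exponential-keys trick: draw one exponential variate per item, divide it by that item's weight, and keep the indices of the `size` smallest keys. This is equivalent in distribution to the sequential scheme ?sample describes, but runs in one vectorized pass plus a sort:

```r
# Exponential-keys weighted sampling without replacement.
# Smaller key = more likely to be kept; items with larger weights
# tend to get smaller keys. The function name is made up.
fast_weighted_sample <- function(n, size, w) {
  keys <- rexp(n) / w               # one Exp(1) variate per item, scaled by weight
  order(keys)[seq_len(size)]        # indices of the `size` smallest keys
}

set.seed(1)
ix <- fast_weighted_sample(2e6, 2000, runif(2e6))
```

The cost is dominated by generating n variates and sorting them, so it scales with n log n regardless of `size`, instead of with n * size.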