Question

I have an std::map mymap that I am trying to sample based on the values for each key. I have set up an algorithm based on rejection sampling that seems to be working; however, it is extremely slow (this algorithm gets called thousands of times in my program).

So I am wondering if this would be the best approach or if there is something quicker/more efficient that I could be doing instead.

Here is what I have so far below:

std::map<int, float> mymap; //My map that I am sampling

//These three values are precomputed
int minKey;  //Min key in the map.  
int maxKey;  //Max key in the map.  
float maxValue; //Max value in the map.  

float x1, x2; //Two random variables;
int key;
float value;
do 
{
    x1 = (float)rand()/(float)RAND_MAX;
    x2 = maxValue * (float)rand()/(float)RAND_MAX;
    key = minKey*(1.0-x1) + maxKey*x1; //Linearly interpolate random value to get key;
    value = mymap[key]; //Get value;
} while(x2 > value);

return std::pair<int, float>(key, value);

So what I am doing above is uniformly randomly selecting a key, then creating another random variable and comparing it against that key's value. If the random variable is larger, I repeat the process. This way, keys with higher values get sampled more often than keys with lower values. However, the do-while loop can iterate many times before finding an acceptable key-value pair to sample, and this is causing quite a bottleneck in my application.

EDIT

Also, is it necessary for me to adjust my samples since they are biased here? I know that in Monte Carlo integration you have to divide the value of the sample by the PDF of that sample, but I'm not sure whether that applies here. If it does, how would I find the PDF?

Solution

Rejection sampling is primarily useful for continuous distributions. What you need is to sample a discrete distribution. Fortunately, this is part of the standard library since C++11. So, adapted from the example for std::discrete_distribution:

#include <iostream>
#include <map>
#include <random>
#include <vector>

template <typename T>
class sampler
{
    std::vector<T> keys;
    std::discrete_distribution<T> distr;

public:
    sampler(const std::vector<T>& keys, const std::vector<float>& prob) :
        keys(keys), distr(prob.begin(), prob.end()) { }

    T operator()()
    {
        static std::random_device rd;
        static std::mt19937 gen(rd());
        return keys[distr(gen)];
    }
};

int main()
{
    using T = int;
    sampler<T> samp({19, 54, 192, 732}, {.1, .2, .4, .3});
    std::map<T, size_t> hist;

    for (size_t n = 0; n < 10000; ++n)
        ++hist[samp()];

    for (auto i: hist)
    {
        std::cout << i.first << " generated "
                  << i.second << " times" << std::endl;
    }
}

Output:

19 generated 1010 times
54 generated 2028 times
192 generated 3957 times
732 generated 3005 times

The vectors keys and prob hold the keys and the values (probabilities) of your map separately, because std::discrete_distribution works only with the probabilities.
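
If your data already lives in a std::map<int, float> like the one in the question, one way to build those two vectors is a small helper along these lines (the name make_sampler is just illustrative, and it uses the sampler class above):

#include <map>
#include <vector>

// Illustrative helper: split the map into parallel key/weight vectors and
// build the sampler shown above. std::discrete_distribution normalises the
// weights itself, so the map values do not have to sum to 1.
sampler<int> make_sampler(const std::map<int, float>& mymap)
{
    std::vector<int> keys;
    std::vector<float> prob;
    keys.reserve(mymap.size());
    prob.reserve(mymap.size());
    for (const auto& kv : mymap)
    {
        keys.push_back(kv.first);
        prob.push_back(kv.second);
    }
    return sampler<int>(keys, prob);
}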

Note that operator() cannot be const because std::discrete_distribution changes state (naturally) at every sample.

Also note that even if you implement sampling yourself using the cumulative distribution and binary search (so that each sample is logarithmic-time in the size of your domain), there are more efficient (constant-time) sampling methods such as the alias method. I am not sure which method std::discrete_distribution uses, however.
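
For reference, here is a minimal sketch of the alias method (Vose's variant), assuming a non-empty list of non-negative weights with a positive sum; after O(n) setup every draw costs O(1). The class name alias_sampler is just illustrative:

#include <cstddef>
#include <random>
#include <vector>

// Sketch of the alias method (Vose's variant). Setup is O(n); every draw
// afterwards costs O(1): pick a slot uniformly, then flip a biased coin to
// keep that slot or jump to its alias.
class alias_sampler
{
    std::vector<double> prob;        // chance of keeping slot i
    std::vector<std::size_t> alias;  // fallback slot if the coin flip fails
    std::mt19937 gen{std::random_device{}()};

public:
    explicit alias_sampler(std::vector<double> w) : prob(w.size()), alias(w.size())
    {
        const std::size_t n = w.size();
        double total = 0.0;
        for (double x : w) total += x;
        for (double& x : w) x *= n / total;   // rescale so the average weight is 1

        std::vector<std::size_t> small, large;
        for (std::size_t i = 0; i < n; ++i)
            (w[i] < 1.0 ? small : large).push_back(i);

        while (!small.empty() && !large.empty())
        {
            std::size_t s = small.back(); small.pop_back();
            std::size_t l = large.back(); large.pop_back();
            prob[s] = w[s];
            alias[s] = l;
            w[l] += w[s] - 1.0;               // the large slot donates mass to fill slot s
            (w[l] < 1.0 ? small : large).push_back(l);
        }
        for (std::size_t i : small) prob[i] = 1.0;   // round-off leftovers keep their slot
        for (std::size_t i : large) prob[i] = 1.0;
    }

    std::size_t operator()()   // returns an index into the original weight vector
    {
        std::uniform_int_distribution<std::size_t> pick(0, prob.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        std::size_t i = pick(gen);
        return coin(gen) < prob[i] ? i : alias[i];
    }
};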

OTHER TIPS

If you want to bias your sample linearly in proportion to the values, it's easy to do.

Start by calculating the sum of all the values.

Now generate a single random floating-point value between 0 and the sum.

Iterate through the map, summing the values as you go. When the sum is greater than the random value calculated earlier, you've found your sample.
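
Here is a minimal sketch of that linear scan, assuming a non-empty std::map<int, float> like the one in the question and the same rand()-based random numbers (the name sample_linear is illustrative):

#include <cstdlib>
#include <map>
#include <utility>

// Walk the map once, accumulating the values until the running sum passes a
// uniform random threshold in [0, total). Each sample costs O(n).
std::pair<int, float> sample_linear(const std::map<int, float>& mymap)
{
    float total = 0.0f;
    for (const auto& kv : mymap)
        total += kv.second;

    float threshold = total * (float)rand() / (float)RAND_MAX;
    float running = 0.0f;
    for (const auto& kv : mymap)
    {
        running += kv.second;
        if (running > threshold)
            return {kv.first, kv.second};
    }
    return {mymap.rbegin()->first, mymap.rbegin()->second}; // round-off fallback
}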

If you'll be doing this repeatedly on an unchanging map, you can create a vector of sums and do a binary search for the random value.
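
A sketch of that precomputed variant, using std::partial_sum for the vector of sums and std::upper_bound for the binary search (the name cdf_sampler is illustrative, and it again assumes a non-empty map):

#include <algorithm>
#include <cstdlib>
#include <map>
#include <numeric>
#include <vector>

// Precompute the running sums once, then answer each sample with a single
// binary search: O(log n) per sample.
struct cdf_sampler
{
    std::vector<int> keys;
    std::vector<float> cumulative;   // cumulative[i] = sum of the first i+1 values

    explicit cdf_sampler(const std::map<int, float>& mymap)
    {
        std::vector<float> values;
        for (const auto& kv : mymap)
        {
            keys.push_back(kv.first);
            values.push_back(kv.second);
        }
        cumulative.resize(values.size());
        std::partial_sum(values.begin(), values.end(), cumulative.begin());
    }

    int operator()() const
    {
        float threshold = cumulative.back() * (float)rand() / (float)RAND_MAX;
        auto it = std::upper_bound(cumulative.begin(), cumulative.end(), threshold);
        if (it == cumulative.end())
            --it;                    // guard against round-off at the top end
        return keys[it - cumulative.begin()];
    }
};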

One possibility is to use a second map (or set) containing the not-known-bad keys: put all the keys there initially, and when a key is rejected against the initial random variable, delete it from that set. Then search for candidate keys in the not-known-bad set rather than in the entire map.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow