Question

Does anyone know of an algorithm or data structure relating to selecting items, with a probability of them being selected proportional to some attached value? In other words: http://en.wikipedia.org/wiki/Sampling_%28statistics%29#Probability_proportional_to_size_sampling

The context here is a decentralized reputation system and the attached value is therefore the value of trust one user has in another. In this system all nodes either start as friends which are completely trusted or unknowns which are completely untrusted. This isn't useful by itself in a large P2P network because there will be many more nodes than you have friends and you need to know who to trust in the large group of users that aren't your direct friends, so I've implemented a dynamic trust system in which unknowns can gain trust via friend-of-a-friend relationships.

Every so often each user will select a fixed number (for the sake of speed and bandwidth) of target nodes to recalculate their trust based on how much another selected fixed number of intermediate nodes trust them. The probability of selecting a target node for recalculation will be inversely proportional to its current trust so that unknowns have a good chance of becoming better known. The intermediate nodes will be selected in the same way, except that the probability of selection of an intermediary is proportional to its current trust.

I've written up a simple solution myself but it is rather slow and I'd like to find a C++ library to handle this aspect for me. I have of course done my own search and I managed to find TRSL which I'm digging through right now. Since it seems like a fairly simple and perhaps common problem, I would expect there to be many more C++ libraries I could use for this, so I'm asking this question in the hope that someone here can shed some light on this.

Was it helpful?

Solution

This is what I'd do:

int select(double *weights, int n) {
    // This step only necessary if weights can be arbitrary
    // (we know total = 1.0 for probabilities)
    double total = 0;
    for (int i = 0; i < n; ++i) {
        total += weights[i];
    }

    // Cast RAND_MAX to avoid overflow
    double r = (double) rand() * total / ((double) RAND_MAX + 1);
    total = 0;
    for (int i = 0; i < n; ++i) {
        // Guaranteed to fire before loop exit
        if (total <= r && total + weights[i] > r) {
            return i;
        }

        total += weights[i];
    }
}

You can of course repeat the second loop as many times as you want, choosing a new r each time, to generate multiple samples.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top