Pregunta

I have a list of weighted objects i.e.:

A->1 B->1 C->3 D->2 E->3

is there an efficient algorithm in C++ to pick random elements according to their weight?

For example The possibility that element A or B with a lower weighting is picked is higher (30%) than the possibility that the algorithm selects elements C E (10%) or D (20%)

¿Fue útil?

Solución

As @Dukeling said, we need more info. Like how you interpret and use the selection chance.

At least in the field of evolutionary algorithm, fitness scaling (or selection chance scaling) is a sizable topic.

Suppose you start with badness score

B[i] = how badly you don't want to select the i-th item

And the objective is to calculate fitness/selection score S[i] which I assume you are to use it in roulette wheel fashion.

As you say, one obvious way is to use multiplicative inverse:

S[i] = 1 / B[i]

However, there might be a little problem with that. The the same amount of change in B[i] with low value has so much more impact than the same amount of change when B[i] already has high value.

Ask yourself this:

Say
B[1] = 1     ->     S[1] = 1
B[2] = 2     ->     S[2] = 0.5
So item 1 is twice times as likely to be selected compared to item 2

But with the same amount of change
B[3] = 1000  ->     S[3] = 0.001
B[4] = 1001  ->     S[4] = 0.000999001
Item 3 is only 1.001 times as likely to be selected compared to item 4

I'll just throw one possible alternative scheme here for now.

S[i] = max(B) - B[i] + 1

The + 1 part helps so no item has zero chance to be selected.

This ends the part of calculating selection score.


Next, let's clear up how to use the selection score in roulette wheel fashion. Assume we decided to use the additive inverse scheme.

B[1] = 1     ->     S[1] = 1001
B[2] = 2     ->     S[2] = 1000
B[3] = 1000  ->     S[3] = 2
B[4] = 1001  ->     S[4] = 1

Then imagine each point in the score is correspond to a lottery ticket. Let's assign the ticket a running IDs.

| Item | Score = #ticket |   ticket ID  |         win chance       |
|   1  |      1001       | 0    to 1000 |  1001/2004 ~ 0.499500998 |
|   2  |      1000       | 1001 to 2000 |  1000/2004 ~ 0.499001996 |
|   3  |         2       | 2001 to 2002 |     2/2004 ~ 0.000998004 |
|   4  |         1       | 2003 to 2003 |     1/2004 ~ 0.000499002 |

There are 2004 tickets in total.

To do a selection, pick the winning ticket ID at random i.e. the random range is [0,2004). Binary search can be used to quickly look up which item owns the winning ticket as you have already seen in this question. What needs to be looked up with binary search are the boundary values of ticket ID which are 1001,2001,2003 rather than the score themselves.


For comparison, here is the selection chance in case the multiplicative inverse scheme is used.

| Item |                    win chance         |
|   1  |           1/1.501999001 ~ 0.665779404 |
|   2  |         0.5/1.501999001 ~ 0.332889702 |
|   3  |       0.001/1.501999001 ~ 0.000665779 |
|   4  | 0.000999001/1.501999001 ~ 0.000665114 |

You can notice that in the additive inverse scheme, 1 unit of badness consistently corresponds to around a difference of 0.0005 in selection chance.

Whereas in multiplicative inverse scheme, 1 unit of badness results in varying difference of selection chance.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top