Question

I want to produce a value from a set, where each value has an associated weight.

E.g.:

[(1, 4), (2, 3), (3, 3)]

should give me a 40% chance of picking 1, and a 30% chance each of picking 2 or 3.

Using a cumulative-sum table and binary searching on it yields the value I want in O(log N), where N is the number of values in the set.
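For concreteness, here is a minimal Python sketch of that cumulative-sum approach (the function name weighted_choice is my own, not from any particular library):

```python
import bisect
import random
from itertools import accumulate

def weighted_choice(pairs):
    """pairs: list of (value, weight), e.g. [(1, 4), (2, 3), (3, 3)]."""
    values = [v for v, _ in pairs]
    cumulative = list(accumulate(w for _, w in pairs))  # e.g. [4, 7, 10]
    k = random.random() * cumulative[-1]                # uniform in [0, total)
    # Binary search for the first prefix sum strictly greater than k.
    return values[bisect.bisect_right(cumulative, k)]
```

Building the prefix sums is O(N) once; each subsequent call is O(log N).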

I was thinking of another approach to this using a Huffman tree (as described here), with each internal node storing the combined weight of its children.

For the above case, the tree would look like

    (10)
   /    \
(1, 4)  (6)
       /   \
   (2, 3) (3, 3)
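A short sketch of how such a tree could be built, assuming the usual greedy Huffman construction that repeatedly merges the two lightest nodes (the Node class and build_tree are names I'm introducing for illustration):

```python
import heapq
import itertools

class Node:
    def __init__(self, weight, value=None, left=None, right=None):
        self.weight = weight    # combined weight of the whole subtree
        self.value = value      # set only on leaves
        self.left = left
        self.right = right

def build_tree(pairs):
    """pairs: list of (value, weight); returns the root Node."""
    tie = itertools.count()     # tie-breaker so heapq never compares Nodes
    heap = [(w, next(tie), Node(w, value=v)) for v, w in pairs]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)   # pop the two lightest subtrees
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), Node(w1 + w2, left=a, right=b)))
    return heap[0][2]
```

For [(1, 4), (2, 3), (3, 3)] this yields exactly the tree above: the two weight-3 leaves merge into a node of weight 6, which then merges with the weight-4 leaf under a root of weight 10.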

The algorithm for generate() would be something like (a code sketch follows the list):

  • Generate a random number k in the range [0, {value of root node})

  • Start at root node

  • If at a leaf node, return value

  • If k >= {value of left child}, subtract {value of left child} from k and go right; otherwise go left

  • Repeat from third step.
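Here is a minimal sketch of that walk in Python, reusing the Node and build_tree sketch above; note the subtraction when going right, which keeps k uniform over the weight range of whichever subtree we descend into:

```python
import random

def generate(root):
    k = random.random() * root.weight   # k uniform in [0, total weight)
    node = root
    while node.left is not None:        # internal nodes always have two children
        if k >= node.left.weight:
            k -= node.left.weight       # re-centre k before entering the right subtree
            node = node.right
        else:
            node = node.left
    return node.value                   # reached a leaf

# Usage: root = build_tree([(1, 4), (2, 3), (3, 3)]); generate(root)
```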

Now, when the weights are roughly equal, the tree is balanced and a walk takes O(log N), similar to the binary-search method.

However, if the weights grow exponentially and the tree essentially degenerates into a list, the expected number of steps becomes

$$ \sum\limits_{k=1}^{N} \frac{k}{2^k} $$

which tends to 2 as N grows.
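One standard way to see that limit is to rewrite each term as a geometric tail:

$$ \sum\limits_{k=1}^{\infty} \frac{k}{2^k} \;=\; \sum\limits_{k=1}^{\infty} \sum\limits_{j=k}^{\infty} \frac{1}{2^j} \;=\; \sum\limits_{k=1}^{\infty} \frac{1}{2^{k-1}} \;=\; 2 $$

so for this degenerate shape the expected number of steps is O(1).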

I'm having some difficulty coming up with an expression for the average case of generate(), but it definitely has an upper bound of O(log N) and a lower bound of O(1). Of course, the worst case is O(N).

What's a good approach to figuring this out, and what are the possible pitfalls of this kind of approach?

No correct solution

Licensed under: CC-BY-SA with attribution