Question

I am reading about Prioritized Experience Replay, and can't understand the following:

On page 4, every transition can be sampled from the table with its own probability. If I understood correctly, the probability of sampling transition $i$ is:

$$P(i) = \frac{p_i^\alpha}{\sum_k{p_k^\alpha}} $$

where:

$$p_i = \frac{1}{\text{rank}(i)}$$

i.e. the reciprocal of the transition's index in the table when it is sorted by priority.
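As a concrete check (my own small example, with $N = 4$ and $\alpha = 1$):

$$p = \left(1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}\right), \qquad \sum_k p_k = \tfrac{25}{12}, \qquad P(1) = \tfrac{12}{25} = 0.48$$

so the highest-ranked transition is sampled almost half the time.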

Afterwards, the paper says:

For the rank-based variant, we can approximate the cumulative density function with a piecewise linear function with k segments of equal probability. The segment boundaries can be precomputed (they change only when N or α change). At runtime, we sample a segment, and then sample uniformly among the transitions within it.

My question is: why do we have to approximate the density at all, when the same effect can be achieved as follows:

  1. roll a die between 1 and N (with a die that is exponentially more likely to roll a '1' than a '2', and so on);
  2. select the item at the index given by the die.

In C++ we have std::exponential_distribution [source], so there is no need to approximate anything, as long as we keep our table sorted in descending order of priority.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange