Deterministically assign an ID to one of a list of weighted buckets

https://stackoverflow.com/questions/4514659

12-10-2019
Question

I am running n split tests on a website. I want to assign a uniformly distributed integer ID to one of n buckets, deterministically, so that the same user always gets the same test.

At this point, I can just pick an index into the list of split tests by modding the user ID by n. What if I want to weight certain tests?

For example, bucket #1 of 21 is assigned 90% of the time and each of the remaining 20 tests is assigned 0.5% of the time.

I feel like I could somehow scale up the size of my list and keep using the mod technique to accomplish this, but having potentially huge, temporary lists in memory seems inelegant.

Solution

If most buckets have distinct sizes, where size is defined as percentage of ids, then you'll have to represent this in memory somehow. Otherwise, how else are you going to know these percentages?

One solution is to use, say, 100 virtual buckets, each representing 1% of the ids, and associate 90 of them with bucket #1 of 21. Then you can perform a mod 100, and if the result falls in the first 90 virtual buckets, assign the id to bucket #1. You can get the optimal number of virtual buckets by dividing each bucket's percentage by the GCD of all percentages, which in your example is 0.5 (GCD(90, 0.5)). That gives 180 virtual buckets for bucket #1 and one for each of the other 20, so 200 in total and a mod 200 instead of a mod 100.
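As a rough sketch of the virtual-bucket idea without materializing a list of 200 entries, you could keep only the running totals of the scaled weights and binary-search them. The function names `build_cumulative` and `assign_weighted` are made up for illustration, and the weights are assumed to be given as integers in tenths of a percent (900 for 90%, 5 for 0.5%) so that the GCD is an integer GCD. It also assumes the ids are non-negative and uniformly distributed, as stated in the question.

```python
from bisect import bisect_right
from functools import reduce
from itertools import accumulate
from math import gcd


def build_cumulative(weights):
    """Scale integer weights down by their GCD and return running totals.

    The last entry is the total number of virtual buckets.
    """
    g = reduce(gcd, weights)
    return list(accumulate(w // g for w in weights))


def assign_weighted(user_id, cumulative):
    """Map a non-negative integer id to a 0-based bucket index."""
    slot = user_id % cumulative[-1]          # which virtual bucket
    return bisect_right(cumulative, slot)    # first real bucket whose total exceeds slot


# Assumed input: 90% and twenty times 0.5%, in tenths of a percent.
weights = [900] + [5] * 20
cumulative = build_cumulative(weights)       # 21 entries, ending at 200
bucket = assign_weighted(12345, cumulative)
```

Memory use is one integer per real bucket, and the lookup is a binary search, so the number of virtual buckets does not matter.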

In your example, though, only one bucket has a distinct size; the other 20 are all equal. The best solution really depends on what types of arrangements you could have.
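For that particular arrangement (one heavy bucket, the rest equal), plain arithmetic may be enough and no table is needed at all. A minimal sketch, assuming 200 slots of 0.5% each, with the hypothetical name `assign_one_heavy` and 0-based bucket indices:

```python
def assign_one_heavy(user_id, heavy_slots=180, light_buckets=20):
    """Bucket 0 gets heavy_slots slots; buckets 1..light_buckets get one slot each."""
    slot = user_id % (heavy_slots + light_buckets)
    if slot < heavy_slots:
        return 0                      # the 90% bucket
    return 1 + (slot - heavy_slots)   # one of the 0.5% buckets
```

This only works because all the light buckets share the same size; with several distinct sizes you would need some per-bucket data again.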

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow