Question

I'm running n split tests on a website. I want to assign each user, identified by an evenly distributed integer user id, to one of the n buckets deterministically, so that the same user always gets the same test.

At this point, I can just pick an index in the list of split tests by modding the user id by n. What if I want to weight certain tests?

For example, bucket #1 of 21 is assigned 90% of the time and each of the remaining 20 tests is assigned 0.5% of the time.

I feel like I can somehow scale up the size of my list and still use the mod technique to accomplish this, but having potentially huge, temporary lists in memory seems inelegant.


Solution

If most buckets have distinct sizes, where size is defined as percentage of ids, then you'll have to represent this in memory somehow. Otherwise, how else are you going to know these percentages?

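One solution is to use virtual buckets. Say you have 100 virtual buckets, each representing 1% of the ids, and you associate 90 of them with bucket #1 of 21. Then you can perform a mod 100, and if the result falls in the first 90 virtual buckets, assign the id to bucket #1. In your example the other buckets are 0.5% each, so 1% granularity is too coarse. You get the optimal number of virtual buckets by dividing each bucket's percentage by the GCD of all percentages, which in your example is 0.5 (GCD(90, 0.5)). That gives 180 virtual buckets for bucket #1 and 1 for each of the other 20, for 200 in total, so you would perform a mod 200 instead.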
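The virtual-bucket approach can be sketched in Python without materializing a large list, by keeping only the cumulative counts and searching them. This is a minimal sketch, not a definitive implementation: the function names `virtual_bucket_counts` and `assign_bucket` are hypothetical, percentages are assumed to be given as decimal numbers that sum to 100, and user ids are assumed to be evenly distributed non-negative integers, as stated in the question.

```python
from bisect import bisect_right
from fractions import Fraction
from functools import reduce
from itertools import accumulate
from math import gcd


def virtual_bucket_counts(percentages):
    """Return the number of virtual buckets each real bucket needs.

    Each percentage is divided by the GCD of all percentages.
    Fractions are used so that values like 0.5 are handled exactly.
    """
    fracs = [Fraction(str(p)) for p in percentages]
    # Scale everything to integers over a common denominator.
    common_den = reduce(lambda a, b: a * b // gcd(a, b),
                        (f.denominator for f in fracs))
    numerators = [int(f * common_den) for f in fracs]
    # Divide out the GCD to get the smallest integer counts.
    g = reduce(gcd, numerators)
    return [n // g for n in numerators]


def assign_bucket(user_id, counts):
    """Map a user id to a 0-based real bucket index, deterministically."""
    # Cumulative upper bounds of each bucket's range of virtual buckets.
    boundaries = list(accumulate(counts))
    # boundaries[-1] is the total number of virtual buckets.
    slot = user_id % boundaries[-1]
    # The first boundary strictly greater than slot identifies the bucket.
    return bisect_right(boundaries, slot)


# Percentages from the question: one bucket at 90%, twenty at 0.5%.
percentages = [90] + [0.5] * 20
counts = virtual_bucket_counts(percentages)
bucket = assign_bucket(12345, counts)
```

For the percentages in the question, `counts` is 180 followed by twenty 1s, so the mod is taken over 200 virtual buckets. Only the 21 cumulative boundaries are stored, and in practice they would be computed once and reused rather than rebuilt on every call.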

In your example, though, only one bucket has a size different from the rest. The best solution really depends on what types of arrangements you could have.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow