Question

What significance test should you use for a percentage metric with more than two experiments?

For example,

Version | Clicks | Impressions
A       | 5      | 1,763
B       | 4      | 1,672
C       | 2      | 1,689

How sure are we that version A really is superior to the other two?

Solution

In the past I have run a pairwise G-test between the top and bottom versions, then multiplied the resulting p-value by a fudge factor of n choose 2 (3 for the three versions above), to account for the fact that any of the n choose 2 possible pairs could have turned out to be the most extreme. In theory this is overly conservative, but it has worked for me.
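As a rough illustration, the procedure described above might be sketched like this on the numbers from the question. It reads "multiplying the confidence" as multiplying the p-value, which is an assumption, and the function and variable names (g_test_2x2, data, adjusted_p) are made up for this example. It uses only the standard library: for a 2x2 table the G statistic has one degree of freedom, so the chi-square tail probability can be computed with math.erfc.

```python
import math

def g_test_2x2(clicks_a, imps_a, clicks_b, imps_b):
    """G-test (log-likelihood ratio test) on a 2x2 clicks / non-clicks table.

    Returns (G statistic, p-value). With 1 degree of freedom the chi-square
    survival function is erfc(sqrt(G / 2)).
    """
    observed = [
        [clicks_a, imps_a - clicks_a],
        [clicks_b, imps_b - clicks_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    return g, math.erfc(math.sqrt(g / 2.0))

# (clicks, impressions) per version, taken from the question.
data = {"A": (5, 1763), "B": (4, 1672), "C": (2, 1689)}

# Pick the versions with the highest and lowest click-through rate.
best = max(data, key=lambda v: data[v][0] / data[v][1])
worst = min(data, key=lambda v: data[v][0] / data[v][1])

g_stat, raw_p = g_test_2x2(*data[best], *data[worst])

# Fudge factor: number of pairs that could have been the most extreme.
n_pairs = math.comb(len(data), 2)
adjusted_p = min(1.0, raw_p * n_pairs)

print(f"{best} vs {worst}: G = {g_stat:.3f}, raw p = {raw_p:.3f}, "
      f"adjusted p = {adjusted_p:.3f}")
```

With click counts this small, the adjusted p-value comes out far above the usual 0.05 threshold, so this data gives no real evidence that A is better than the others.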

See http://elem.com/~btilly/effective-ab-testing/ for more.

Licensed under: CC-BY-SA with attribution