Question

I am making a Python/Django split testing (A/B testing) library for my own use. However, I don't know how to calculate whether my test is statistically significant. I have no knowledge of statistics, so I cannot understand most Wikipedia articles or web pages on this topic, and I'm looking for help from my fellow programmers.

I have a simple experiment with 4 options, and I have some data from users using it. Each user is in one of the 4 options (for the sake of argument, we can call the first one the control). Each user was either a success or a failure, so I know the conversion rate of each option. How do I figure out if my test is statistically significant, or if it's all just random?

Essentially, my input will be [ (a_yes, a_total), (b_yes, b_total), (c_yes, c_total), (d_yes, d_total)]. It's easy to figure out the conversion rate for each option, and even how well each one does compared to a. But how do I figure out the statistical significance, so I know whether this test is telling me anything or whether I should let it keep running and collect more data?

I've seen some guides on using some formulæ when there are only 2 options (the traditional A/B test), but I want this library to handle multiple options. Is it even possible to calculate statistical significance with multiple values?

Essentially, I'm trying to do something similar to this: http://mixpanel.com/labs/split-test-calculator but in Python. Bonus points if there's some Python library I can just "pip install …"


Solution

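I believe what you need is a chi-squared test of independence. For each treatment, you have a yes count and a no count (total - yes), which together form a table with one row per treatment and two columns. The test asks whether the yes/no outcome depends on the treatment, so it works for 4 options just as well as for 2. The method is described here, among other places.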
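
To make the idea concrete, here is a minimal sketch of the chi-squared statistic computed by hand from the (yes, total) pairs in the question. The function name chi_squared_statistic is made up for illustration. Under the assumption that all options convert at the same overall rate, it compares the observed yes/no counts with the counts you would expect.

```python
def chi_squared_statistic(results):
    """Chi-squared statistic for a list of (yes, total) pairs, one per option.

    Hypothetical helper for illustration; assumes every option has at least
    one user and that the overall conversion rate is neither 0 nor 1.
    """
    total_yes = sum(yes for yes, _ in results)
    total_users = sum(total for _, total in results)
    overall_rate = total_yes / total_users

    statistic = 0.0
    for yes, total in results:
        no = total - yes
        # Counts we would expect if this option converted at the overall rate.
        expected_yes = total * overall_rate
        expected_no = total * (1 - overall_rate)
        statistic += (yes - expected_yes) ** 2 / expected_yes
        statistic += (no - expected_no) ** 2 / expected_no
    return statistic
```

The larger the statistic, the harder it is to explain the differences between options as chance. With 4 options and 2 outcomes there are (4 - 1) * (2 - 1) = 3 degrees of freedom, and turning the statistic into a p-value requires the chi-squared distribution with that many degrees of freedom.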

SciPy has a function to do the hard work: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html#scipy.stats.chi2_contingency
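
A minimal sketch of how that function could be applied to the input format from the question, assuming SciPy is installed (pip install scipy). The helper name split_test_p_value and the counts in the usage lines are made up for illustration.

```python
from scipy.stats import chi2_contingency


def split_test_p_value(results):
    """Return (p_value, degrees_of_freedom) for a list of (yes, total) pairs.

    Hypothetical helper: builds one [yes, no] row per option and passes
    the resulting table to scipy.stats.chi2_contingency.
    """
    table = [[yes, total - yes] for yes, total in results]
    chi2, p_value, dof, expected = chi2_contingency(table)
    return p_value, dof


# Illustrative counts only, not real data.
results = [(50, 1000), (62, 1000), (48, 1000), (71, 1000)]
p_value, dof = split_test_p_value(results)
print("p-value:", p_value, "degrees of freedom:", dof)
```

A small p-value (0.05 is a common but arbitrary cutoff) suggests the options really do differ in conversion rate; a large one suggests the data so far could easily be noise, so the test may need to keep running. Note that this answers "is there any difference among the options?", not "which option is best?".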

Licensed under: CC-BY-SA with attribution