Question

Let's say I have a simple ecommerce site that sells 100 different t-shirt designs. I want to do some A/B testing to optimise my sales. Let's say I want to test two different "buy" buttons. Normally, I would use A/B testing to randomly assign each visitor to see button A or button B (and try to ensure that the user experience is consistent by storing that assignment in a session, cookies, etc.).

Would it be possible to take a different approach and instead randomly assign each of my 100 designs to use button A or B, and measure the conversion rate as (number of sales of design n) / (pageviews of design n)?
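To make that concrete, here is roughly what I have in mind (the pageview and sales numbers below are just placeholders; in practice they would come from my analytics):

    # Sketch of the proposed approach: assign each design to a button once,
    # then compare pooled conversion rates per button. Numbers are made up.
    import random

    design_ids = range(1, 101)  # my 100 t-shirt designs

    random.seed(42)
    assignment = {n: random.choice(["A", "B"]) for n in design_ids}

    # Placeholder traffic data; in reality this comes from analytics/order logs.
    pageviews = {n: random.randint(200, 1000) for n in design_ids}
    sales = {n: random.randint(0, 30) for n in design_ids}

    # Per-design conversion rate: sales of design n / pageviews of design n.
    conversion = {n: sales[n] / pageviews[n] for n in design_ids}

    # Compare the two groups by pooling sales and pageviews within each button.
    for button in ("A", "B"):
        group = [n for n in design_ids if assignment[n] == button]
        rate = sum(sales[n] for n in group) / sum(pageviews[n] for n in group)
        print(f"Button {button}: {len(group)} designs, pooled conversion rate {rate:.3%}")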

This approach would seem to have some advantages: I would not have to worry about keeping the user experience consistent - a given page (e.g. www.example.com/viewdesign?id=6) would always return the same HTML. If I were to test different prices, it would be far less distressing to the user to see different prices for different designs than to see different prices for the same design on different computers. I also wonder whether it might be better for SEO - my suspicion is that Google would "prefer" to always see the same HTML when crawling a page.

Obviously, this approach would only be suitable for a limited number of sites; I was just wondering whether anyone has tried it.


Solution

Your intuition is correct. In theory, randomizing by page will work fine. Both treatment groups will have balanced characteristics in expectation.

However, the sample size is quite small, so you need to be careful: simple randomization may create imbalance by chance. The standard solution is to block on pre-treatment characteristics of the shirts. The most important characteristic is your pre-treatment outcome, which I assume is the conversion rate.

There are many ways to create "balanced" randomized designs. For instance, you could create pairs using optimal matching and randomize within pairs. A rougher match could be found by ranking pages by their conversion rate in the previous week/month and then creating pairs of neighbors. Or you could combine blocked randomization with Aaron's suggestion: randomize within pairs and then flip the treatment each week.
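As a rough sketch, pairing by prior conversion rate and randomizing within pairs might look something like this (the prior conversion rates here are invented placeholders; you would pull them from your own logs):

    # Rank designs by last month's conversion rate, pair up neighbors,
    # and flip a coin within each pair to assign button A or B.
    import random

    random.seed(0)

    # design_id -> conversion rate in the previous week/month (made-up numbers)
    prior_conversion = {n: random.uniform(0.005, 0.05) for n in range(1, 101)}

    # Rank designs by their pre-treatment conversion rate.
    ranked = sorted(prior_conversion, key=prior_conversion.get)

    assignment = {}
    for i in range(0, len(ranked), 2):
        pair = ranked[i:i + 2]
        random.shuffle(pair)          # coin flip within the pair
        assignment[pair[0]] = "A"
        if len(pair) > 1:
            assignment[pair[1]] = "B"

    # Each pair contributes one design to each arm, so the two arms are
    # balanced on the pre-treatment outcome by construction.

Blocking this way keeps the randomization honest while guarding against the bad luck of all your best-selling designs landing in one arm.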

A second concern, somewhat unrelated, is interaction between treatments. This may be more problematic. It's possible that if a user sees one button on one page and then a different button on a different page, that new button will have a particularly large effect. That is, can you really view treatments as independent? Does the button on one page affect the likelihood of conversion on another? Unfortunately, it probably does, particularly because if you buy a t-shirt on one page, you're probably very unlikely to buy a t-shirt on another page. I'd worry about this more than the randomization. The standard approach -- randomizing by unique user -- better mimics your final design.

You could always run an experiment to see if you get the same results using these two methods, and then proceed with the simpler one if you do.

OTHER TIPS

You can't.

Let's say 50 t-shirts have button A and the remaining 50 have button B. After your test, you realize that t-shirts with button A have a better conversion rate.

Now - was the conversion better because of button A, or was it better because the t-shirt designs were really cool and people liked them?

You can't answer that question objectively, so you can't do A/B testing in this manner.

The trouble with your approach is that you're testing two things at the same time.

Say design X uses button A and design Y uses button B. Design Y gets more sales and more conversions.

Is that because button B gives a better conversion rate than button A, or because design Y gives a better conversion rate than design X?

If your volume of designs is very high, your volume of users is very low, and your conversions are distributed evenly amongst your designs, I could see your approach being better than the normal fashion, because the risk that the "good" designs clump together and skew your result would be smaller than the risk that the "good" users do. However, in that case you won't have a particularly large sample of conversions to draw conclusions from - you need a sufficiently high volume of users for A/B testing to be worthwhile in the first place.

Instead of changing the buy button for only some pages, run all pages with button A for a week and then switch to button B for another week. That should give you enough data to see whether the number of sales changes significantly between the two buttons.

A week should be short enough that seasonal/weather effects shouldn't apply.
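As a minimal sketch, you could then compare the two weeks' pooled conversion rates with a two-proportion z-test (the weekly totals below are invented placeholders; note this doesn't account for week-to-week traffic differences):

    # Two-proportion z-test on (sales, pageviews) for week A vs week B.
    from math import sqrt, erf

    def two_proportion_z_test(sales_a, views_a, sales_b, views_b):
        p_a = sales_a / views_a
        p_b = sales_b / views_b
        p_pooled = (sales_a + sales_b) / (views_a + views_b)
        se = sqrt(p_pooled * (1 - p_pooled) * (1 / views_a + 1 / views_b))
        z = (p_a - p_b) / se
        # Two-sided p-value from the standard normal distribution.
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
        return z, p_value

    # Week with button A vs week with button B (placeholder numbers).
    z, p = two_proportion_z_test(sales_a=130, views_a=12000, sales_b=170, views_b=11800)
    print(f"z = {z:.2f}, p = {p:.4f}")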

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow