The blinding science of A/B testing

Let’s suppose for a moment that after reading my last blog post, you’ve decided that your website (or your pay-per-click ads) would deliver more value if you used A/B testing to find out what works best. Great. Now all you have to do is figure out what comes next.

If you’ve got a lot of spare time on your hands, you may want to put together a test on your own. But more than likely you’ll want to hire somebody who knows what they’re doing and can run the testing while you concentrate on running your business. Either way, how you approach your A/B testing will go a long way toward determining the bottom-line value of your results.

While a few pieces of Internet technology are required to make things happen (software to serve up the test ads or pages and to keep track of visitor interactions), A/B testing is really a simplified adventure into the world of statistical science. It’s not scary science, mind you, like isolating the Ebola virus or enriching uranium. Rather, think of every A/B test as an experiment that can help you make money. But A/B testing will only add value to your decision-making process if you keep some important points in mind.

1. Make a plan

When scientists run an experiment they always have a plan, right? I mean, the guys at CERN don’t just go to work in the morning and say “Let’s shoot some junk down the big particle accelerator and see what happens.” You need to decide what you’re going to test, then come up with a plan for the test. The plan needs to spell out exactly how your test variations differ (subject), how you’re going to serve the variations to your online audience (method) and what action you expect visitors to take (result). The plan doesn’t have to be 10 pages long, but it should provide enough information to keep everybody who’s working on the test pointed in the same direction.
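To make “subject, method and result” concrete, here is a minimal sketch of a one-page plan written down as data, plus one common way to serve the variations: hashing each visitor’s ID so the same person always sees the same version. The `ABTestPlan` class, the `assign_variation` helper, the example button test and the visitor IDs are all hypothetical, made up for illustration; real A/B testing software handles assignment in its own way.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ABTestPlan:
    """Hypothetical one-page test plan: subject, method and result."""
    name: str
    subject: str               # how variation A differs from variation B
    method: str                # how the variations are served to visitors
    result: str                # the action we expect visitors to take
    visitors_per_variant: int  # how much data to collect before deciding


def assign_variation(plan: ABTestPlan, visitor_id: str) -> str:
    """Deterministically split visitors roughly 50/50 between "A" and "B".

    Hashing the test name together with the visitor ID means a returning
    visitor always sees the same variation, and separate tests split
    visitors independently of each other.
    """
    digest = hashlib.sha256(f"{plan.name}:{visitor_id}".encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


plan = ABTestPlan(
    name="signup-button-color",
    subject="Green 'Sign up' button (A) vs. orange 'Sign up' button (B)",
    method="Split visitors 50/50 by hashing the visitor ID",
    result="Visitor clicks 'Sign up'",
    visitors_per_variant=8000,
)

print(assign_variation(plan, "visitor-12345"))  # same answer on every visit
```

Keeping assignment deterministic, rather than flipping a coin on every page view, stops one visitor from seeing both versions and muddying the results.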

2. Let each test run to conclusion

This is the part where you stick to the plan you just created after reading point #1 above. Make sure your plan allows enough time for the test to collect plenty of data (see point #3 below), then let the test run its course. If Alexander Fleming had cleaned out his petri dishes after the first day of his experiments, he’d never have discovered penicillin. Don’t look at the first day’s data and think you’ve seen enough to make a decision. In fact, don’t look at the data at all until the test is finished: every extra peek is another chance for random noise to look like a winner, and another temptation to stop early. It only takes a little patience; once the A/B test is finished you’ll have plenty of opportunity to do all of the deciding you want.
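Why does peeking hurt? A quick simulation makes the point. The sketch below runs many “A/A tests,” where both variations are identical, so any declared winner is a false alarm. It compares checking for significance every day against checking once at the end. All parameters (a 10% response rate, 200 visitors per variation per day, 10 days, a 5% significance level) are made-up assumptions, and the significance check is a standard two-proportion z-test.

```python
import math
import random


def two_sided_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-test p-value (normal approximation, pooled rate)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(n_tests=400, days=10, visitors_per_day=200,
                      true_rate=0.10, alpha=0.05, seed=1):
    """Run A/A tests (no real difference) and count false 'winners'.

    Returns (false-positive rate when peeking daily,
             false-positive rate when looking only at the end).
    """
    rng = random.Random(seed)
    peek_wins = end_wins = 0
    for _ in range(n_tests):
        conv_a = conv_b = visitors = 0
        significant_on_some_day = False
        p_value = 1.0
        for _ in range(days):
            visitors += visitors_per_day
            # Both variations share the same true response rate.
            conv_a += sum(rng.random() < true_rate for _ in range(visitors_per_day))
            conv_b += sum(rng.random() < true_rate for _ in range(visitors_per_day))
            p_value = two_sided_p_value(conv_a, visitors, conv_b, visitors)
            if p_value < alpha:
                significant_on_some_day = True  # a daily peeker would stop here
        peek_wins += int(significant_on_some_day)
        end_wins += int(p_value < alpha)  # p_value is from the final day
    return peek_wins / n_tests, end_wins / n_tests


peek_rate, end_rate = simulate_aa_tests()
print(f"False winners when peeking daily: {peek_rate:.1%}")
print(f"False winners when waiting:       {end_rate:.1%}")
```

Waiting should land near the 5% you signed up for, while the daily-peeking rate typically comes out several times higher, even though nothing about the variations changed.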

3. Only act on statistically significant results

The real fun starts once the test has finished and you have all the data to paw through. But before you go all crazy making big changes to your website or your ad, you’ve got to make sure the results of the test actually matter. This is where a bit of math (statistics, in this case) comes in handy. If the difference in response to your test variations is either very small or very large, the decision-making gets pretty easy. It’s the results that fall in the “not so obvious” category (which is what you usually get) that make decisions tougher.

The concept is called “statistical significance”, and it lies at the core of the whole process. In simple terms, a result is statistically significant when the difference you observed would be unlikely to show up by random chance alone if the variations actually performed the same. So once the test is finished you need to be able to answer “yes” to two important questions: 1) Did the test collect enough data? and 2) Is the difference in responses to the test variations large enough that it demonstrates something beyond random chance?
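Here is what the second question looks like in practice. This is a minimal sketch of a two-proportion z-test, one standard way to check whether two response rates really differ; the visitor and conversion counts in the example are invented, and the 5% threshold (`alpha`) is a common convention rather than a law.

```python
import math


def ab_significance(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion z-test: is B's response rate really different from A's?

    Assumes both variations got at least one response (and not 100%).
    Returns (relative lift of B over A, two-sided p-value, significant?).
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # If there's no real difference, both variations share one pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value: the chance of a gap at least this big from luck alone.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    lift = (rate_b - rate_a) / rate_a
    return lift, p_value, p_value < alpha


# Hypothetical results: 200 of 4,000 visitors responded to A, 250 of 4,000 to B.
lift, p, significant = ab_significance(200, 4000, 250, 4000)
print(f"lift={lift:.1%}, p={p:.3f}, significant={significant}")
```

With these made-up numbers B shows a 25% relative lift and a p-value of about 0.015, so at the 5% level the difference is unlikely to be pure chance. Halve the traffic with the same rates and the same lift would no longer clear the bar, which is why the first question matters too.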

There’s no single magic number for how much data you need to collect: it depends on how often visitors respond and on how small a difference you want to be able to detect. Beyond that, “more is better”. When Gregor Mendel started experimenting with plant hybridization 150 years ago, he spent more than seven years cultivating and testing nearly 30,000 pea plants. You don’t have to get that extreme, but your test does need to generate enough user responses to give the math something to work with.
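You can, however, estimate the amount of data you need before the test starts. The sketch below uses one common textbook approximation for a two-proportion test; the 5% baseline response rate, the hoped-for rates, the 5% significance level and the 80% power are all illustrative assumptions you would replace with your own.

```python
import math
from statistics import NormalDist


def visitors_per_variation(base_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variation for a two-sided
    two-proportion z-test to detect a change from base_rate to expected_rate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (base_rate + expected_rate) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(base_rate * (1 - base_rate)
                             + expected_rate * (1 - expected_rate))
    ) ** 2
    return math.ceil(numerator / (expected_rate - base_rate) ** 2)


# Detecting a lift from 5% to 6% takes roughly 8,000 visitors per variation...
print(visitors_per_variation(0.05, 0.06))
# ...while a jump from 5% to 10% needs only a few hundred.
print(visitors_per_variation(0.05, 0.10))
```

The smaller the difference you care about, the more data you need, and the requirement grows with the square of how small that difference is.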

Once you’re sure you have enough data, you can get on with the statistical wizardry. A lot of A/B testing software will do most of the math for you, calculating important details such as statistical significance and confidence level. Otherwise, you’ll need a halfway decent background in statistics. Or, perhaps, a really good calculator that can help you muddle through your iterations of Pearson’s chi-square test and other statistical arcana.
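For the curious, the chi-square test on a simple two-variation test isn’t that arcane. This sketch applies Pearson’s chi-square test to a 2×2 table (variation × responded or not) without a continuity correction; the counts are made up for illustration.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson's chi-square test on a 2x2 table of
    (variation A/B) x (responded / didn't respond), no continuity correction.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    observed = [[conv_a, n_a - conv_a],
                [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    grand_total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count we'd expect in this cell if the variations performed the same.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi-square is a squared standard normal,
    # so P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical results: 200 of 4,000 responded to A, 250 of 4,000 to B.
chi2, p = chi_square_2x2(200, 4000, 250, 4000)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

For two variations this gives the same p-value as a two-proportion z-test (the chi-square statistic is the z-score squared). If you use a library instead, note that `scipy.stats.chi2_contingency` applies Yates’ continuity correction to 2×2 tables by default, so its p-value will come out a bit larger unless you pass `correction=False`.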

And, of course, there’s the third option of just hiring a marketing team that knows what they’re doing to begin with.