A scientific approach to A/B testing — Review

“Digital personalisations are no longer merely an option, they’re a necessity.” — Jim Cramer, host of CNBC’s Mad Money.

Ayesha Rahman
9 min read · Mar 14, 2021

--

It’s week four of the Growth Marketing minidegree program by CXL Institute, and this week’s topic is all about A/B testing. I’m sure most marketers have come across this concept before. In my head, I had defined it as simply testing out two messaging approaches and going with the one that works better. It turns out to be much more complex than that.

Digital experimentation consultant Ton Wesseling of Online Dialogue, a leading optimisation agency based in the Netherlands, approaches it as a scientific experiment, and that outlook has led him to develop a mastery approach which I’ll take you through in my write-up.

The history of A/B testing

Before 1995, the web was a fairly new concept, with the big browsers just coming onto the market. From 1995 to 2000, there wasn’t much A/B testing going on; instead, people dug through website log files to understand real behaviour, made changes as they went along, measured the results and compared them. The downside was that the external influences were never the same between measurements.

By 2000, the world moved to meta refresh and JavaScript redirects, which simply meant that when you landed on a website you’d wait a couple of seconds before being taken to a different URL, and voila: version B of the website. There were no cookies, so if you re-entered the experiment you had a 50% chance of landing in the other variation, which polluted the conclusions.

In 2003, testing tools such as Offermatica, Optimost and Memetrics sprouted in the marketplace. The con? These tools, which started using cookies, were really expensive enterprise software solutions for experimentation. For the poor man, that still meant relying on redirect scripts. The good news was that this was the beginning of proper randomised controlled trials.

Then in 2006, Google Website Optimizer made its way into the marketplace. Back then it was free and could do JavaScript redirects: the perfect solution for the poor man. People also started getting crafty by injecting code into web pages so websites could be optimised.

Come 2010, VWO and Optimizely entered the marketplace, introducing drag-and-drop solutions for A/B testing. This made it easy for marketers to log in, drag and drop something, press start and run experiments. The downside was that marketers were no longer learning to use code editors and variables.

By 2013, people moved towards running experiments in apps and websites built as single-page applications using frameworks like React and Angular, which made client-side testing harder. So the next wave of crafty solutions people came up with involved building frameworks and better testing tools to help run those experiments.

From 2016 onwards, the quality of the tools improved, which overcame the need for extra development power to run experiments. These tools brought personalisation, segmentation and AI capabilities. These days, the struggles are with ITP and ETP making it hard to track users across an experiment, and with embedding optimisation frameworks at the server level within companies.

The pillars of A/B testing

Ton breaks down his methodology into three main pillars: Planning, Execution and Results.

1) Planning

Echoing Ton’s belief in a structured approach, the first and most important step is planning your A/B testing.

Do you have enough data?

First, you need to ask yourself: are you sure it’s time to run A/B tests? That largely depends on the stage of your business. You may not yet have the volume of conversions you need, and testing too early may end up slowing down your progress. Remember that running A/B tests isn’t the goal; testing your assumptions and improving the number of conversions is.

Secondly, A/B testing needs to be woven into the company culture and approached in a structured manner. Otherwise, you’ll be wasting the time and money that go into these experiments.

So when should you start?

A method that you can use in order to distinguish between various optimisation phases is ROAR:

  1. Risk: This is the entrepreneurial phase, where A/B testing is, for lack of a better word, completely nonsensical. It only becomes useful once you hit at least 1,000 conversions per month. If you’re getting any fewer, you’re better off abandoning your plans, because testing at that volume is more likely to create problems than insights.
  2. Optimisation: Ton emphasises the risk of false positives. It’s important to test things that actually have an impact, and to choose what you want to analyse in advance. A big pitfall with an underwhelming result is going down a rabbit hole until you find something, and ending up implementing something insignificant.
  3. Automation: Once you’ve grown to 10,000 conversions per month, you can start automating. At this stage, you should already have found the ideal locations and basic messages through A/B testing, and can begin testing algorithms.
  4. Rethink: At some point, you’ll end up looking at what you’ve built and rethinking the way you’ve built it. In short, it’s time to innovate.

Which KPI do you pick?

This depends on your goal metric, be it clicks, behaviour, transactions, revenue per user or potential lifetime value. Once you’ve set your goal, you move on to setting up your overall evaluation criterion (OEC), which is defined as using short-term metrics that predict long-term value. Look for your success and leading indicators, and avoid vanity metrics.
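
To make the OEC idea a little more concrete, here is a minimal sketch of what a weighted blend of short-term, per-user metrics could look like. The metrics and weights are entirely hypothetical and not from the course; the point is simply that the criterion combines leading indicators believed to predict long-term value.

```python
# Minimal sketch: an overall evaluation criterion (OEC) as a weighted
# blend of short-term, per-user metrics believed to predict long-term
# value. Metrics and weights are hypothetical.

def oec(user: dict) -> float:
    return (0.5 * user["purchased"]                      # did the user transact (0/1)
            + 0.3 * user["added_to_cart"]                # leading indicator (0/1)
            + 0.2 * min(user["pages_viewed"], 10) / 10)  # capped engagement signal

users = [
    {"purchased": 1, "added_to_cart": 1, "pages_viewed": 7},
    {"purchased": 0, "added_to_cart": 1, "pages_viewed": 3},
    {"purchased": 0, "added_to_cart": 0, "pages_viewed": 12},
]

# The average OEC per user is what you would compare between variations.
print(sum(oec(u) for u in users) / len(users))
```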

What research do you need to gather insights for your A/B tests?

The best research is to gather insights on user behaviour using the 6V research model.

Studying customer behaviour will provide insights into the most important parts of the customer journey, help you understand basic user behaviour, and provide input for setting your hypotheses.

Setting your hypotheses

This step is crucial for getting everyone aligned on the problem, the proposed solution and the predicted outcome. It will save time when discussing what happens during and after the experiments. To set up a concrete hypothesis, it needs to include both a psychology component and a data component.

Prioritising your A/B tests

As much as you’d like to conduct as many tests as possible, this isn’t always wise or feasible. You’ll need to narrow the list down based on priority. There are two models you can use to prioritise your tests:

  1. PIE — Potential x Importance x Ease
  2. ICE — Impact x Confidence x Effort

For A/B testing, the recommended formula is Relevant Hypothesis x Relevant Location x Chance On Effect.

Remember — it isn’t about testing the page. It’s about customer journeys and the right hypothesis, only then do you pick a page to test your hypothesis on.
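
To show how such a scoring model plays out in practice, here is a minimal sketch that ranks a backlog of hypothetical test ideas by a PIE score. The idea names and scores are made up for illustration, and the factors are multiplied as in the formulas above (some teams average them instead).

```python
# Minimal sketch: rank hypothetical test ideas by a PIE score
# (Potential x Importance x Ease), each factor scored 1-10.

test_ideas = [
    # (name, potential, importance, ease) -- made-up scores
    ("Shorter checkout form", 8, 9, 4),
    ("Social proof on product page", 6, 7, 9),
    ("New hero image on homepage", 4, 5, 8),
]

# Highest combined score goes to the top of the testing backlog.
ranked = sorted(test_ideas, key=lambda t: t[1] * t[2] * t[3], reverse=True)

for name, p, i, e in ranked:
    print(f"{p * i * e:>4}  {name}")
```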

2) Execution

So now that the planning phase is done, it’s time we go into executing the experiments.

Designing, developing and quality assuring your A/B test

Some do’s and don’ts to bear in mind:

When designing:

  1. Only account for 1 challenger
  2. Don’t feel limited in what you want to test, but remember to consider the implementation costs
  3. The change should be visible, scannable and usable
  4. Mirror the design with the hypothesis
  5. Consider the minimum detectable effect: with 1,000+ conversions a month, your test capacity tops out at around 20 tests a year with a statistical power of 80% on predicted uplifts of 15% (see the sketch after this list)
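
The minimum detectable effect in point 5 comes straight out of a standard power calculation. Here is a minimal sketch using the classic two-proportion approximation; the 2% baseline conversion rate is a hypothetical example, not a number from the course.

```python
# Minimal sketch: visitors needed per variation to detect a given
# relative uplift with 80% power and 95% confidence (two-sided).
from math import sqrt

def sample_size_per_variation(baseline_rate, relative_uplift,
                              z_alpha=1.96, z_beta=0.84):
    """Classic two-proportion sample size approximation."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p2 - p1) ** 2

# Hypothetical example: 2% baseline conversion rate, 15% predicted uplift
print(round(sample_size_per_variation(0.02, 0.15)), "visitors per variation")
```

At a 2% conversion rate this works out to roughly 37,000 visitors per variation, which is why test capacity is tied so directly to conversion volume.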

When developing:

  1. Don’t use the WYSIWYG (what you see is what you get) code editor
  2. Remember that this is an experiment — if it works, it works
  3. If you can’t make it work within a certain timeframe, don’t be shy to propose design changes
  4. Consider injecting client side code in the default settings
  5. Add measurements and extra analytics events to your code
  6. Consider the pros and cons of coding on the server side (a small sketch follows this list)
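
On the server-side point, here is a minimal sketch of what deterministic bucketing could look like; the experiment name and user id are hypothetical. Hashing each user into a variation means they keep seeing the same one on every visit, without relying on a client-side tool.

```python
# Minimal sketch: deterministic server-side bucketing. Hashing the
# user id together with the experiment name keeps assignment stable.
import hashlib

def assign_variation(user_id: str, experiment: str, variations=("A", "B")) -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variations[int(digest, 16) % len(variations)]

print(assign_variation("user-123", "checkout-copy-test"))  # same result every call
```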

When quality assuring:

  1. Consider quality assuring the device / browser combo — this would depend on risk, trust and maturity in running experiments
  2. Check if other page interactions are still working
  3. Remove the main elements to see if the test still holds

Configure your A/B test in your tool

Pick your tool and configure it accordingly. Bear in mind the consequences that certain options may have so as to set the right parameters for your test. The value that comes with measuring the results of your test with your own analytics tool includes:

  • Best implemented analytics solution
  • Control when events get triggered
  • Easily create extra measurements
  • Have full control on A/B test outcomes
  • Analyse detailed behavioural data

Determine the length of your A/B test

Consider the goals you’re looking to track: weekly visitors, conversion rates, cycle data. The length of the test should be expressed in full weeks, because behaviour changes between weekdays and weekends, and even between times of day.
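
As a rough illustration of rounding a test to full weeks, here is a small sketch; the traffic and sample-size figures below are made up and would come from your own power calculation.

```python
# Minimal sketch: turn a required sample size into a run time in full
# weeks, so every weekday/weekend cycle is represented equally.
from math import ceil

def test_length_in_weeks(visitors_per_variation, variations, weekly_visitors):
    total_needed = visitors_per_variation * variations
    return ceil(total_needed / weekly_visitors)

# Hypothetical: ~37,000 visitors per variation, 2 variations,
# 20,000 eligible visitors per week.
print(test_length_in_weeks(37_000, 2, 20_000), "weeks")
```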

Once your test is done, it’s wise not to keep it running, at the risk of data pollution that would skew the outcome and lead to a different set of decisions that may be detrimental to your business.

Monitor your A/B test

Don’t forget to monitor your A/B tests. Monitoring helps you identify issues and problems that you can easily rectify during the test phase. Part and parcel of this process is being able to pull the plug on certain A/B tests: if something is broken, there is a sample ratio mismatch (SRM) error, or you’re losing too much money, chances are the experiment won’t work in your favour, so it’s best to stop it and move on.
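
A sample ratio mismatch is straightforward to check with a chi-square goodness-of-fit test against the split you intended. A minimal sketch, assuming a 50/50 split and made-up visitor counts (scipy assumed to be available):

```python
# Minimal sketch: flag a sample ratio mismatch (SRM) with a chi-square
# goodness-of-fit test against the intended 50/50 split.
from scipy.stats import chisquare

control_users, variant_users = 10_120, 9_550   # hypothetical counts
total = control_users + variant_users

stat, p_value = chisquare([control_users, variant_users],
                          f_exp=[total / 2, total / 2])

# A very small p-value (commonly < 0.001) suggests the split is off
# and the experiment data shouldn't be trusted as-is.
print("Possible SRM" if p_value < 0.001 else "Split looks fine",
      f"(p = {p_value:.5f})")
```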

3) Results

Now that you’ve run your experiments, it’s time to weigh in on the results.

Outcomes of your A/B test

Before analysing your test results, remember to:

  • Be aware of the test duration
  • Determine how to isolate the test population i.e. users who have seen the variation
  • Determine the test goals and how to isolate those users i.e. users who have seen the variation followed by a test goal

When it comes to analysing, keep in mind:

  • To analyse in the analytics tool — not the test tool
  • Avoid sampling
  • Analyse users, not sessions — it’s important to better understand your users’ behaviour (see the sketch after this list)
  • Analyse users who have converted, not users and total conversions — again, this goes back to better understanding user behaviour
  • Check if the population of users that have seen the test are about the same per variation
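
To make the user-based analysis concrete, here is a minimal sketch of a frequentist comparison of converted users per variation; statsmodels is assumed to be available and all counts are hypothetical.

```python
# Minimal sketch: compare converted users (not sessions) per variation
# with a two-proportion z-test. All counts are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

converted = [812, 901]       # users who converted: control, variation
exposed = [10_120, 10_080]   # users who saw each variation

z_stat, p_value = proportions_ztest(count=converted, nobs=exposed)

print(f"control rate:   {converted[0] / exposed[0]:.2%}")
print(f"variation rate: {converted[1] / exposed[1]:.2%}")
print(f"p-value: {p_value:.3f}")  # below your alpha -> significant, otherwise inconclusive
```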

If you have an inconclusive result, don’t fret! It just means that you were unable to prove that the variation outperforms the default. Nevertheless, the result can still be implemented, as the chance of it having a negative impact is small.

If you have a significant result, congratulations! Go ahead and implement it ASAP. However, don’t be discouraged if conversions don’t end up going up post-implementation; it’s a measured result, so you won’t know the exact real-world difference it makes. What you should do is dive into the segments to better understand who caused the effect. You can then feed this into new hypotheses and experiments moving forward.

Presenting your learnings

Before we go into this, let me first explain the difference between frequentist statistics and Bayesian statistics. Frequentist statistics is advisable when conducting A/B testing, but compared to Bayesian statistics it only really works in mature companies. Once the results are in, make a risk assessment to calculate the expected risk, expected uplift and effect on revenue. What counts as an acceptable probability depends on two things:

  • The business’ appetite for risk
  • Type of test: how invasive is the test and how many resources are required?
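
On the Bayesian side, a common way to express the outcome is the probability that the variation beats the control, estimated from Beta posteriors, which also gives you an expected uplift for the risk assessment. A minimal sketch with the same hypothetical counts as the frequentist example above (numpy assumed to be available):

```python
# Minimal sketch: Bayesian "probability to beat control" from Beta
# posteriors, plus an expected relative uplift. Counts are hypothetical.
import numpy as np

rng = np.random.default_rng(42)
draws = 200_000

# Beta(1, 1) prior updated with converted / non-converted users
control = rng.beta(1 + 812, 1 + (10_120 - 812), draws)
variant = rng.beta(1 + 901, 1 + (10_080 - 901), draws)

prob_variant_wins = (variant > control).mean()
expected_uplift = ((variant - control) / control).mean()

print(f"P(variation beats control): {prob_variant_wins:.1%}")
print(f"Expected relative uplift:   {expected_uplift:.1%}")
```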

To make sense of your results, consider how they add direct value and what they teach you about the behaviour of your users. The information you gather from your testing should be stored under five main tagging labels:

  1. Type of product / service
  2. Customer journey phase — which behavioural bucket are you testing?
  3. User segments — device, channel, visitor type, source, etc.
  4. Template — listing page, product page, shopping cart, check out personal details, etc.
  5. Persuasion technique

Once you’ve gathered the insights, package them for your specific stakeholders so that they’re relevant to each audience. As a basic framework, the information you present would differ as follows:

  • Team — report as described
  • Other teams — able to use the database
  • Researchers — extra insights once per month
  • Management — business case once per month (which should originate from the database)

It’s important that you create an outcome presentation template that leads to action.

Calculating your business case

So once you’ve run many A/B experiments, it’s time to see what value you can add to the company (in other words, this is where you calculate how much money you’re making for the company).

Moving forward, there are three options of which you can only pick one:

  1. Increase budgets — requires more A/B tests (quantity)
  2. Increase knowledge — better A/B tests (quality)
  3. Decrease budgets — fewer A/B tests (quantity)

For significant results, an example of a business case calculation is as follows:

Extra new customers per week x 52 weeks effective x Average lifetime value

Ultimately, what you need to figure out is how to maximise your growth within your ROI limit, which is calculated as the value of A/B testing for optimisation divided by the cost of A/B testing for optimisation. If you’re above the limit, increase your budgets. If you’re below it, increase your knowledge. If you’re still below it, consider decreasing your budget.
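
Putting the business case formula and the ROI check together, here is a minimal sketch with made-up numbers:

```python
# Minimal sketch: the business case formula above plus the ROI check.
# Every number here is made up for illustration.

extra_new_customers_per_week = 12
average_lifetime_value = 150          # in your currency
value_of_testing = extra_new_customers_per_week * 52 * average_lifetime_value

cost_of_testing = 60_000              # tooling + people, per year
roi = value_of_testing / cost_of_testing
roi_limit = 1.0                       # hypothetical minimum value/cost ratio

print(f"Value of A/B testing per year: {value_of_testing:,}")
print(f"ROI: {roi:.2f}")
if roi >= roi_limit:
    print("Above the limit: increase budgets")
else:
    print("Below the limit: increase knowledge, or decrease budgets")
```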

--

Ayesha Rahman

Co-Founder of Recur Consult // Mama to 1 + 1 (coming in 2023) // Ocean child // World wanderer // Bibliophile + Logophile // farahnabilaayesha@gmail.com