Qubit's data model is designed to handle a variety of complications that may occur in A/B testing, for example, changes in Conversion Rate, changes in variation, or changes in audience split.
Qubit uses statistical methods, such as a Bayesian prior, to make use of all the information from each iteration of a test. Below is an explanation of how these complications can occur, and how the Qubit model deals with them.
A/B testing depends on accruing a required sample size of visitors to establish statistical power, which determines whether enough data has been collected to draw conclusions. Until that point, we will not show results against an experience's goals. However, we recognize that clients often prefer to review results before significance has been reached, just to see if the test is having an effect.
Early in the testing process, there is insufficient data to draw conclusions. The low volume of data at this stage tends to produce more volatile results, and a test can go through large fluctuations.
In A/B testing, it is important to only draw conclusions when sufficient data has been collected, as larger samples are more indicative of the likely longer-term performance of the variation.
If conclusions are drawn too early, then false results can be obtained. At best, this can result in a waste of development time, and at worst, have a negative impact on conversions. In other words, we should hold the assumption that the effect of an A/B test will be minimal until we have data to prove otherwise.
If at the start of your A/B test (50/50 split) you observe 20 conversions in your control and 40 in your variant, that would be a 100% Conversion Rate uplift. Because it is so early in the test and insufficient data has been collected, this is probably a statistical fluctuation; even if there is a genuine uplift, it is probably much less than 100%.
Control: 20 / 100 = 20% Conversion Rate
Variant: 40 / 100 = 40% Conversion Rate
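The arithmetic above can be sketched in a few lines (visitor counts of 100 per arm are taken from the example):

```python
# Observed uplift from early test data, using the example figures above.
control_conversions, control_visitors = 20, 100
variant_conversions, variant_visitors = 40, 100

control_cr = control_conversions / control_visitors  # 0.20
variant_cr = variant_conversions / variant_visitors  # 0.40

# Relative uplift of the variant over the control.
observed_uplift = variant_cr / control_cr - 1
print(f"Observed uplift: {observed_uplift:.0%}")  # prints "Observed uplift: 100%"
```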
Qubit’s testing platform applies something called a Bayesian prior to your test results to calculate a final uplift figure. A Bayesian prior sounds complicated, but in this case it is essentially equivalent to saying: "it looks like this happened, but I know that’s not very likely, so I’d like to check a bit more."
Qubit’s testing platform handles these mental gymnastics for you in a rigorous mathematical fashion. We take our extensive experience conducting thousands of A/B tests and apply a Bayesian prior to inform you exactly how likely any given uplift is. The uplift displayed to you is therefore a combination of the measured conversions and this prior belief.
The main effect of this appears early on in your tests, when there isn’t much data to go on: we tend to be skeptical of any large uplift. So you may see situations where there are a lot more conversions in your variant, but we report only a small effect. This prevents misinterpretation of a statistical fluctuation, which could lead to you implementing something on your website that may have no effect or, even worse, a negative effect.
Of course as your test accumulates more data, you can be more certain of the results and gradually the impact of these prior beliefs should disappear. Our model handles this for you too. If a test keeps reporting a 100% uplift, our model will tell you exactly when you should start believing it.
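A minimal sketch of how a skeptical prior shrinks an early uplift estimate, using a standard Beta-Binomial model. The prior parameters here are illustrative assumptions, not Qubit's actual prior:

```python
# Beta-Binomial shrinkage: a skeptical prior pulls early estimates
# toward the prior mean until enough data accumulates.
def posterior_mean(conversions, visitors, prior_alpha, prior_beta):
    """Posterior mean conversion rate under a Beta(alpha, beta) prior."""
    return (conversions + prior_alpha) / (visitors + prior_alpha + prior_beta)

# Assumed prior: centred on a 20% conversion rate, worth 1,000 pseudo-visitors.
alpha, beta = 200, 800

control = posterior_mean(20, 100, alpha, beta)  # pulled toward 0.20
variant = posterior_mean(40, 100, alpha, beta)  # pulled toward 0.20

uplift = variant / control - 1
print(f"Raw uplift: 100%, shrunk uplift: {uplift:.1%}")
```

With this prior, the raw 100% uplift from the earlier example shrinks to roughly 9%; as more visitors arrive, the data increasingly outweighs the prior and the estimate converges to the true uplift.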
One common use case, when running a personalization as an A/B test, is to prove that the personalization is working before making the content available to all visitors. Tests are therefore run as a 50/50 split to determine success. When a positive result is obtained, the traffic allocated to the successful variation can be increased to 95%, always retaining a control group in case the results deteriorate.
In our reporting interface, we show markers to reflect changes made to the experiment. Changing the traffic allocation of an experiment is one example of what might result in us showing a marker. At Qubit, we call these different phases of an experiment iterations.
Iterations are very important and enable us to see differences over the course of the experiment. Especially when traffic allocation is changed as part of the experiment, iterations can have a dramatic and unexpected effect on the result of a basic A/B test.
To take a simple example:
Iteration 1 (50/50 split):
Control: 10 / 1,000 = 1%
Variation: 10 / 1,000 = 1%

Iteration 2 (95/5 split in favor of the variation):
Control: 10 / 100 = 10%
Variation: 190 / 1,900 = 10%
Each variation has exactly the same performance within each iteration, but performance changes from Iteration 1 to Iteration 2, perhaps as a result of seasonality. The only difference is that in Iteration 2 we changed the traffic allocation from 50/50 to 95/5 in favor of the variation.
If we were to simply report this data in the reporting interface the results would be as follows:
Control: 20 / 1,100 = 1.8%
Variation: 200 / 2,900 = 6.9%
That is, we could be reporting a 279% conversion uplift by doing nothing more than changing the traffic allocation; this would be a gross error.
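The figures above can be reproduced in a short sketch, showing how pooling across iterations manufactures an uplift that no single iteration contains:

```python
# Each tuple: (control_conversions, control_visitors,
#              variant_conversions, variant_visitors)
iterations = [
    (10, 1000, 10, 1000),   # Iteration 1: 50/50 split, both arms at 1%
    (10, 100, 190, 1900),   # Iteration 2: 95/5 split, both arms at 10%
]

# Naive pooling ignores the allocation change between iterations.
cc = sum(i[0] for i in iterations)  # 20 control conversions
cv = sum(i[1] for i in iterations)  # 1,100 control visitors
vc = sum(i[2] for i in iterations)  # 200 variant conversions
vv = sum(i[3] for i in iterations)  # 2,900 variant visitors

pooled_uplift = (vc / vv) / (cc / cv) - 1
print(f"Pooled uplift: {pooled_uplift:.0%}")  # ~279%, despite no real difference

# Within each iteration, the uplift is exactly zero.
per_iteration = [(v / w) / (c / n) - 1 for c, n, v, w in iterations]
print(per_iteration)  # [0.0, 0.0]
```

This is an instance of Simpson's paradox: the allocation change, not the variation, drives the pooled number.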
At Qubit, we treat each iteration independently. Our stats engine looks at the performance of the test at each iteration to generate an output that fits the overall experiment, whilst avoiding bias from changes in the performance of the test over time.
We calculate an estimated uplift using the information from each iteration. The confidence score that our stats engine reports is the level of confidence that the test is having an impact on the success metric, with this complication already factored in.
We look to assess the uplift in each iteration, weighing its impact on the overall uplift by the volume of traffic, and always in the wider context of the overall experiment.
[Formula: the overall uplift combines the Iteration 1 Conversion Rate and the Iteration 2 Conversion Rate, each weighted by its traffic volume.]
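The weighting described above can be sketched as follows. This is an illustrative simplification, not Qubit's actual stats engine:

```python
# Weight each iteration's uplift by the traffic it received, so a
# change in allocation cannot masquerade as a conversion uplift.
def weighted_uplift(iterations):
    """iterations: list of (control_conv, control_vis, variant_conv, variant_vis)."""
    total_visitors = sum(cv + vv for _, cv, _, vv in iterations)
    estimate = 0.0
    for cc, cv, vc, vv in iterations:
        uplift = (vc / vv) / (cc / cv) - 1   # uplift within this iteration
        estimate += uplift * (cv + vv) / total_visitors
    return estimate

iterations = [
    (10, 1000, 10, 1000),   # Iteration 1: no difference between arms
    (10, 100, 190, 1900),   # Iteration 2: no difference between arms
]
print(weighted_uplift(iterations))  # 0.0: no spurious uplift from the split change
```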
The Qubit estimated uplift handles a variety of complications that can occur in A/B testing. It uses the information from all iterations of a test effectively to avoid bias and combines them with a Bayesian prior to report the uplift that you see in the Qubit platform.