We'll introduce some terms and concepts that you will encounter when working with experiences and experience test results.
A/B testing - On a simple level, A/B testing is a method of experimenting with two versions of a website, a control and a variation. By observing and analyzing the behavior of visitors that are randomly bucketed into either the control or a variation, we avoid most effects that might bias the data. We can then answer questions about which version performed better against defined goals such as Conversion Rate or click-through rate on a banner
Chance of an uplift - The probability that a variation delivers a positive uplift over the control for a given metric
Confidence - The amount of uncertainty associated with an uplift estimate. It is the chance that the confidence interval (margin of error around the estimate) will contain the true value that you are trying to estimate. A higher confidence level requires a larger sample size
Control - One of an experiment's variations, where no treatment is applied, i.e. we don't show the banner we are testing. By exposing visitors to a control, we have an effective means of testing your experiment. Visitors bucketed into your experience control will see your website, mobile platform, or mobile app without any changes. The control is used as a basis of comparison
Conversions - The number of purchases made in an iteration, where your property is transaction-based
Conversion Rate (CR) - The number of conversions divided by the total number of visitors. When referring specifically to the metric reported for an experience, CR refers to conversions amongst visitors from the moment they enter the experience until the moment they leave or the experience ends. When referring to segment metrics, CR refers to conversions amongst members of a segment that visited a site on a given day. CR is important because it tells you about how customers are engaging with your brand and interacting with your website or mobile app
For a more in-depth discussion of how Qubit calculates Conversion Rate, see What is Conversion Rate?
Converters - Visitors who went on to convert. At Qubit, when we talk about Conversion Rate, we are talking about converter rate, i.e. the rate of visitors that convert, rather than the rate of sessions that end in a conversion
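The difference between converter rate and session-based conversion rate can be illustrated with a minimal sketch. The session log below is hypothetical data made up for the example, not a Qubit data format.

```python
# Hypothetical session log: (visitor_id, converted_in_session).
sessions = [
    ("v1", True), ("v1", True),                   # one visitor converting twice
    ("v2", False),
    ("v3", True),
    ("v4", False), ("v4", False), ("v4", False),  # a returning non-converter
]

# Count each visitor once, however many sessions they had.
visitors = {visitor_id for visitor_id, _ in sessions}
converters = {visitor_id for visitor_id, converted in sessions if converted}

# Converter rate: share of visitors who converted at least once.
converter_rate = len(converters) / len(visitors)

# Session rate: share of sessions that ended in a conversion.
session_rate = sum(converted for _, converted in sessions) / len(sessions)

print(f"converter rate: {converter_rate:.1%}")  # 2 of 4 visitors
print(f"session rate:   {session_rate:.1%}")    # 3 of 7 sessions
```

The two rates diverge as soon as visitors have more than one session, which is why it matters which one a report refers to.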
Experience - One of Qubit's visual, custom, or programmatic experiences that are used to deliver visual or functional changes to a website, mobile platform, or mobile app
Experiment - A change delivered to a website, mobile platform, app, etc., to make a discovery or test a hypothesis
Iteration - A period of time during an A/B test, ending when the test is stopped, changed, and restarted. If an experiment is stopped and restarted without change, the iteration stays the same.
INFO: Changes that will cause a new iteration to be started include: changing the experience triggers, changing the targeted segments, changing traffic allocation, adding/deleting a variant
Outliers - In revenue testing, there are occasional customers who spend far more than the average. This is a problem for the revenue model because it infers the distribution of revenue from previous data, and can lead to skewed results
INFO: For each of your experiences you can mitigate the potential for outliers to skew results by ignoring outlier data. This will remove the top 0.1% of spenders from the sample to prevent outlier data from interfering with the statistical analysis of an experience
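As a rough illustration of why trimming the top 0.1% of spenders matters, the sketch below drops the highest spenders from a hypothetical sample and compares the average spend before and after. The function name, the data, and the rounding rule (rounding the number of dropped visitors down) are assumptions for the example, not a description of Qubit's implementation.

```python
def drop_top_spenders(spend_by_visitor, per_thousand=1):
    """Return a copy of the sample without the top spenders.

    per_thousand=1 corresponds to the top 0.1%. Integer arithmetic is used
    so the number of dropped visitors is exact (rounded down).
    """
    n_drop = len(spend_by_visitor) * per_thousand // 1000
    ranked = sorted(spend_by_visitor.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[n_drop:])


# Hypothetical sample: 999 typical spenders and one very large order.
sample = {f"visitor-{i}": 40.0 for i in range(999)}
sample["whale"] = 20_000.0

trimmed = drop_top_spenders(sample)

mean_before = sum(sample.values()) / len(sample)
mean_after = sum(trimmed.values()) / len(trimmed)
print(f"mean spend before trimming: {mean_before:.2f}")
print(f"mean spend after trimming:  {mean_after:.2f}")
```

A single extreme order shifts the average noticeably, and whichever variation that visitor happened to land in would look better for reasons unrelated to the change being tested.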
Pilot test - A pilot test is a trial run of an A/B test, where the power is set to 20% rather than 80%. It runs a lot faster than a normal test, and is generally used to check that a change does not have a massive negative effect
Power - The power of a test is essentially how good it is at detecting true uplift—given that a variation provides a real uplift, what is the chance that it will win the test?
INFO: We would like this to be as high as possible, but the tradeoff is that high powered tests require more data. At Qubit, for a standard test, we have set our parameters so that a variant that has an (actual) 5% uplift has an 80% chance of winning the test.
We say a test has ‘reached power’ if the power calculator deems it to have gone on long enough to have the 80% power we aim for
Power calculator - Part of the stats engine that estimates how many more converters (or visitors) will be needed to complete the test at 80% power
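To get a feel for the trade-off between power and data, here is a back-of-the-envelope sketch. It uses a textbook normal approximation with a one-sided 95% threshold and ignores the prior, so it is not Qubit's power calculator, and the function names and the 4% base Conversion Rate are assumptions chosen for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def visitors_per_variation(base_rate, relative_uplift, power=0.80, threshold=0.95):
    """Approximate visitors needed in each variation to reach the given power."""
    variant_rate = base_rate * (1 + relative_uplift)
    delta = variant_rate - base_rate
    variance = base_rate * (1 - base_rate) + variant_rate * (1 - variant_rate)
    z_sum = STD_NORMAL.inv_cdf(threshold) + STD_NORMAL.inv_cdf(power)
    return ceil(z_sum ** 2 * variance / delta ** 2)


def approximate_power(base_rate, relative_uplift, n_per_variation, threshold=0.95):
    """Approximate chance that a variation with this true uplift wins the test."""
    variant_rate = base_rate * (1 + relative_uplift)
    delta = variant_rate - base_rate
    variance = base_rate * (1 - base_rate) + variant_rate * (1 - variant_rate)
    standard_error = sqrt(variance / n_per_variation)
    return STD_NORMAL.cdf(delta / standard_error - STD_NORMAL.inv_cdf(threshold))


# Assumed scenario: 4% base Conversion Rate, true uplift of 5%.
n_standard = visitors_per_variation(0.04, 0.05, power=0.80)
n_pilot = visitors_per_variation(0.04, 0.05, power=0.20)

print(f"standard test (80% power): {n_standard} visitors per variation")
print(f"pilot test (20% power):    {n_pilot} visitors per variation")
print(f"power at the standard size: {approximate_power(0.04, 0.05, n_standard):.2f}")
```

Under these assumptions the pilot test needs roughly a tenth of the visitors, which is consistent with it running much faster while being far less likely to detect a genuine 5% uplift.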
Primary goal - All experiences will have a single primary goal. The default primary goal is Conversions, which you can change, if necessary. The primary goal is important because it determines when an experiment is complete. An experiment is complete when the primary goal has reached statistical significance
Prior - Our prior belief about how experiment results are distributed across all tests. We use it in the stats model to temper the effect of random fluctuations, especially early on in tests. Without it, the first few thousand visitors would have wildly varying confidence intervals
INFO: Our prior belief also brings a reality check to extremely large uplifts—you may notice that the expected uplift is not the same as the raw uplift. We are essentially saying that while we believe there is an uplift, we think it likely that some of the uplift came from random fluctuation
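The tempering effect of a prior can be sketched with a simple normal model. The example assumes a zero-centred normal prior on uplift with a standard deviation of 5%, and a normally distributed raw estimate. Both are assumptions for illustration only, since the actual prior and model Qubit uses are not described here.

```python
def shrunk_uplift(raw_uplift, standard_error, prior_sd=0.05):
    """Posterior mean uplift under a zero-centred normal prior (assumed model).

    The raw estimate is weighted by how precise it is relative to the prior:
    noisy estimates are pulled strongly towards zero, precise ones barely move.
    """
    prior_var = prior_sd ** 2
    weight = prior_var / (prior_var + standard_error ** 2)
    return weight * raw_uplift


# Same raw uplift of 30%, early in a test (noisy) and later (more precise).
early = shrunk_uplift(raw_uplift=0.30, standard_error=0.20)
late = shrunk_uplift(raw_uplift=0.30, standard_error=0.02)

print(f"early in the test: expected uplift {early:.1%}")
print(f"later in the test: expected uplift {late:.1%}")
```

In this sketch the expected uplift is always closer to zero than the raw uplift, and the gap narrows as more data reduces the standard error.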
Revenue Per Converter (RPC) - Revenue divided by converters. When referring specifically to the metric reported for an experience, RPC refers to revenue from the moment the visitor enters the experience until the moment the visitor leaves or the experience ends. When referring to segment metrics, RPC refers to revenue whilst a member of that segment who visited the site on a given day
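A minimal sketch of the RPC calculation, using a hypothetical list of orders. Note that the denominator is the number of distinct converters, not the number of orders.

```python
# Hypothetical orders: (visitor_id, order_value).
orders = [
    ("v1", 30.0),
    ("v1", 20.0),   # the same converter placing a second order
    ("v3", 100.0),
]

revenue = sum(value for _, value in orders)
converters = {visitor_id for visitor_id, _ in orders}

# Revenue Per Converter: total revenue divided by distinct converters.
revenue_per_converter = revenue / len(converters)
print(f"RPC: {revenue_per_converter:.2f}")
```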
Sample size - The amount of data that we require to get a statistically significant result. It can be reduced by lowering the winning threshold
WARNING: Other things being equal, reducing the sample size decreases the amount of time it takes to get a result in an experience. The trade-off is that it also reduces the confidence we have in the result. As our sample size decreases, the confidence in our estimates of uplift also decreases
Secondary goal - In addition to a primary goal, each experience can have up to 4 additional goals. These ancillary or secondary goals are used in A/B testing to compare experiment variations, but are not used to define whether the experiment is complete or not
Significance - A test result is statistically significant if it is deemed unlikely to have occurred by statistical error alone
INFO: Because we use a Bayesian model, the uplift probability is not a significance in the Wikipedia sense, but we call it that anyway. Uplift probability is strictly the probability that the impact of a test is positive under the prior
At Qubit we take the significance to be 95% (the industry standard). This means that for a test to be a winner, we have to determine that the uplift is more than 0 with probability at least 95%
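The idea of "uplift is more than 0 with probability at least 95%" can be sketched with a simple Bayesian simulation. The example assumes a Beta-Binomial model with a uniform Beta(1, 1) prior on each variation's Conversion Rate, which is a common textbook choice and not necessarily the model Qubit's stats engine uses. The function name and the visitor counts are also made up for the example.

```python
import random


def chance_of_uplift(control_converters, control_visitors,
                     variation_converters, variation_visitors,
                     draws=20_000, seed=0):
    """Estimate P(variation rate > control rate) by sampling both posteriors."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = 0
    for _ in range(draws):
        control_rate = rng.betavariate(
            1 + control_converters, 1 + control_visitors - control_converters)
        variation_rate = rng.betavariate(
            1 + variation_converters, 1 + variation_visitors - variation_converters)
        if variation_rate > control_rate:
            wins += 1
    return wins / draws


WINNING_THRESHOLD = 0.95

probability = chance_of_uplift(400, 10_000, 500, 10_000)
print(f"chance of uplift: {probability:.3f}")
print("winner" if probability >= WINNING_THRESHOLD else "not yet significant")
```

When both variations have identical data, the same function returns a value close to 50%, reflecting that there is no evidence either way.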
Total converters - Total number of visitors that saw the experience, either the control or one of the other variations, and converted
INFO: The metric is derived from the Qubit statistical model and is updated periodically throughout the day. The live count is updated in real time.
Total visitors - Total number of visitors to your site that saw one of the experience variations, either the control or one of the other variations
INFO: The metric is derived from Qubit's statistical model and is updated periodically throughout the day. The live count is updated in real time
Traffic allocation - The proportion of visitors that are put into each variation
INFO: Typically, we run 50/50 experiments, meaning that half of visitors should see the control, and half of visitors should see the treatment. Other common splits are 80/20 and 95/5 but you can also define a custom allocation
INFO: When testing multiple variants, we give each variant an even split, so for 3 variants the split is 33/33/33
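One common way to implement traffic allocation is to hash a visitor identifier into a number between 0 and 1 and map it onto the allocation shares, so that a returning visitor always sees the same variation. The sketch below shows that general technique. The function names, the identifiers, and the use of SHA-256 are assumptions for illustration and do not describe how Qubit buckets visitors.

```python
import hashlib


def even_split(variation_names):
    """Give every variation the same share of traffic."""
    share = 1 / len(variation_names)
    return {name: share for name in variation_names}


def assign_variation(visitor_id, experience_id, allocation):
    """Deterministically place a visitor into a variation by hashing their ID."""
    if abs(sum(allocation.values()) - 1) > 1e-9:
        raise ValueError("allocation shares must sum to 1")
    digest = hashlib.sha256(f"{experience_id}:{visitor_id}".encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2 ** 64  # roughly uniform in [0, 1]
    cumulative = 0.0
    for name, share in allocation.items():
        cumulative += share
        if point < cumulative:
            return name
    return name  # fall back to the last variation on floating-point rounding


allocation = {"control": 0.5, "treatment": 0.5}
counts = {"control": 0, "treatment": 0}
for i in range(10_000):
    counts[assign_variation(f"visitor-{i}", "exp-1", allocation)] += 1
print(counts)
```

Including the experience identifier in the hash keeps a visitor's bucket in one experience independent of their bucket in another.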
Uplift - An observed change for a given metric between a variation and the control. A 0% uplift means there is no difference between the variation and the control, and a negative uplift (or downlift) means that the control actually did better than a variation
INFO: At Qubit we express this change as a percentage. So if the control has a Conversion Rate of 4%, and the variation has a Conversion Rate of 5%, we would say this is a 25% uplift, since 5 is 125% of 4
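The arithmetic behind the example above can be written as a short helper. The function name is made up for illustration.

```python
def relative_uplift(control_value, variation_value):
    """Relative change of the variation over the control, as a fraction."""
    if control_value == 0:
        raise ValueError("control value must be non-zero")
    return (variation_value - control_value) / control_value


print(f"{relative_uplift(0.04, 0.05):+.0%}")  # variation at 5% vs control at 4%
print(f"{relative_uplift(0.04, 0.04):+.0%}")  # no difference
print(f"{relative_uplift(0.04, 0.03):+.0%}")  # a downlift
```

Note that the uplift is relative to the control: a one percentage point gain on a 4% Conversion Rate is a 25% uplift, not a 1% uplift.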
Winning threshold - The default winning threshold for all Qubit Experiences is 95%. This is the standard in web analytics and denotes our confidence that the observed uplift for a given metric is not due to some unknown or random factor
TIP: By lowering the threshold, you will reduce the required sample size and therefore the time it takes for the experience to complete and get a result. Doing so is therefore often seen as an acceptable method of getting results quicker