We'll introduce some terms and concepts that you will encounter when working with experiences and experience test results.

**A/B testing**- At its simplest, A/B testing is a method of experimenting with two versions of a website, a control and a variation. By observing and analyzing the behavior of visitors that are randomly bucketed into either the control or a variation, we avoid most effects that might bias the data. We can then answer questions about which version performed better against defined goals such as Conversion Rate or click-through rate on a banner.

**Average Order Value (AOV)**- The average amount spent per order. Not the same as RPC, since RPC is cumulative for a visitor, whereas AOV is measured across single orders. AOV is important because it provides valuable insight into how much your customers are spending on your products.

**Bad allocation / traffic-split**- Our stats model continuously monitors how many visitors are going into each variation. If the split goes outside a statistically realistic range, the test shows an error message. This almost always signifies an issue with the data.
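As a rough illustration of what "outside a statistically realistic range" can mean, the sketch below runs a chi-square check on a two-way split. This is not Qubit's stats model: the function name `split_looks_unrealistic`, the chi-square approach, and the `alpha` cut-off of 0.001 are all assumptions chosen for the example.

```python
import math


def split_looks_unrealistic(control_visitors, variation_visitors,
                            expected_control_share=0.5, alpha=0.001):
    """Flag a two-way traffic split that deviates too far from the plan.

    Assumption: a chi-square goodness-of-fit test with 1 degree of freedom;
    the actual monitoring method is not described in this glossary.
    """
    total = control_visitors + variation_visitors
    if total == 0:
        return False  # nothing to check yet

    expected_control = total * expected_control_share
    expected_variation = total - expected_control
    chi2 = ((control_visitors - expected_control) ** 2 / expected_control
            + (variation_visitors - expected_variation) ** 2 / expected_variation)

    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value < alpha


# Hypothetical counts for a planned 50/50 experiment.
print(split_looks_unrealistic(5050, 4950))  # small deviation, not flagged
print(split_looks_unrealistic(5300, 4700))  # large deviation, flagged
```

A strict cut-off keeps the check from firing on ordinary random variation, so a flag is more likely to point at a genuine data issue.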

**Chance of an uplift**- The probability that the uplift is greater than 0.

**Confidence**- A measure of the uncertainty associated with an uplift estimate. It is the chance that the confidence interval (margin of error around the estimate) will contain the true value that you are trying to estimate. A higher confidence level requires a larger sample size.

**Control**- One of an experiment's variations, where no treatment is applied; for example, we don't show the banner we are testing. By exposing visitors to a control, we have an effective means of testing your experiment. Visitors bucketed into your experience control will see your website, mobile platform, or mobile app without any changes. The control is used as a basis of comparison.

**Conversions**- The number of purchases in an iteration, where your property is transaction-based.

**Conversion Rate (CR)**- The number of conversions divided by the total number of visitors. When referring specifically to the metric reported for an experience, CR refers to conversions amongst visitors from the moment they enter the experience until the moment they leave or the experience ends. When referring to segment metrics, CR refers to conversions amongst members of a segment that visited a site on a given day. CR is important because it tells you how customers are engaging with your brand and interacting with your website or mobile app.

For a more in-depth discussion of how Qubit calculates Conversion Rate, see What is Conversion Rate?

**Converters**- A visitor who went on to convert. At Qubit, when we talk about *Conversion Rate*, we are talking about *converter rate*, i.e. the rate of visitors that convert, rather than the rate of sessions that end in a conversion.

**Customer Lifetime Value (CLV)**- Calculated by multiplying a customer's Average Order Value (AOV) by the average purchase frequency rate, CLV predicts the value that can be attributed to the entire future relationship with a customer. CLV is important not only because it helps identify and segment the most loyal customers, but also because it tells you how well you’re resonating with your customer base, how much your customers like your products or services, what you’re doing right, and how you can improve. It can also help make decisions around how much to invest in your customers.
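The difference between converter rate and a session-based rate, and the CLV multiplication described above, can be sketched in a few lines. The data, function names, and the `(visitor_id, converted)` session format are hypothetical and only serve the illustration.

```python
def converter_rate(sessions):
    """Share of unique visitors with at least one converting session."""
    visitors = {visitor_id for visitor_id, _ in sessions}
    converters = {visitor_id for visitor_id, converted in sessions if converted}
    return len(converters) / len(visitors)


def session_conversion_rate(sessions):
    """Share of sessions that ended in a conversion."""
    converting = sum(1 for _, converted in sessions if converted)
    return converting / len(sessions)


def customer_lifetime_value(average_order_value, purchase_frequency):
    """CLV as AOV multiplied by the average purchase frequency rate."""
    return average_order_value * purchase_frequency


# Hypothetical data: visitor "a" converts in two of three sessions,
# and visitors "b", "c", and "d" never convert.
sessions = [("a", True), ("a", True), ("a", False),
            ("b", False), ("c", False), ("d", False)]

print(converter_rate(sessions))            # 1 converter out of 4 visitors
print(session_conversion_rate(sessions))   # 2 converting sessions out of 6
print(customer_lifetime_value(80.0, 2.5))  # AOV of 80 bought 2.5 times
```

Here a single repeat purchaser lifts the session-based rate above the converter rate, which is why the two should not be compared directly.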

**Experience**- One of Qubit's visual, custom, or programmatic experiences that are used to deliver visual or functional changes to a website, mobile platform, or mobile app.

**Experiment**- A change delivered to a website, mobile platform, app, etc. to make a discovery or test a hypothesis.

**Experiment completion**- An experiment is considered complete when the primary goal has reached statistical significance.

**False positive**- If the default winning threshold is 95%, it is a given that 5% of the time we will be wrong. This is known as a false positive and is defined as an experiment that we thought was beneficial, but wasn’t, and that may even be harmful.

**Goal**- A means of determining how an experiment will be evaluated as a success. Each experiment has a primary goal and can have secondary goals.

**Iteration**- A period of an A/B test between the points at which the test was stopped, changed, and restarted. If an experiment is stopped and restarted without change, the iteration stays the same.

**INFO**: Changes that will cause a new iteration to be started include: changing the experience triggers, changing the targeted segments, changing traffic allocation, and adding or deleting a variant.

**Live visitors**- A live count of the number of visitors that are seeing the experience in real time on your site and the number of visitors that have been served the experience in the last hour.

**Outliers**- In revenue testing, there are occasional customers who spend far more than the average. This is a problem for the revenue model because it infers the distribution of revenue from previous data, and outliers can lead to skewed results.

**INFO**: For each of your experiences you can mitigate the potential for outliers to skew results by ignoring outlier data. This will remove the top 0.1% of spenders from the sample to prevent the data from outliers interfering with the statistical analysis of an experience.
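To show why ignoring the top 0.1% of spenders matters, here is a minimal sketch that trims a revenue sample and compares the averages. The function name `drop_top_spenders`, the rounding rule, and the data are assumptions; the platform's exact trimming logic is not described here.

```python
def drop_top_spenders(revenues, fraction=0.001):
    """Return the revenues with the highest `fraction` of values removed.

    Assumption: the number of values to drop is rounded to the nearest
    whole spender; the real rounding rule may differ.
    """
    n_drop = round(len(revenues) * fraction)
    ordered = sorted(revenues)
    return ordered[:len(ordered) - n_drop]


# Hypothetical sample: 1,998 typical spenders and 2 very large orders.
revenues = [50.0] * 1998 + [10_000.0, 20_000.0]
trimmed = drop_top_spenders(revenues)

print(sum(revenues) / len(revenues))  # average pulled up by two orders
print(sum(trimmed) / len(trimmed))    # average of the typical spenders
```

Two orders out of 2,000 are enough to move the untrimmed average well above what a typical customer spends.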

**Pilot test**- A pilot test is a trial run of an A/B test, where the power is set to 20% rather than 80%. It runs a lot faster than a normal test, and is generally used to check that a change does not have a massive negative effect.

**Power**- The power of a test is essentially how good it is at detecting true uplift: given that a variation provides a real uplift, what is the chance that it will win the test?

**INFO**: We would like this to be as high as possible, but the tradeoff is that high-powered tests require more data. At Qubit, for a standard test, we have set our parameters so that a variant that has an (actual) 5% uplift has an 80% chance of winning the test. We say a test has ‘reached power’ if the power calculator deems it to have gone on long enough to have the 80% power we aim for.

**Power calculator**- Part of the stats engine that estimates how many more converters (or visitors) will be needed to complete the test at 80% power.

**Primary goal**- All experiences will have a single primary goal. The default primary goal is *Conversions*, which you can change, if necessary. The primary goal is important because it determines when an experiment is complete. **An experiment is complete when the primary goal has reached statistical significance.**

**Prior**- Our prior belief about how experiment results are distributed across all tests. We use it in the stats model to temper the effect of random fluctuations, especially early on in tests. Without it, the first few thousand visitors would have wildly varying confidence intervals.

**INFO**: Our prior belief also brings a reality check to extremely large uplifts. You may notice that the expected uplift is not the same as the raw uplift. We are essentially saying that while we believe there is an uplift, we think it likely that some of the uplift came from random fluctuation.

**Probability**- Probability is a measure of credibility and confidence. In data-driven decision making, probability reflects the decision-making strength in the data. When talking about experiences, and specifically when comparing results between a control and variation in an A/B test, we use probability to indicate how much evidence there is that an observed change is due to the experience itself rather than something else.

**Revenue Per Converter (RPC)**- Revenue divided by converters. When referring specifically to the metric reported for an experience, RPC refers to revenue from the moment the visitor enters the experience until the moment the visitor leaves or the experience ends. When referring to segment metrics, RPC refers to revenue from members of a segment that visited the site on a given day.

**Revenue Per Visitor (RPV)**- Revenue Per Converter multiplied by Conversion Rate. When referring specifically to the metric reported for an experience, RPV refers to revenue from the moment the visitor enters the experience until the moment the visitor leaves or the experience ends. When referring to segment metrics, RPV refers to revenue from members of a segment that visited the site on a given day.
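The relationships between these revenue metrics can be checked with a small worked example. The figures and the function name `commerce_metrics` are hypothetical. AOV is included alongside RPC to show that the two differ whenever some converters place more than one order.

```python
def commerce_metrics(revenue, orders, converters, visitors):
    """Compute AOV, RPC, CR, and RPV from raw totals."""
    aov = revenue / orders       # measured across single orders
    rpc = revenue / converters   # cumulative per converting visitor
    cr = converters / visitors   # converter rate
    rpv = rpc * cr               # equivalent to revenue / visitors
    return {"aov": aov, "rpc": rpc, "cr": cr, "rpv": rpv}


# Hypothetical totals: 120 converters placed 150 orders between them.
metrics = commerce_metrics(revenue=12_000.0, orders=150,
                           converters=120, visitors=4_000)
for name, value in metrics.items():
    print(name, round(value, 4))
```

With these totals, AOV is 80 while RPC is 100, and multiplying RPC by a 3% Conversion Rate gives an RPV of 3, the same as dividing revenue by visitors.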

**Sample size**- The amount of data that we require to get a statistically significant result. It can be reduced by lowering the winning threshold.

**WARNING**: Other things being equal, reducing the sample size decreases the amount of time it takes to get a result in an experience. The trade-off is that it also reduces the confidence we have in the result. As our sample size decreases, the confidence in our estimates of uplift also decreases.

**Secondary goal**- In addition to a primary goal, each experience can have up to 4 additional goals. These ancillary or secondary goals are used in A/B testing to compare experiment variations, but are not used to define whether the experiment is complete or not.

**Significance**- A test result is statistically significant if it is deemed unlikely to have occurred by statistical error alone.

**INFO**: Because we use a Bayesian model, the uplift probability is not a significance in the Wikipedia sense, but we call it that anyway. Uplift probability is strictly the **probability that the impact of a test is positive under the prior**. At Qubit we take the significance to be 95% (the industry standard). This means that for a test to be a winner, we have to determine that the uplift is more than 0 with probability at least 95%.
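To make "the uplift is more than 0 with probability at least 95%" concrete, the sketch below estimates an uplift probability for Conversion Rate with a simple Beta-Binomial model and Monte Carlo sampling. This is an illustration only, not Qubit's stats model: the uniform Beta(1, 1) prior, the function name `probability_of_uplift`, and the visitor counts are all assumptions.

```python
import random


def probability_of_uplift(control_converters, control_visitors,
                          variation_converters, variation_visitors,
                          prior_alpha=1.0, prior_beta=1.0,
                          draws=20_000, seed=0):
    """Estimate P(variation rate > control rate) under a Beta prior.

    Assumption: a uniform Beta(1, 1) prior by default; a real prior
    would be informed by the results of previous tests.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        control_rate = rng.betavariate(
            prior_alpha + control_converters,
            prior_beta + control_visitors - control_converters)
        variation_rate = rng.betavariate(
            prior_alpha + variation_converters,
            prior_beta + variation_visitors - variation_converters)
        if variation_rate > control_rate:
            wins += 1
    return wins / draws


WINNING_THRESHOLD = 0.95

# Hypothetical test: 4% versus 5% Conversion Rate on 10,000 visitors each.
p = probability_of_uplift(400, 10_000, 500, 10_000)
print(p, p >= WINNING_THRESHOLD)
```

An informative prior centred on small uplifts would pull this probability down early in a test, which is the tempering effect described under **Prior**.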

**Statistical significance**- The point in the lifetime of an experiment when we have collected enough data to be confident that the observed change in uplift is due to the experience being shown to visitors and not some unknown factor.

**Total converters**- Total number of visitors that saw the experience, either the control or one of the other variations, and converted.

**INFO**: The metric is derived from Qubit's statistical model and is updated periodically throughout the day. The live count is updated in real time.

**Total visitors**- Total number of visitors to your site that saw one of the experience variations, either the control or one of the other variations.

**INFO**: The metric is derived from Qubit's statistical model and is updated periodically throughout the day. The live count is updated in real time.

**Traffic allocation**- The proportion of visitors that are put into each variation.

**INFO**: Typically, we run 50/50 experiments, meaning that half of visitors should see the control, and half of visitors should see the treatment. Other common splits are 80/20 and 95/5, but you can also define a custom allocation.

**INFO**: When testing multiple variants, we give each variant an even split, so for 3 variants the split is 33/33/33.

**Treatment/treatment variant**- The treatment (terminology borrowed from medicine) is simply the thing that we change on a website, mobile platform, app, etc. So if we want to test adding a welcome message to a website, the treatment is simply the act of displaying a welcome message to a visitor. The treatment variant is the variation of our experiment in which we apply the treatment.
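One common way to implement a traffic allocation like the splits described above is to hash a visitor identifier into a number between 0 and 1 and compare it against cumulative shares. The sketch below shows that general technique. It is not a description of how Qubit buckets visitors, and `assign_variation`, the identifiers, and the hashing scheme are assumptions.

```python
import hashlib


def assign_variation(visitor_id, experience_id, allocation):
    """Pick a variation for a visitor from (name, share) pairs.

    Assumption: shares sum to 1. Hashing makes the assignment stable,
    so the same visitor always lands in the same variation.
    """
    key = f"{experience_id}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    point = int.from_bytes(digest[:8], "big") / 2 ** 64  # in [0, 1)

    cumulative = 0.0
    for name, share in allocation:
        cumulative += share
        if point < cumulative:
            return name
    return allocation[-1][0]  # guard against floating-point rounding


# Hypothetical 50/50 experiment over 10,000 made-up visitor IDs.
allocation = [("control", 0.5), ("variation", 0.5)]
counts = {"control": 0, "variation": 0}
for i in range(10_000):
    counts[assign_variation(f"visitor-{i}", "exp-1", allocation)] += 1
print(counts)
```

Including the experience identifier in the hash means a visitor's bucket in one experience does not determine their bucket in another.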

**Uplift**- An observed change for a given metric between a variation and the control. A 0% uplift means there is no difference between the variation and the control, and a negative uplift (or *downlift*) means that the control actually did better than a variation.

**INFO**: At Qubit we express this change as a percentage. So if the control has a Conversion Rate of 4%, and the variation has a Conversion Rate of 5%, we would say this is a 25% uplift, since 5 is 125% of 4.
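The percentage calculation in the example above can be written as a one-line function. The name `uplift_percent` is hypothetical, and the function computes the raw observed uplift only, without any adjustment from the prior.

```python
def uplift_percent(control_value, variation_value):
    """Relative change of the variation over the control, as a percentage."""
    return (variation_value - control_value) / control_value * 100


print(uplift_percent(0.04, 0.05))  # about 25: the variation is 125% of the control
print(uplift_percent(0.04, 0.04))  # 0: no difference
print(uplift_percent(0.04, 0.03))  # about -25: a downlift
```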

**Variation**- One of an experiment's variations, with an applied treatment. Visitors bucketed into your experience's variation will see your website, mobile platform, or mobile app, with the changes delivered in your experience.

**Winning threshold**- The default winning threshold for all Qubit Experiences is 95%. This is the standard in web analytics and denotes our confidence that the observed change in uplift for a given metric is not due to some unknown or random factor.

**TIP**: By lowering the threshold, you will reduce the required sample size and therefore the time it takes for the experience to complete and get a result. Doing so is often seen as an acceptable method of getting results quicker.

Last updated: June 2020
