Split Testing with Confidence

Background

Continuous testing in marketing campaigns is vital for optimizing performance. It helps us identify which platforms, audiences, creative, and back-end development deserve focused attention and budget.

The more we test our strategies, the more confident we can be in recommending winning campaigns. We can analyze standard campaigns and observe which changes make the greatest impact; however, to be certain those observations are significant enough to justify a change in campaign practices, we suggest running a controlled test such as a split, or A/B, test.

A split, or A/B, test pits one concept against the same concept with only one variable changed. Tests can be run on social or digital channels. The winning variant can then be implemented in future campaigns as a best practice.

Best practices

Split testing divides your audience into random, non-overlapping groups. Each group is then shown one of two ads that are identical except for a single variable. The key to these tests is that their results reach statistical significance.
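Ad platforms typically handle this split for you, but the idea behind a random, non-overlapping split can be sketched in a few lines. The example below is a minimal sketch assuming each person has a stable ID; it uses hash-based bucketing, one common way to do this, and the names `assign_variant` and `"cta-test"` are hypothetical. Hashing the test name together with the ID scatters people pseudo-randomly across groups, and because the same input always gives the same hash, nobody can end up in both groups.

```python
import hashlib


def assign_variant(user_id: str, test_name: str, variants=("A", "B")) -> str:
    """Hypothetical helper: deterministically assign a person to one variant.

    The SHA-256 hash of "test_name:user_id" is effectively random across
    people, so groups are roughly equal, and it is always the same for a
    given person, so the groups never overlap.
    """
    key = f"{test_name}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    audience = [f"user-{i}" for i in range(10_000)]
    groups = {"A": [], "B": []}
    for uid in audience:
        groups[assign_variant(uid, "cta-test")].append(uid)
    # Each person appears in exactly one group; sizes are roughly equal.
    print(len(groups["A"]), len(groups["B"]))
```

Including the test name in the hash means a new test reshuffles people, so a person's group in one test does not carry over to the next.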

When we say a result is statistically significant, we mean the observed difference is unlikely to have arisen from random chance alone. Tests should aim for a high degree of statistical significance, that is, a high level of confidence that the results occurred because of the changed variable and not because of chance.
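The article does not prescribe a specific statistical test. When the KPI is a rate such as clicks or conversions per person reached, one common choice is a two-sided two-proportion z-test; the sketch below assumes that setup, and the function name `ab_confidence` is hypothetical. It returns the confidence (one minus the p-value) that the two variants genuinely differ.

```python
from math import sqrt
from statistics import NormalDist


def ab_confidence(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided two-proportion z-test (normal approximation).

    conv_* are results (e.g. conversions) and n_* are people reached in
    each group. Returns 1 - p-value: the confidence that the two rates
    differ for a reason other than chance.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the "no real difference" (null) hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no variation in either group: nothing to distinguish
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value
```

For example, 200 conversions from 10,000 people for variant A against 250 from 10,000 for variant B gives `ab_confidence(200, 10_000, 250, 10_000)` of about 0.98, or roughly 98% confidence that the difference is real. The normal approximation works best when each group has a reasonable number of results, which is one reason to collect plenty of data before calling a winner.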

Performing a high-quality and accurate test requires the following best practices:

Form a formal, written hypothesis. Your hypothesis guides the campaign and answers both what will be tested and why. Without answers to these questions, the results will be meaningless. The hypothesis should also clarify how the results will affect your campaign practices and what, if anything, you plan to change based on the new insights.
Test only one variable at a time. Testable variables include elements of the ad creative (image, copy, ad type), audiences, placements, or the call to action (CTA). Testing more than one variable at a time confounds the results: if performance differs, there is no way to tell which change caused it, which invalidates the test.
Assign audiences to variants as randomly as possible. While you can still use a targeted audience based on your campaign’s objectives, that audience should be split into random, non-overlapping groups. This helps mitigate error caused by factors such as the device type people use, their demographics, or other branded ads they might be seeing. Targeted audiences should not be part of any other branded campaigns outside of the test.
Aim for a high level of confidence that your test results would be consistent if the test were repeated under the same conditions. A minimum confidence level of 75% is recommended.
Gather as much data as possible. The more data you analyze, the easier it is to reach a high level of statistical significance. A minimum of 200 results should be collected before choosing a winner, though the right number will depend on your campaign objectives and budget. Consider a higher minimum before making any drastic changes to your campaign best practices.
Choose one KPI to determine the winner. While other metrics may help inform the course of the campaign, only one should be used to decide next steps, as differences in the other metrics may not be statistically significant.
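How much data is "enough" depends on the baseline rate of your KPI, the size of the lift you hope to detect, and the confidence level you require. The sketch below uses the standard normal-approximation sample-size formula for comparing two proportions. It assumes a rate-based KPI and 80% statistical power (a common convention, not a figure from this article), and the function name `sample_size_per_variant` is hypothetical. Note that it counts people reached per group, not results; the expected number of results is that count multiplied by the rate.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, expected_rate: float,
                            confidence: float = 0.75,
                            power: float = 0.80) -> int:
    """Approximate people needed in EACH group for a two-sided
    two-proportion z-test to detect baseline_rate vs expected_rate."""
    if not (0 < baseline_rate < 1 and 0 < expected_rate < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    if baseline_rate == expected_rate:
        raise ValueError("rates must differ to have anything to detect")
    # Critical value for the chosen confidence level (two-sided test).
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Critical value for the chosen power (chance of detecting a real lift).
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (baseline_rate + expected_rate) / 2
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(baseline_rate * (1 - baseline_rate)
                              + expected_rate * (1 - expected_rate)))
    return ceil(spread ** 2 / (baseline_rate - expected_rate) ** 2)
```

For instance, detecting a move from a 2% to a 2.5% conversion rate needs roughly 7,000 people per variant at 75% confidence, and nearly twice that (about 13,800) at 95%. Raising the confidence bar or hunting for a smaller lift both increase the data required, which is why a higher minimum is wise before making drastic changes.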

Who should consider a split test?

Split tests should be considered by brands that run similar, recurring campaigns that could benefit from data-backed best practices. These brands have run variations of the tested variable in the past in an uncontrolled environment and are able to form a sound hypothesis. They also have processes in place to apply the learnings from the test to future campaigns. Split tests should not be conducted without a written hypothesis.

Split tests are recommended for campaigns that will run for longer than one month and have enough volume to capture sufficient test results within a one-month period. If a test would need more time to gather enough results, consider a larger budget or a different key test metric.
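Whether a test fits in a one-month window comes down to simple arithmetic: the sample each variant needs divided by the number of people each variant reaches per day. The sketch below assumes daily reach is split evenly across variants and treats a month as 30 days; the function names are hypothetical.

```python
from math import ceil


def days_to_complete(needed_per_variant: int, daily_reach: int,
                     variants: int = 2) -> int:
    """Days until every variant reaches its required sample, assuming
    the test's total daily reach is split evenly across variants."""
    if daily_reach <= 0:
        raise ValueError("daily_reach must be positive")
    per_variant_per_day = daily_reach / variants
    return ceil(needed_per_variant / per_variant_per_day)


def fits_in_one_month(needed_per_variant: int, daily_reach: int,
                      variants: int = 2, month_days: int = 30) -> bool:
    """True if the test can gather enough data within month_days."""
    return days_to_complete(needed_per_variant, daily_reach,
                            variants) <= month_days
```

For example, a test needing 7,000 people per variant that reaches 500 people a day finishes in 28 days and fits the window; at 400 people a day it would take 35 days, a signal to raise the budget or pick a KPI that generates results more often.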


October 18, 2022 by Cambria VandeMerwe

Topic: industry insights



Split Testing with Confidence | ThomasARTS
