Split Testing with Confidence


Background

Constant testing in marketing campaigns is vital for maximizing performance. It helps us understand which platforms, audiences, creative, and back-end development should receive focused attention and budget. By consistently running tests, we can also learn how individual elements, such as CTA button color or subject lines, affect key metrics like bounce rate or conversion rate. Additionally, tools like a sample size calculator help ensure the required sample size is reached during any campaign, especially in digital marketing and on social media platforms.
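
For illustration, here is a minimal Python sketch of the kind of calculation a sample size calculator performs for a two-proportion split test. The 4% baseline rate, 5% target rate, significance level, and power are all assumed values for the example, not figures from this article.

```python
# Minimal sample-size sketch for a two-proportion split test.
# The 4% baseline and 5% target conversion rates are illustrative
# assumptions, as are the default alpha and power values.
from scipy.stats import norm

def required_sample_size(p1: float, p2: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size needed in EACH group."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # desired statistical power
    p_bar = (p1 + p2) / 2               # average of the two rates
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

print(required_sample_size(0.04, 0.05))  # visitors needed per group
```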

The more we test our strategies through trial and error, the higher our confidence in recommending winning campaigns. We can analyze standard campaigns and make observations about which changes have the greatest impact; however, to be certain those observations are significant enough to validate a change in campaign practices, we suggest submitting campaigns to a controlled test such as a split, or A/B, test. This data-driven approach helps us collect data efficiently and reach statistical significance. It also helps surface common mistakes in multivariate or landing page testing, enabling improved user experience and conversion rate optimization.

A split, or A/B, test pits one concept against the same concept with a single variable changed. Tests can be run on social or digital channels, and the winning variable can then be implemented in future campaigns as a best practice. For instance, when comparing versions of creative, the original version serves as the control while the second becomes your variation. This structure lets you see whether changing a CTA button, button color, or email subject line can increase conversions and improve conversion rate optimization.

Best practices

Split testing divides your audience into random, non-overlapping groups. Then, two identical ads with one differentiating variable are placed in front of each group. Key to these tests is that their results reach statistical significance. This helps ensure the insights you gather reflect user behavior accurately and are not the result of random chance. When testing email campaigns, for example, you might compare your original subject line to a new variation to see if it increases conversion rates.
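
As a sketch of how that random, non-overlapping split might be implemented, the hypothetical helper below hashes each user ID into exactly one stable group. The user_id and experiment names are placeholders, not part of any specific platform's API.

```python
# Sketch of deterministic, non-overlapping group assignment.
# Hashing the (experiment, user) pair means each user always lands
# in the same single group, across sessions and devices.
import hashlib

def assign_group(user_id: str, experiment: str = "cta-color-test") -> str:
    """Hash the user into exactly one of two non-overlapping groups."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "control" if int(digest, 16) % 2 == 0 else "variation"

print(assign_group("user-12345"))  # stable result for this user
```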

When claiming that a result has statistical significance, we’re claiming that the result can likely be attributed to one specific cause. Tests should seek a high degree of statistical significance, meaning a high level of confidence that the results occurred because of the change in variable and not because of chance. By using data-informed processes and testing tools (like Google Analytics or other specialized testing platforms), you can compare variables across different target audience segments. If you’re using multivariate testing, make sure you’ve calculated the required sample size before running tests.
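
To make the confidence idea concrete, here is a minimal two-proportion z-test sketch that turns raw split-test counts into a confidence level. The conversion counts in the example are invented for illustration.

```python
# Sketch of a two-proportion z-test for a finished split test.
# The example counts (conversions, group sizes) are made up.
from scipy.stats import norm

def split_test_confidence(conv_a, n_a, conv_b, n_b):
    """Return (z, confidence) that the two conversion rates differ."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under H0
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))       # two-sided p-value
    return z, 1 - p_value                      # confidence level

z, confidence = split_test_confidence(120, 2400, 151, 2400)
print(f"z = {z:.2f}, confidence = {confidence:.1%}")
```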

Performing a high-quality and accurate test requires the following best practices:

  • A formal, written hypothesis should be formed. Your hypothesis guides the campaign and answers both what will be tested and why. Without answering these questions, the results will be meaningless. The hypothesis also clarifies how the results will affect your campaign practices and what, if anything, you plan to change with the new insights. Writing down the hypothesis is also crucial when you later review your testing examples to judge whether your marketing strategies produced actionable insights.
  • Split tests should only test one variable at a time. Testable variables include pieces of the ad creative (image, copy, ad type), audiences, placements, or the call to action (CTA). Testing more than one variable at a time inflates the standard error, which invalidates the results. Common mistakes in split testing often involve trying to measure too many factors at once, leading to inconclusive data.
  • Audiences to which the variables are shown should be as random as possible. While you can still use a targeted audience based on your campaign’s objectives, each audience should be split into random, non-overlapping groups. This helps mitigate error caused by conditions such as device type, demographics, or other branded ads the audience might be seeing. Targeted audiences should not be part of any other branded campaigns outside of the test. This is especially important for campaigns leveraging social media or email marketing, as it helps isolate user behavior for clear results.
  • Shoot for a high level of confidence that your test results would be consistent if the test were repeated under the same conditions. A minimum confidence level of 75% is recommended; however, marketers aiming to meaningfully increase conversion rates may want to set an even higher threshold to ensure robust results.
  • Gather as much data as possible. The best way to reach a high level of statistical significance is to analyze as much data as possible. A minimum of 200 results should be collected before choosing a winner, though the right minimum will depend on your campaign objectives and budget. A higher minimum should be considered before making any drastic changes to your campaign best practices. For instance, if you’re testing a CTA button on a landing page, you might need a larger sample size to validate your results thoroughly. Your testing practices only yield valid findings if you collect data comprehensively and remain data-driven throughout the process.
  • Determine the one KPI that will decide the winner. While other metrics may inform the course of the campaign, it is important that only one is used to determine next steps, as other results may not be statistically significant. For example, if your key measure is conversion rate, keep your focus there, even if you observe changes in bounce rate or user behavior. This streamlines your analysis and ensures your campaign decisions are driven by the primary goal (see the sketch after this list).
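
Putting several of these best practices together, the sketch below declares a winner only when each group has reached the 200-result minimum and the confidence level clears the 75% floor. Reading the 200-result minimum as per-group is our assumption, and the counts in the example are invented.

```python
# Sketch combining the bullets above: one KPI (conversion rate),
# a 200-result minimum per group (our per-group reading is an
# assumption), and the recommended 75% confidence floor.
from scipy.stats import norm

MIN_RESULTS = 200      # results per group before choosing a winner
MIN_CONFIDENCE = 0.75  # recommended minimum confidence level

def pick_winner(conv_a: int, n_a: int, conv_b: int, n_b: int) -> str:
    if min(conv_a, conv_b) < MIN_RESULTS:
        return "keep collecting data"
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    confidence = 1 - 2 * (1 - norm.cdf(abs(p_b - p_a) / se))
    if confidence < MIN_CONFIDENCE:
        return "no clear winner yet"
    return "variation" if p_b > p_a else "control"

print(pick_winner(230, 4600, 289, 4600))  # invented example counts
```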

Who should consider a split test?

Split tests should be considered by brands that run similar recurring campaigns that could benefit from data-backed best practices. These brands have run variations of the tested variable in the past in an uncontrolled environment, can form a sound hypothesis, and have processes in place to implement the test’s learnings in future campaigns. Split tests should not be conducted without a written hypothesis. In many cases, email testing with different subject lines can drive more data-informed decisions around conversion rate optimization and user experience.

Split tests are recommended for campaigns that will run for longer than one month and have enough traffic to capture sufficient test results within one month. Tests that require more time to gather enough results should consider a larger budget or a different key test metric. If you don’t reach statistical significance within your desired timeframe, you may not have tested a large enough sample, perhaps because a sample size calculator never guided your approach. That shortfall can prevent you from confidently determining whether you can truly increase conversion rates or whether your CTA button is indeed the best option for your campaign.
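
As a quick feasibility check on that one-month window, the hypothetical helper below estimates how many days a 50/50 split needs to reach a required per-group sample at a given daily traffic level. Both inputs are assumed numbers for illustration.

```python
# Back-of-envelope check that a split test can finish in a month.
# The required per-group sample and daily traffic are assumed inputs.
import math

def days_to_finish(required_per_group: int, daily_visitors: int) -> int:
    """Days until both halves of a 50/50 split reach the required size."""
    per_group_per_day = daily_visitors / 2   # 50/50 audience split
    return math.ceil(required_per_group / per_group_per_day)

days = days_to_finish(6747, 600)   # reusing the earlier sample-size output
print(f"~{days} days needed")      # compare against the one-month window
```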


October 18, 2022, by TA
