Statistical significance is important for getting accurate test results, but you do not have to be a statistician, or even that well versed in statistics, to take advantage of it in your landing page testing or SEM campaigns. In Google AdWords and Bing Ads, the engines will try to get you to set up your campaign to optimize ad rotation automatically, but this can lead to performance issues in an account.
Why do Google and Bing do this?
Google and Bing make money when people click on ads, so they automate the ad serving for you to generate those clicks. Bing Ads has an optimize-for-clicks setting and a rotate-evenly setting for ad rotation, while Google has those two plus a rotate-for-90-days-then-optimize-for-clicks setting and an optimize-for-conversions setting. The problem with these “optimize” settings is that the engines do not wait for a statistically significant result before biasing the serving toward 1 of the ads in the ad group, and they only look at 1 key performance indicator (KPI), such as click-through rate (CTR) or conversion rate.
What this does to your account:
If you have the optimize-for-clicks setting turned on and one ad has a higher CTR than another but a much lower conversion rate, you may be wasting a lot of ad spend because Google or Bing will show the ad with the higher CTR instead of the one with the much higher conversion rate. For example, I recently audited an SEM account where one ad announced a “Sale” in the headline with a 30% off offer in the description, while the other ad merely mentioned up to 30% off in the description and did not use the word “Sale” in the headline. The “Sale” ad had a much higher CTR but a much lower conversion rate, and the client had a much better ROAS on the ad that did not mention “Sale” because the “Sale” ad was attracting irrelevant clicks with a misleading offer. This particular example is an issue of landing page and ad copy strategy, but it also highlights why “optimize for clicks” can be dangerous. Beyond this, the Google and Bing settings only look at an individual ad group, and there is typically very little data in just 1 ad group in 1 campaign to optimize ads on. This is particularly true when working with highly segmented campaigns that target specific keywords in the ad copy of different ad groups.
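To put rough numbers on that trade-off, here is a quick back-of-the-envelope sketch in Python. All of the figures (impressions, CPC, CTRs, conversion rates) are hypothetical and only meant to show how the higher-CTR ad can end up with a much worse cost per conversion:

```python
# Hypothetical figures to illustrate the CTR-vs-conversion-rate trap.
# Both ads get 10,000 impressions at a $1.00 average CPC.
impressions, avg_cpc = 10_000, 1.00

# "Sale" ad: higher CTR, lower conversion rate.
sale_clicks = impressions * 0.06         # 6% CTR -> 600 clicks
sale_conversions = sale_clicks * 0.02    # 2% conv. rate -> 12 conversions
sale_cpa = (sale_clicks * avg_cpc) / sale_conversions

# Plain ad: lower CTR, much higher conversion rate.
plain_clicks = impressions * 0.04        # 4% CTR -> 400 clicks
plain_conversions = plain_clicks * 0.05  # 5% conv. rate -> 20 conversions
plain_cpa = (plain_clicks * avg_cpc) / plain_conversions

print(f'"Sale" ad CPA: ${sale_cpa:.2f}')   # $50.00 per conversion
print(f'Plain ad CPA:  ${plain_cpa:.2f}')  # $20.00 per conversion
```

An optimize-for-clicks setting would happily push more and more traffic to the $50-per-conversion ad.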
How to address these risks:
We use a rotate-evenly-and-indefinitely setting, and we conclude our own ad copy tests based on the client’s goals. We measure both CTR and conversion rate for significance and base decisions on both. We might find statistical significance on CTR but not conversion rate, or the other way around, or the results may be significant (or insignificant) on both. Moreover, we like to test ads across ad groups when possible to get more data. The disclaimer here is that the test needs to have limited variables to avoid skewed or misleading results.
When it comes to running ad copy tests on SEM campaigns and on display text and banner ad campaigns, it can be challenging to get enough data in a single ad group to show statistical significance. For example, if you try to measure the results of 2 ads in 1 ad group that only gets a few clicks per day, it can take weeks or months to gather enough data to determine a winner based on CTR or conversion rate. The opportunity cost on your accounts can be huge at that point, so this strategy helps you or your clients save money and maximize profitability.
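If you want a rough sense of how long such a test will take before you start it, a standard two-proportion sample-size formula gives a ballpark. This is an approximation, not the engines’ or the spreadsheet’s math, and the example rates and click volumes below are hypothetical:

```python
from math import ceil, sqrt
from scipy.stats import norm

def clicks_needed_per_ad(p1, p2, alpha=0.10, power=0.80):
    """Approximate clicks needed per variant for a two-sided
    two-proportion z-test to detect conversion rates p1 vs p2."""
    z_a = norm.ppf(1 - alpha / 2)  # 1.645 for 90% confidence
    z_b = norm.ppf(power)          # 0.842 for 80% power
    p_bar = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2
    return ceil(n)

# Detecting a 3% vs. 5% conversion-rate difference at 90% confidence:
n = clicks_needed_per_ad(0.03, 0.05)
print(f"{n} clicks per ad")                    # roughly 1,200 clicks each
print(f"~{n / 5:.0f} days at 5 clicks/day/ad") # several months in a small ad group
```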
The main sections of how to appropriately conduct these ad copy tests are as follows:
- How to set up an ad copy test across ad groups, campaigns, and publishers
- How to download the data and then use an Excel tool to bring it all together
- How to measure for statistical significance with an Excel tool and what to do afterward
1. How to set up an ad copy test across ad groups, campaigns, and publishers
Businesses and websites are going to have specific calls to action that you want to promote in your ad copy, and/or they may have a unique selling proposition (USP) or unique value proposition (UVP) that you want to highlight in your ads. For example, your business may offer free shipping, 24-hour phone support, free consultations, and the list goes on. When setting up your ads for your keywords, you can typically start by trying to capture your keywords in your headlines and then use the descriptions to highlight the call to action and USP. For example, you may have 1 ad with a description that discusses 25+ years of experience and then offers a free consultation, and another ad that discusses the client’s service area along with the opportunity to call for a free evaluation. In either case, the ads can have the same headline but 2 different descriptions. What you end up with is the following in an ad group:
- Headline A with Description Line 1 A and Description Line 2 B with Display URL A
- Headline A with Description Line 1 C and Description Line 2 D with Display URL A
Description lines A, B, C, and D are the different USPs and/or offers that you are testing. There are scenarios where only 1 description line differs between the ads, but for this case, we will proceed with the example above since it does not change the methodology. Similarly, you can later test different headlines using the same descriptions, or try different display URLs, and so on. In any case, always limit the variables in the test so that you can gain insight into what works on the account as you develop ad copy in the future. One further point: this works with display banner ads as well if you run image and call-to-action variations on an even rotation and then measure the performance of the different variations against one another.
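To make that structure concrete, here is a minimal sketch (the copy and field names are hypothetical) of the two variants and the variant key you will use later to aggregate them across ad groups:

```python
# Two ads per ad group: same headline and display URL, different descriptions.
ads = [
    {"headline": "Headline A",
     "desc1": "25+ Years of Experience.",
     "desc2": "Get a Free Consultation Today.",
     "display_url": "example.com"},
    {"headline": "Headline A",
     "desc1": "Serving Your Local Area.",
     "desc2": "Call Now for a Free Evaluation.",
     "display_url": "example.com"},
]

# The concatenated descriptions act as the variant key. When you replicate the
# test into other ad groups, only the headline changes, so this key is what
# lets you aggregate each variant's data across ad groups later.
for ad in ads:
    ad["variant_key"] = f"{ad['desc1']} {ad['desc2']}"
```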
Moving forward with your test setup:
Once you have your first ad group set up like this, you can replicate the ads across ad groups within the same campaign as long as you swap out the headlines for the keywords you are trying to capture in the other ad groups. In the end, you should have many ad groups with the same straight A/B test setup. I do not recommend running a multivariate test with 3 or 4 ad variations per ad group because it is too hard to limit the variables and have a confident test. It is easier and more efficient to run an A/B test, determine a winner, and then run another A/B test pitting the winner against a new challenger variable in a round 2 test when you feel it is appropriate. Another point to note is that you should avoid the mentality of always testing ads. If you have a winner that performs well, let it run and bring you leads and sales for a while, and wait to introduce a challenger ad until there is an actual reason to do so, such as quality scores or conversion rates diminishing over time.
In some accounts, you can run this ad copy test across campaigns, but it really depends on the type of business and its offerings. Also, if you run a campaign structure that segments keywords by match type into their own campaigns, you can easily run this type of test across a few campaigns. Make sure that you use some sort of labeling or dimensions for your ads to make it easier to see results. If you do not have dimensions in a platform like Marin or Kenshoo, you can use labels in AdWords, and you will be able to use the same feature in Bing Ads at some point in 2016 when it finally gets released. In the Excel spreadsheet ad-test-example I set up here for you, you will see how to run the test without the need for labels and dimensions in Excel, but please note that dimensions and labels will save you a lot of time. Also, this spreadsheet contains real performance data even though I changed the ad copy to example text.
2. How to download the data and then use an Excel tool to bring it all together
If you have not downloaded it from the link above, here is the spreadsheet link again with the example (ad-test-example). Also, here is another spreadsheet, AB-Split-Testing-Significance-Calculator, containing just the statistical significance tool on its own. I took a great post from this blog and modified their Excel tool a bit to come up with this one.
You will go into the engines to download your data, or you can use Kenshoo or Marin to download it all at once if you have them. Because Bing puts description lines 1 and 2 together, you may have to run a quick Text to Columns to break the description into 2 columns in the Bing data, or you can sort it and copy the Google descriptions over their respective Bing Ads counterparts while maintaining the metrics data. You will need to ensure your download has columns for engine, campaign, ad group, headline, description line 1, description line 2, display URL, final URL, labels, clicks, impressions, cost, and conversions. Once downloaded, use the spreadsheet example to see how you can concatenate the description lines into 1 column, and keep in mind that you can use this concatenation method to combine headlines with descriptions in other tests.
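If you would rather script the cleanup than do it by hand, here is a rough pandas sketch of the same steps. The file names, the column headers, and especially the delimiter Bing uses between the two description lines are assumptions, so check them against your actual exports:

```python
import pandas as pd

# Hypothetical file names for the AdWords and Bing Ads downloads.
google = pd.read_csv("adwords_ads.csv")
bing = pd.read_csv("bingads_ads.csv")

# Bing exports one combined description column; split it into two so it lines
# up with Google's columns (the scripted version of Excel's Text to Columns).
# The "  " delimiter is an assumption about the export format.
bing[["Description Line 1", "Description Line 2"]] = (
    bing["Ad Description"].str.split("  ", n=1, expand=True)
)

data = pd.concat([google, bing], ignore_index=True)

# The concatenate step: collapse both description lines into 1 variant key so
# the same copy can be aggregated across ad groups, campaigns, and publishers.
data["Variant"] = (
    data["Description Line 1"].str.strip()
    + " "
    + data["Description Line 2"].str.strip()
)
```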
What do I do with all this data?
The next step is to build a pivot table in Excel by highlighting all the data and using Insert > Pivot Table. The pivot table in this example sheet shows the appropriate setup, and you can then add calculated fields for CTR, conversion rate, average CPC, and CPA (you can also do revenue and ROAS if you need them for an Ecommerce client). To see how I did this, go into the PivotTable Tools in Excel, use the Analyze section to go to Fields, Items, & Sets > Calculated Field, and then use the drop-down menu to view my formulas. See the screenshots below for a visual.
Following this, you will be able to see your ad data aggregated across ad groups, campaigns, and publishers, and you can use the pivot table filters to see how your ads performed under different segmentation. Sometimes, it will be blatantly obvious that you have a winning ad.
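If you prefer code to Excel for this step, here is a pandas equivalent of the pivot table and its calculated fields, continuing from the data frame in the previous sketch (column names are again assumptions):

```python
# Sum the raw metrics by variant; add "Engine" or "Campaign" to the groupby
# keys to mimic the pivot table's filter-based segmentation.
pivot = data.groupby("Variant", as_index=False)[
    ["Impressions", "Clicks", "Cost", "Conversions"]
].sum()

# The same calculated fields as in the Excel pivot table.
pivot["CTR"] = pivot["Clicks"] / pivot["Impressions"]
pivot["Conv. Rate"] = pivot["Conversions"] / pivot["Clicks"]
pivot["Avg. CPC"] = pivot["Cost"] / pivot["Clicks"]
pivot["CPA"] = pivot["Cost"] / pivot["Conversions"]

print(pivot.sort_values("CPA"))
```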
3. How to measure for statistical significance with an Excel tool and what to do afterward
Once you have the data in your pivot tables, you can use the tool referenced above (and here again for reference: AB-Split-Testing-Significance-Calculator) to measure the results for statistical significance. I am not going to explain p-values, z-scores, standard deviation, and general statistical analysis here, but this is a great article to read if you want to learn about them. The tool is also included as a sheet alongside the pivot table in the example data file.
In my example, I had a 99% confidence level in a clear winner based on conversion rate, but there was no statistical significance on CTR. The general rule of the spreadsheet is to aim for at least 90% confidence, but you can use your own judgment based on the results, and there is an 80% confidence check built in as well. All you have to do is take your aggregated data from the pivot table and put it in the yellow boxes in the tool to get your results.
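Under the hood, a check like the spreadsheet’s boils down to a two-proportion z-test. Here is a pure-Python sketch of that test (the spreadsheet’s exact formulas may differ slightly); the inputs below are hypothetical and mirror my example’s outcome of a significant conversion rate but an insignificant CTR:

```python
from math import erfc, sqrt

def ab_confidence(successes_a, trials_a, successes_b, trials_b):
    """Two-sided, pooled two-proportion z-test.
    For CTR: successes = clicks, trials = impressions.
    For conversion rate: successes = conversions, trials = clicks."""
    p_a, p_b = successes_a / trials_a, successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-tailed p-value
    return 1 - p_value                # confidence level

# Conversion rate (conversions vs. clicks): ~99.7% confidence -> clear winner.
print(ab_confidence(60, 1200, 30, 1150))
# CTR (clicks vs. impressions): ~33% confidence -> not significant.
print(ab_confidence(1200, 40000, 1150, 39000))
```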
I hope this helps you, and let me know if you have any questions.