Friday, August 26, 2011

A Primer on A/B Testing


Data is an invaluable tool for web designers who are making decisions about the user experience. A/B tests, or split tests, are one of the easiest ways to measure the effect of changes to design, content, or functionality. A/B tests allow you to create high-performing user experience elements that you can implement across your site. But it’s important to make sure you reach statistically significant results and avoid red herrings. Let’s talk about how to do that.

What is an A/B test?

In an A/B test, you compare two versions of a page element for a length of time to see which performs better. Users will see one version or the other, and you’ll measure conversions from each set of users. A/B tests help designers compare content such as different headlines, call to action text, or length of body copy. Design and style choices can be tested, too; for example, you could test where to place a sign-in button or how big it should be. A/B tests can even help you measure changes in functionality, such as how and when error messages are shown.
Split testing can also help when you’re making drastic design changes that need to be tempered, such as a homepage redesign. You can pick pieces of the change and test them as you ramp up to the final design, without worrying that a massive change will alienate a user base or cause a large drop in conversions.
Results of A/B tests have lasting impact. It’s important to know which design patterns work best for your users so you can repeat “winning” A/B test results across the site. Whether you learn how users respond to the tone of your content, calls to action, or design layout, you can apply what you learn as you create new content.
Data also plays very well with decision-makers who are not designers. A/B tests can help prevent drops in conversion rate, alienation of a user base, and decreases in revenue; clients appreciate this kind of data. The conversions that you measure could be actual product purchases, clicks on a link, the rate of return visits to the site, account creations, or any other measurable action. Split testing can help your team make decisions based on fact rather than opinion.

Decide what to test

First, you need to decide which page element you would like to test. The differences between the A and B versions should be distinct. A small change in color, a minor reordering of words, or a negligible tweak to functionality may not make a good A/B test; unless your user base is very large, changes that subtle are unlikely to produce a measurable difference in the user experience. The difference between versions should be one that could plausibly influence conversion rates, and it should be something you’ll learn from for future designs. Great A/B tests could compare:
  • completely different email subject lines,
  • offering a package or bulk deal in one version, or
  • requiring sign-up for one user set and leaving it optional for the other.
Which Test Won offers great inspiration for A/B tests, and includes results as well as the testers’ assessment of why a particular version won. A/B tests should only be done on one variable at a time; if you test more than one difference between versions, it’s impossible to tell how each variable influenced conversions.
At this time, you should also figure out what metric you’ll be comparing between the two versions. Conversion rate is the most commonly used metric for A/B tests, but there may be other data points you’re interested in. The conversion rate you measure could be the percentage of users who clicked on a button, signed up on a form, or opened an email.
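For example, if 1,000 visitors see version A of a sign-up form and 40 of them complete it, version A’s conversion rate is 40 ÷ 1,000, or 4% (the numbers here are hypothetical, just to illustrate the math).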

Implement your test

Once you’ve decided on the differences between the A and B versions, you need to set up your A/B test to run on your site. There are many A/B testing tools that you can try, depending upon your medium (website, email), platform (static HTML, dynamic content), or comfort with releasing your site metrics to third-party tools. Which Test Won has a solid list of tools that you can use to create your own A/B tests. You can also build a home-grown solution (a rough sketch of one follows this list). You’ll want to be able to control:
  • the number of visitors who see each version of the test,
  • the difference between each version, and
  • how you measure the effect of each test.
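If you build your own solution, the heart of it can be quite small. Here’s a minimal JavaScript sketch of one way to do it; the cookie name abTestVersion, the 50/50 split, and the signup-button element are hypothetical placeholders rather than any particular tool’s API:

// Assign the visitor to version A or B and remember the choice in a cookie,
// so returning visitors always see the same version.
function getTestVersion() {
  var match = document.cookie.match(/(?:^|; )abTestVersion=([AB])/);
  if (match) {
    return match[1];
  }
  var version = Math.random() < 0.5 ? 'A' : 'B'; // 50/50 split
  document.cookie = 'abTestVersion=' + version + '; path=/; max-age=' + (30 * 24 * 60 * 60);
  return version;
}

var testVersion = getTestVersion();
if (testVersion === 'B') {
  // Swap in the B variation; here, a hypothetical alternate call to action.
  document.getElementById('signup-button').innerHTML = 'Become a VIP today';
}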
Tracking events with Google Analytics can be helpful if you’re using your own split testing solution. You can set custom variables in Google Analytics to help you track the users who see version A of your test against those who see version B. This may help you uncover additional data beyond your primary conversion rate. For example, did users in different countries have different results than the average user?
To set the custom variables in Google Analytics, add the following line of JavaScript to your page:
_gaq.push(['_setCustomVar',1,'testname','testversion',2]);
There’s more information on creating custom variables in Google’s documentation. The parts of the above that you want to replace are testname, which will be an identifier for the A/B test you’re running, and testversion, which will indicate whether this is version A or version B. Use names that will be intuitive for you. For example, if I were to run a home page experiment to compare short text to long text, on version A I would use:
_gaq.push(['_setCustomVar',1,'Homepage Content Test','Short',2]);
On version B I would use:
_gaq.push(['_setCustomVar',1,'Homepage Content Test','Long',2]);
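One detail to keep in mind with the asynchronous ga.js snippet: push the _setCustomVar call before the _trackPageview call so that the custom variable is recorded with the pageview. A typical placement looks something like this (UA-XXXXX-X is a placeholder for your own account ID):

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']);
// Set the test's custom variable before the pageview so it is recorded with it.
_gaq.push(['_setCustomVar', 1, 'Homepage Content Test', 'Long', 2]);
_gaq.push(['_trackPageview']);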
Collecting this information in Google Analytics will allow you to see more data about the users who see your test than just the conversion rate, such as their time on site, number of account creations, and more. To see these variables in Google Analytics once you start collecting data, go to Visitors > Custom Variables and select the test name that you chose earlier.

Measure the results

After some time (typically a few weeks, depending upon the traffic to the test), check in on the results of your test and compare the conversion rate of each version. Each A/B test should reach statistical significance before you can trust its result. You can find different calculators online to see if you’ve reached a 95% confidence level in your test. Significance is calculated using the total number of users who participated in each version of the test and the number of conversions in each version; too few users or conversions and you’ll need more data before confirming the winner. Usereffect.com’s calculator can help you understand how many more users you’ll need before reaching 95% confidence. Ending a test too early can mean that your “winning” version isn’t actually the best choice, so measure carefully.
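If you’re curious about what those calculators are doing, most of them run a simple two-proportion z-test. Here’s a rough JavaScript sketch of the same idea, using the usual normal approximation; the function name and example numbers are just illustrative:

// Two-proportion z-test: are the conversion rates of versions A and B significantly different?
// visitorsA/visitorsB are the total users in each version; conversionsA/conversionsB are their conversions.
function isSignificant(visitorsA, conversionsA, visitorsB, conversionsB) {
  var rateA = conversionsA / visitorsA;
  var rateB = conversionsB / visitorsB;
  var pooledRate = (conversionsA + conversionsB) / (visitorsA + visitorsB);
  var standardError = Math.sqrt(pooledRate * (1 - pooledRate) * (1 / visitorsA + 1 / visitorsB));
  var z = Math.abs(rateA - rateB) / standardError;
  return z >= 1.96; // roughly a 95% confidence level (two-tailed)
}

// Example: 5,000 visitors with 120 conversions for A vs. 5,000 visitors with 160 conversions for B
isSignificant(5000, 120, 5000, 160); // true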
The more visitors who see your test, the faster it will go. It’s important to run A/B tests on high-traffic areas of your site so that you can reach statistical significance more quickly; as you get more practice with split testing, you’ll find that high-traffic tests reach a 95% confidence level far more easily.

A/B test examples

Say I’m a developer for an e-commerce site. As A/B tests are perfect for testing one page element at a time, I created an A/B test to solve a disagreement over whether we wanted to bold a part of a product name in a user’s account. We had a long list of products in the user interface to help users manage their product renewals, and we weren’t sure how easy it was for users to scan. In Version A, the list items appeared with the domain name in bold:
service name, yourdomainname.com
In Version B, the domain name was not bolded:
service name, yourdomainname.com
After reaching enough conversions to reach a 95% confidence level, here were the results:
            E-commerce Conversion Rate    Per Visit Value
Version A   26.87%                        $11.28
Version B   23.26%                        $10.62
Version A was our clear winner, and it helped us to understand that users likely scanned for their domain name in a list of products.
User interaction is another metric to check as you’re creating A/B tests. We compared levels of aggression in content tone in one test, and watched to see how visitor patterns changed.
Version A’s text:
Don’t miss out on becoming a VIP user. Sign up now.
Version B’s text:
Don’t be an idiot; become a VIP!
Bounce rates can be a good A/B test metric to watch for landing pages. As we watched the numbers, the versions’ bounce rates were significantly different:
            Bounce Rate
Version A   0.05%
Version B   0.13%
We naturally wanted to be cautious about too-aggressive text, and the bounce rate indicated that the more aggressive version could be alienating users. Occasionally, you may want to dig more deeply into this data once you’ve reached statistical significance, especially if you have a diverse user base. In another content test, I separated the bounce rate data by country using Google Analytics.
                Version A Bounce Rate    Version B Bounce Rate
United States   13.20%                   16.50%
Non-US          15.64%                   16.01%
Version B’s bounce rate was much more consistent across the two user groups, and we realized we needed to do more tests to see why Version A was performing so differently for users inside and outside the United States.
In addition to design and content tests, you can also run experiments on functionality. We had a button that simply added a product to the user’s cart. In both versions of our A/B test, we used the same button language and style. The only difference between the two versions was that version A’s button added the product to the cart with the one-year price. Version B added it to the cart with the two-year price.
Our goal was to measure the ecommerce conversion rate and average order value between the two versions. We weren’t sure if users who got version B would reduce the number of years in the cart down to one year, or if seeing a higher price in the cart would turn them off and prompt them to abandon the cart. We hoped that we’d earn more revenue with version B, but we needed to test it. After we reached the number of conversions necessary to make the test statistically significant, we found the following:
            Average Order Value    E-commerce Conversion Rate
Version A   $17.13                 8.33%
Version B   $18.61                 9.60%
Version B—the button that added the two-year version of the product to the cart—was the clear winner. We’re able to use this information to create other add-to-cart buttons across the site as well.
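A quick back-of-envelope check makes the size of the win concrete: version A earned about $1.43 per visit (8.33% of $17.13), while version B earned about $1.79 per visit (9.60% of $18.61).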

Red herrings

Sometimes, your A/B test data will be inconclusive. We recently ran a test on our homepage to determine which content performed better; I was sure that one version would be an absolute winner. However, both versions yielded the same e-commerce conversion rate, pages per visit, and average order value. After running the test for weeks, we realized that we would likely never get significant data to make a change, so we ended the test and moved on to the next one. After a neutral result, you could choose either version to use on your site, but there will be no statistically significant data that indicates one version is “better” than the other.
Remember not to get too caught up in your A/B tests; sometimes they just won’t show a difference. Give each test enough time to make sure you’ve given it your best shot (depending upon the number of visitors who see a page, I like to let tests run for at least three weeks before checking the data). If you think a test may not be successful, end it and try something else.
Keep a running list of the different things you want to test; it’ll help you keep learning new things, and it also serves as an easy way to solve disagreements over design decisions. “I’ll add it to the A/B test list” comes in handy when appeasing decision-makers. 
