Friday, August 26, 2011

A Primer on A/B Testing


AUGUST 23, 2011

Data is an invaluable tool for web designers making decisions about the user experience. A/B tests, or split tests, are one of the easiest ways to measure the effect of different design, content, or functionality choices. A/B tests allow you to create high-performing user experience elements that you can implement across your site. But it’s important to make sure you reach statistically significant results and avoid red herrings. Let’s talk about how to do that.

What is an A/B test?

In an A/B test, you compare two versions of a page element for a length of time to see which performs better. Users will see one version or the other, and you’ll measure conversions from each set of users. A/B tests help designers compare content such as different headlines, call to action text, or length of body copy. Design and style choices can be tested, too; for example, you could test where to place a sign-in button or how big it should be. A/B tests can even help you measure changes in functionality, such as how and when error messages are shown.
Split testing can also help when you’re making drastic design changes that need to be tempered, such as a homepage redesign. You can pick pieces of the change and test them as you ramp up to the final design, without worrying that a massive change will alienate a user base or cause a large drop in conversions.
Results of A/B tests have lasting impact. It’s important to know which design patterns work best for your users so you can repeat “winning” A/B test results across the site. Whether you learn how users respond to the tone of your content, calls to action, or design layout, you can apply what you learn as you create new content.
Data also plays very well with decision-makers who are not designers. A/B tests can help prevent drops in conversion rate, alienation of a user base, and decreases in revenue; clients appreciate this kind of data. The conversions that you measure could be actual product purchases, clicks on a link, the rate of return visits to the site, account creations, or any other measurable action. Split testing can help your team make decisions based on fact rather than opinion.

Decide what to test

First, you need to decide which page element you would like to test. The differences between the A and B versions should be distinct: a small change in color, a minor reordering of words, or a negligible change in functionality may not make a good A/B test, because unless your user base is very large it is unlikely to produce a measurable difference in user behavior. The difference between versions should influence conversion rates, and it should be something you’ll learn from for future designs. Great A/B tests could compare:
  • completely different email subject lines,
  • offering a package or bulk deal in one version, or
  • requiring sign-up for one user set and leaving it optional for the other.
Which Test Won offers great inspiration for A/B tests, and includes results as well as the testers’ assessment of why a particular version won. A/B tests should only be done on one variable at a time; if you test more than one difference between versions, it’s impossible to tell how each variable influenced conversions.
At this time, you should also figure out which metric you’ll compare between the two versions. A conversion rate is the most commonly used metric for A/B tests, but there are other data points you may be interested in. The conversion rate you measure could be the percentage of users who clicked on a button, signed up on a form, or opened an email.
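However you define it, compute the metric the same way for both versions. As a quick illustration (the numbers here are made up), a click or sign-up conversion rate is just conversions divided by visitors:

// Conversion rate = conversions / visitors, expressed as a percentage.
function conversionRate(conversions, visitors) {
  return (conversions / visitors) * 100;
}

conversionRate(150, 5000); // 150 sign-ups from 5,000 visitors = a 3% conversion rate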

Implement your test

Once you’ve decided on the differences between the A and B versions, you need to set up your A/B test to run on your site. There are many A/B testing tools that you can try, depending upon your medium (website, email), platform (static HTML, dynamic content), or comfort with releasing your site metrics to third-party tools. Which Test Won has a solid list of tools that you can use to create your own A/B tests. You can also create your own home-grown solution (a minimal sketch of one follows the list below). You’ll want to be able to control:
  • the number of visitors who see each version of the test,
  • the difference between each version, and
  • how you measure the effect of each test.
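If you go the home-grown route, here is a minimal sketch in plain JavaScript of how the visitor assignment might work; the cookie name, the 50/50 split, and the 30-day lifetime are all assumptions you would adjust for your own test:

// Assign each visitor to version A or B once, then remember the choice
// in a cookie so they see the same version on every visit.
function getTestVersion(testName) {
  var cookieName = 'abtest_' + testName;
  var match = document.cookie.match(new RegExp('(?:^|; )' + cookieName + '=([AB])'));
  if (match) {
    return match[1]; // returning visitor: keep the version they saw before
  }
  var version = Math.random() < 0.5 ? 'A' : 'B'; // 50/50 split between versions
  var thirtyDays = 60 * 60 * 24 * 30;
  document.cookie = cookieName + '=' + version + '; path=/; max-age=' + thirtyDays;
  return version;
}

var version = getTestVersion('homepage-content'); // hypothetical test name
if (version === 'B') {
  // Swap in the version B headline, button, or other element here.
}

However you assign versions, make sure you record which version each conversion came from so you can total the results later.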
Tracking events with Google Analytics can be helpful if you’re using your own split testing solution. You can set custom variables in Google Analytics that help you track the users who see version A of your test against those who see version B. This can surface additional data beyond your primary conversion rate. For example, did users in different countries have different results than the average user?
To set the custom variables in Google Analytics, add the following line of JavaScript to your page:
_gaq.push(['_setCustomVar',1,'testname','testversion',2]);
There’s more information on creating custom variables in Google’s documentation. The parts of the above that you want to replace are testname, which will be an identifier for the A/B test you’re running, and testversion, which will indicate whether this is version A or version B. Use names that will be intuitive for you. For example, if I were to run a home page experiment to compare short text to long text, on version A I would use:
_gaq.push(['_setCustomVar',1,'Homepage Content Test','Short',2]);
On version B I would use:
_gaq.push(['_setCustomVar',1,'Homepage Content Test','Long',2]);
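For context, here is a minimal sketch of where that call sits in the standard asynchronous tracking snippet (UA-XXXXX-X is a placeholder for your own property ID); push the custom variable before _trackPageview so it’s recorded with that pageview:

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // your property ID goes here
// Set the test's custom variable before tracking the pageview so the
// visit is recorded with the version the user saw (scope 2 = session).
_gaq.push(['_setCustomVar', 1, 'Homepage Content Test', 'Short', 2]);
_gaq.push(['_trackPageview']);

(function() {
  var ga = document.createElement('script');
  ga.type = 'text/javascript';
  ga.async = true;
  ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
  var s = document.getElementsByTagName('script')[0];
  s.parentNode.insertBefore(ga, s);
})();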
Collecting this information in Google Analytics will allow you to see more data on the users who see your test than just conversion rate, such as their time on site, number of account creations, and more. To see these variables in Google Analytics once you start collecting data, go to Visitors > Custom Variables and select the test name that you chose earlier.

Measure the results

After some time (typically a few weeks, depending upon the traffic to the test), check in on the results of your test and compare the conversion rate of each version. Each A/B test should reach statistical significance before you can trust its result. You can find different calculators online to see if you’ve reached a 95% confidence level in your test. Significance is calculated using the total number of users who participated in each version of the test and the number of conversions in each version; too few users or conversions and you’ll need more data before confirming the winner. Usereffect.com’s calculator can help you understand how many more users you’ll need before reaching 95% confidence. Ending a test too early can mean that your “winning” version isn’t actually the best choice, so measure carefully.
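If you’d like to sanity-check a calculator’s output yourself, a common approach is a two-proportion z-test; the sketch below (the sample numbers are made up) treats a difference as significant at the 95% level when the z value is greater than 1.96:

// Two-proportion z-test: is the difference in conversion rate between
// version A and version B statistically significant at 95% confidence?
function isSignificant(conversionsA, visitorsA, conversionsB, visitorsB) {
  var rateA = conversionsA / visitorsA;
  var rateB = conversionsB / visitorsB;
  var pooled = (conversionsA + conversionsB) / (visitorsA + visitorsB);
  var standardError = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  var z = (rateA - rateB) / standardError;
  return Math.abs(z) > 1.96; // 1.96 is the z value for a 95% confidence level
}

// Made-up example: 180 of 2,000 visitors converted on A, 230 of 2,000 on B.
isSignificant(180, 2000, 230, 2000); // true: the difference is significant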
The more visitors who see your test, the faster it will reach statistical significance, so it’s important to run A/B tests on high-traffic areas of your site. As you get more practice with split testing, you’ll develop a feel for how much traffic you need before a test can hit a 95% confidence level.

A/B test examples

Say I’m a developer for an e-commerce site. Because A/B tests are perfect for testing one page element at a time, we created an A/B test to settle a disagreement over whether to bold part of a product name in a user’s account. We had a long list of products in the user interface to help users manage their product renewals, and we weren’t sure how easy it was for users to scan. In Version A, the list items appeared with a bolded domain name:
service name, yourdomainname.com (domain name in bold)
While Version B looked like this:
service name, yourdomainname.com (no bolding)
After collecting enough conversions to reach a 95% confidence level, here were the results:
            E-commerce Conversion Rate    Per Visit Value
Version A   26.87%                        $11.28
Version B   23.26%                        $10.62
Version A was our clear winner, and it helped us to understand that users likely scanned for their domain name in a list of products.
User interaction is another metric to check as you’re creating A/B tests. We compared levels of aggression in content tone in one test, and watched to see how visitor patterns changed.
Version A’s text:
Don’t miss out on becoming a VIP user. Sign up now.
Version B’s text:
Don’t be an idiot; become a VIP!
Bounce rates can be a good A/B test metric to watch for landing pages. As we watched the numbers, the versions’ bounce rates were significantly different:
            Bounce Rate
Version A   0.05%
Version B   0.13%
We naturally wanted to be cautious about too-aggressive text, and the bounce rate indicated that the more aggressive version could be alienating users. Occasionally, you may want to dig more deeply into this data once you’ve reached statistical significance, especially if you have a diverse user base. In another content test, I separated the bounce rate data by country using Google Analytics.
                Version A Bounce Rate    Version B Bounce Rate
United States   13.20%                   16.50%
Non-US          15.64%                   16.01%
Version B’s bounce rate was more consistent across the two user groups, and we realized we needed to run more tests to understand why version A performed so differently for US and non-US visitors.
In addition to design and content tests, you can also run experiments on functionality. We had a button that simply added a product to the user’s cart. In both versions of our A/B test, we used the same button language and style. The only difference between the two versions was that version A’s button added the product to the cart with the one-year price. Version B added it to the cart with the two-year price.
Our goal was to measure the e-commerce conversion rate and average order value between the two versions. We weren’t sure whether users who got version B would reduce the number of years in the cart down to one year, or whether seeing a higher price in the cart would turn them off and prompt them to abandon it. We hoped that we’d earn more revenue with version B, but we needed to test it. After we reached the number of conversions necessary to make the test statistically significant, we found the following:
            Average Order Value    E-commerce Conversion Rate
Version A   $17.13                 8.33%
Version B   $18.61                 9.60%
Version B, the button that added the two-year version of the product to the cart, was the clear winner. We can now use this information to create other add-to-cart buttons across the site as well.

Red herrings

Sometimes, your A/B test data will be inconclusive. We recently ran a test on our homepage to determine which content performed better; I was sure that one version would be an absolute winner. However, both versions yielded the same e-commerce conversion rate, pages per visit, and average order value. After running the test for weeks, we realized that we would likely never get significant data to make a change, so we ended the test and moved on to the next one. After a neutral result, you could choose either version to use on your site, but there will be no statistically significant data that indicates one version is “better” than the other.
Remember not to get too caught up in your A/B tests; sometimes they just won’t show a difference. Give each test enough time to make sure you’ve given it your best shot (depending upon the number of visitors who see a page, I like to let tests run for at least three weeks before checking the data). If a test looks like it won’t produce a clear result, end it and try something else.
Keep a running list of the different things you want to test; it’ll help you keep learning new things, and it also serves as an easy way to solve disagreements over design decisions. “I’ll add it to the A/B test list” comes in handy when appeasing decision-makers. 

Making up Stories: Perception, Language, and the Web


Storytelling is a buzzword with lots of different interpretations. Either the internet is killing stories, or it’s the best thing to happen to them since the printing press.


Stories have been around as long as we have, helping us understand our world and ourselves. We learn and retain information best through stories, because they turn information into more than the sum of its parts. But what makes a story a story, and what does it mean for the digital world we’ve built?


What Dickens knew

Charles Dickens should be the mutton-chopped mascot of the web. He was a social storyteller on every level. His plots spoke to and about society, but his formats were social too: he explored new ways of reaching an audience in the way his work was distributed and in the way he wrote the stories themselves. He published most of his novels in serial form, in magazines packed with advertisements and illustrations that cost far less than a hard-bound book; he also wrote episodically, actually creating the stories as each issue was published.


There are few writers working today who are as open to public comment, as skilled at manipulating public sentiment, and as concerned with the advancement of their medium as Dickens was for his time.

But his stories have deeper lessons to show us. His formats gave him freedom, but they also forced constraints. Because his stories were chopped up and divided physically, and because of the time that elapsed between the various installments, he had to explore new ways to keep readers engaged. And just as though he were an oral storyteller recounting adventures over a series of nights, he used language itself to keep readers interested.


In the last chapter of each installment, his sentences grew shorter, more active, and more visual. This made the text dynamic, compelling further engagement. Take the last lines of the first installment of David Copperfield (the close of chapter three):
the empty dog-kennel was filled up with a great dog—deep mouthed and black-haired like Him, and he was very angry at the sight of me, and sprang out to get at me.

You hear the dynamism of the words, and feel the suspense of the moment: Dickens was an early master of the call to action. He understood how people respond to language itself, as well as story. And we have the opportunity to do the same each time we tell a story online.


Comprehension: the other side of the story

The reality is that we never perceive a story exactly as it’s composed. As people read, they fill in, flesh out, and fine-tune our stories. There are lots of reasons for this: maybe they began reading part of the way through, are only skimming half of what we’re saying, or are reading in a different context than the one we think we’ve provided.

Comprehension is the reader’s half of the story. And we create it through two psycholinguistic mechanisms: inference and coherence.


Inference: You infer, I imply

If you say “Jess bought a bikini,” we infer that Jess is a woman because men don’t (usually) wear bikinis. If you say “I dropped my earring in the Seine,” we infer you’re in Paris (France, not Texas).
Not everyone will make the same inferences, but rest assured inferences are being made. This is how we get through the day—imagine if you had to qualify everything you said, as though you were speaking to aliens who had just landed on earth. You wouldn’t make it through one sentence.


Inferences come in three flavours:
  1. Logical inferences made from meanings of words (e.g., a bikini is a two-piece bathing suit for a woman).
  2. Bridging inferences made from relating previous and new information.
  3. Elaborative inferences made from world knowledge (e.g., the Seine is a river that runs through Paris).
Logical and elaborative inferences require some knowledge of our audience. We can’t assume that everyone we’re talking to or writing for can make the same elaborative or logical inferences. But we shouldn’t write for the lowest common denominator either. This is the beauty of storytelling—the exact same information means different things to different people.


As long as the crux of what we’re trying to convey does not rely on inference, we can sit back, relax, and let our readers do with our content what they will.


Bridging: Embracing the mess

Online, bridging is the element of inference we most need to understand. It’s our task to plan how we want people to connect old and new information.


But the web has an intrinsic ability to foster bridging, because it is modular. It allows us to put away the backstory and connect old and new information by linking, rather than repetition.


Only a poorly constructed narrative repeats itself rather than providing connections. When web content doesn’t embrace its modularity, we end up with siloed pages that repeat the same content, footnotes, taglines, and asides over and over and over again.


Our ability to make inferences between modules of content is instinctive—if we provide the right breadcrumbs, our readers will follow our story. 

And as advances in technology give us more places to engage with content, the concept of a page as the single reductive unit of understanding (which is, as we’ll discuss later, a false assumption anyway) is already starting to disappear.
This does not mean that everything will become a tweet, or that our brains will cease to function when we have to concentrate for more than 30 seconds. This means that we have to understand how comprehension balances the other end of the content equation—and stop thinking of our readers as numbers on a dashboard.


Chomsky Chunks™

The basic elements of stories have never been ‘pages.’ The basic elements of stories are linguistic. And naturally, we can’t talk about linguistics without bringing up Chomsky.


Noam Chomsky compartmentalized language into what he called ‘linguistic units.’ A linguistic unit is simply a chunk of language, broken down into parts, so it could be a paragraph, a sentence, a word, a noun, or even a sound.


Now, the web compartmentalizes language as well. We cut and paste, tweet, and quote. Language is naturally disseminated in chunks, just as it has always been. There is a natural symmetry between how we speak and how we understand online.


‘But reducing understanding into smaller and smaller chunks makes us attention-deficient automatons!’ we hear you cry. Try telling that to babies. When we learn, we move from sound to word, sentence to paragraph. Linguistic units are literally the building blocks of our engagement with the world. And there’s no reason why the increasingly modular nature of web content should diminish our understanding—rather, it may have the capacity to increase it, prompting us to make inferences and create stories ourselves, rather than passively engaging with static texts.
But these modular texts don’t simply create themselves. As the creators of new contexts online, we have lessons to learn. When we create content in modules, we need to understand how it may be understood; we have to consider the non-linear story.

To do that, we need to understand one final piece of comprehension: coherence.

Coherence: bringing it all together

When we infer and bridge different pieces of content together, we consolidate and get meaning. There are four ways to do this:
  1. Referential coherence: working out what is being discussed.
  2. Temporal coherence: working out when what is being discussed is happening.
  3. Locational coherence: working out where what is being discussed is happening.
  4. Causal coherence: working out why what is being discussed is happening.

Now, these happen to also be the basic elements of storytelling: a setting in place and time (temporal and locational), characters (referential), and motivation (causal). So the elements of storytelling are the elements of understanding. That means storytelling isn’t a nice-to-have; it’s essential.

Storytelling in action: shopping online


The narrative of shopping is a meandering path to a decision, no matter what you’re shopping for. In this case, we’re shopping for clothes. To illustrate how storytelling works on something as personal as shopping, we both went through a purchase on ASOS.com.

Once upon a time . . . we came upon a homepage

Randall sees Paris and wants to buy striped things. 

Randall: “As a big fan of striped tights and brie, I go straight for ‘Hello, Paris.’ It’s not so much a category of clothing as an association. Parisians are beautiful and sullen (I once saw a woman in Paris sulking while rollerskating). My exploration of ASOS’s clothes is more about the idea of Paris, and the things I associate with it, i.e., elaborative inferences, than about anything ASOS is telling me. The story is mine, and that’s why it works.”
Elizabeth enters through the sale promotion, then searches by brand. 

Elizabeth: “I’ve bought own-brand items from ASOS before and been disappointed by their quality. So I enter via the lowest common denominator—price—and then look to select brands whose quality I can vouch for. Without thinking about it, I’m creating a bridging inference between my past experience on the site and my current experience.”
In both cases, most of the story comes from us—the users. We’re filling in the gaps—making inferences, whether they’re based on past experience with certain brands, or elaborative associations drawn from our imagination.

Conversion in this case isn’t about providing the simplest, most direct route from choice to purchase. It’s about allowing the user to create their own coherent narrative—driven by experiences and associations.


Online narrative is about understanding perception. You provide a framework; your readers fill the gaps. If they can create coherence, they’ll convert. Our role as storytellers online is the same as it’s always been—to provide the building blocks of the story: the what, when, where, and why of coherence, and the spark that ignites our readers’ imagination.

6 ways retailers can (and should) test holiday promotions during back to school


Online back-to-school and back-to-college shoppers may be shopping around, but they are far from done, according to the latest survey from NRF. Fully two-thirds of online back-to-school and back-to-college shoppers have done half or less (ahem, much less) of their expected shopping to date.
For consumers, it’s likely a matter of finding the best shopping values (and perhaps not jumping the gun on the “wrong” things). Of the purchases that online back-to-school and back-to-college shoppers have made, more than one third of purchases already completed have been influenced by coupons, sales and/or promotions. Those coupons, sales and promotions have also influenced where online school shoppers choose to buy: three out of five online back-to-school shoppers note that coupons influence them to shop at a particular store (slightly more than online back-to-college shoppers), and two out of five credit in-store promotions with similar influence. With not quite two weeks left until Labor Day, online back-to-school and college shoppers expect to do the remainder of their shopping not only online, but also in department stores, clothing stores, discount stores, and office supply stores.
For retailers, it’s an extremely important time of year, second only to Christmas in terms of sales and the amount of inventory they line their stores with. With not a moment to lose, retailers can use this deadline-focused shopping period as a test run for the upcoming holiday season, and should:
  • Signal “value” to customers. Very different from “low price” or “discounted,” value focuses on quality, style, materials, durability, flexibility, and other product attributes that tell the customer that this is indeed the right product at the right price.
  • Push back-to-school and back-to-college content to the forefront of the site and through social media, including product videos, expert tips and advice, as well as customer ratings and reviews that speak directly to shoppers.
  • Fine tune paid search strategies to focus on products and categories that are particularly important for school shopping.
  • Revamp email strategies to include a proactive shopping cart remarketing program.
  • Clearly note Labor Day-centric shipping deadlines, both on the site and in ad copy.
  • Test shipping offers. As a final chance to gauge customer response ahead of the holiday season, retailers can try a variety of options, from free return shipping and flat-rate or discounted shipping to in-store pickup for online orders, a loyalty program benefit, or even a discounted express shipping offer for procrastinators.
Want more insight into back-to-school data for 2011? Download full survey results or visit NRF’s back-to-school headquarters.