Is Repeating A/B Tests Worthwhile?

What will the results be when an A/B test is repeated, and is it worthwhile to repeat old tests? The critical question to pose is: why would you expect different results this time around? Is there any reason to suspect that factors such as the website, or even the entire organization, have changed substantially?

When should you repeat old tests?

By taking the question in the introduction as a starting point, you’ll prevent time and energy being spent on repeating old tests whose results will in all probability be the same. That’s not to say it can’t be wise, in a continual testing process, to regularly repeat old tests to see whether your assumptions still hold. In some cases repeating an old test will pay off, while in other situations it simply won’t.

Repeat a test when:

  • Previous experiment went wrong
  • Internet has changed
  • Organization has changed
  • Website has changed
  • Results aren’t available anymore

Previous experiment went wrong

This is probably the most common reason to repeat an experiment. For instance, a promotion running alongside the experiment could have skewed the results. Discontinued variations, incorrectly configured goals, incorrect URL targeting, or an abnormal mix of traffic sources can also bias the results.
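One standard diagnostic for catching a broken setup, not named above but widely used, is a sample ratio mismatch (SRM) check: if the observed traffic split deviates from the configured split by more than chance allows, something in the old experiment was off. Here’s a minimal sketch in Python, assuming a two-variant test with a known intended split (the numbers are hypothetical):

```python
import math

def srm_pvalue(visitors_a: int, visitors_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 degree of freedom) comparing the
    observed traffic split against the split the test was configured with."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical example: a 50/50 test that delivered 10,240 vs 9,610 visitors.
p = srm_pvalue(10_240, 9_610)
if p < 0.001:  # a strict threshold is common for SRM alerts
    print(f"Sample ratio mismatch (p = {p:.1e}): the old results are suspect.")
```

If a check like this flags the original run, repeating the experiment is almost certainly worth it.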

Internet has changed

The variation that was tested had no influence on the number of conversions, or perhaps even lowered them. Now, however, you’re noticing more and more prominent websites adopting similar features. Perhaps you were too far ahead of the curve, or your users simply had to get accustomed to the change; this is sometimes called change aversion. Now that people have gotten used to the new way of doing things through regular exposure, that new way might produce a positive change in your conversion rate as well. To prevent such situations in the future, closely investigate the desires, capabilities and knowledge of your target audience.

Organization has changed

Organizations change constantly. Since the last test, the products or services on offer may have changed, or a new price category or market may be targeted. The organization could also be profiling itself with different Unique Selling Points. For each of these factors, consider whether there is a meaningful interaction with the test results. A factor like lower or higher prices affects every experiment, after all. An experiment that did or didn’t prominently display ‘Lowest prices’ across the entire website, for example, could well produce different results once prices have changed significantly over time.

Website has changed

Most organizations change their website every couple of years. The redesign is usually radical, shipped in one new release, sometimes following the latest web design standards and sometimes not. The result is that an experiment that produced one set of results in the past might produce entirely different results now. The classic call-to-action button color experiment comes to mind: the winning CTA color may have lost its contrast against the new design. A good piece of advice with regard to redesigns is to make them evolutionary instead of revolutionary.

Results aren’t available anymore

Perhaps a switch was made to a different A/B testing tool, or to a different conversion specialist who runs the experiments. In any event, the original results of the experiment are no longer available. Repeating an experiment can also be very worthwhile when it was run against an ambiguous hypothesis, concluded only briefly, or not documented at all. This point highlights the importance of strong hypotheses, and of structured analysis and documentation of experiments.
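To make “structured documentation” concrete, here’s a minimal sketch of what a per-experiment record could look like. The field names and example values are illustrative, not a prescribed format:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ExperimentRecord:
    """Minimal documentation for one A/B test, so the results stay
    available even after switching tools or specialists."""
    name: str
    hypothesis: str               # "because we saw X, we expect Y to cause Z"
    start: date
    end: date
    variations: list[str]         # descriptions or screenshot file paths
    primary_goal: str             # the conversion the test is judged on
    visitors: dict[str, int]      # per variation
    conversions: dict[str, int]   # per variation
    conclusion: str               # what was decided, and why

record = ExperimentRecord(
    name="homepage-cta-color",
    hypothesis="A higher-contrast CTA color will lift checkout conversions",
    start=date(2014, 3, 1),
    end=date(2014, 3, 28),
    variations=["control: green CTA", "variant: orange CTA"],
    primary_goal="checkout completed",
    visitors={"control": 14_900, "variant": 15_020},
    conversions={"control": 402, "variant": 480},
    conclusion="Variant won at 95% confidence; rolled out site-wide.",
)
```

An archive of records like this is exactly what makes the “should we repeat it?” question answerable later on.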

Don’t repeat a test when:

  • Test quota hasn’t been reached
  • Experiment gave undesirable results
  • Results weren’t significant
  • User interface changes
  • Predecessor ran the experiment

Test quota hasn’t been reached

Some organizations set test quotas: a fixed number of tests that have to be run each week, month or year. While I’m a great proponent of testing everything, in my opinion a test quota doesn’t (necessarily) contribute to achieving that goal. Instead of rerunning old tests just to hit an arbitrary quota, my advice would be to invest that energy in developing new tests.

Experiment gave undesirable results

Experiments that have been set up correctly never produce incorrect results, only undesired ones. CEOs (the proverbial HiPPOs: the Highest Paid Person’s Opinion), managers, customers, clients and even conversion optimizers sometimes hope for different results than the test produced. When none of the factors listed above are present, that hope probably isn’t a solid motive to repeat the experiment.

Results weren’t significant

Non-significant results can have a variety of causes. A common one is that the runtime of the experiment was too short: whether through pressure from clients or managers or through inexperience on the part of the conversion specialist, many experiments are aborted too early. Another possibility is that visitor behavior simply doesn’t differ enough between the variations. The article ‘How Long to Run a Test’ discusses these points in depth. Lastly, it might be a good idea to first optimize relevant micro conversions in order to increase the traffic available to complete macro conversions.
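To make the runtime point concrete, here’s a minimal sketch of the standard two-proportion sample size estimate. The baseline rate, detectable lift, significance level and power used below are illustrative assumptions, not values from the article:

```python
import math
from statistics import NormalDist

def visitors_per_variant(baseline: float, lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Standard two-proportion sample size estimate: visitors needed in EACH
    variant to detect an absolute `lift` over `baseline` at significance
    level `alpha` with the given statistical power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    p1, p2 = baseline, baseline + lift
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(variance * (z_alpha + z_power) ** 2 / lift ** 2)

# Example: detecting a lift from a 3.0% to a 3.6% conversion rate takes
# roughly 14,000 visitors per variant -- weeks of traffic for many sites,
# which is why tests that are stopped early rarely reach significance.
print(visitors_per_variant(baseline=0.030, lift=0.006))
```

Running a calculation like this before launch (or relaunch) tells you up front whether the test can realistically reach significance at all.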

User interface changes

Changes in the user interface, especially when none of the criteria listed above are present, most likely don’t warrant a repeated test. This applies most strongly when the UX changes are in line with current best practices, the test was set up properly, and the winning variation showed a strongly significant result.

Predecessor ran the experiment

When that predecessor (or a current colleague) made no obvious mistakes in setting up the experiment or analyzing the results, then in the absence of other factors this probably isn’t a good reason to repeat the experiment.

How to get buy-in?

In most cases it’s hard enough to convince an organization to start testing in the first place. Convincing them to repeat tests is even harder. Arguments such as “haven’t we tested that before?” are quick to appear. How can you handle this and make sure the experiments that match the criteria above are repeated?

Test-driven organization

The advantage of a test-driven organization is that everything is up for discussion. Decisions aren’t based on the opinion of one person or a board of directors but, whenever possible, on data and research. In such an organization, the arguments described above should be sufficient to repeat relevant experiments.

You’re the conversion expert

In the past you’ve demonstrated extensive subject matter expertise and made well-informed decisions. Those decisions have led to interesting new findings, which in turn have improved the conversion rate. If you don’t have such a reputation yet, it would be wise to start working on it first. Reading this blog might get you started.

Smaller volume of traffic

Often you can get buy-in by lowering the risk attached to the decision; compare “Let’s test that” with “Let’s put that live”. Here, the threshold for approving the repeated test can be lowered by pushing a smaller volume of traffic through it, which limits any negative influence the test can have and therefore the risk involved, as sketched below.
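The article doesn’t prescribe a mechanism, but one common way to implement a reduced traffic share is deterministic hash-based bucketing. A minimal sketch, with the 20% share and the identifiers as illustrative assumptions:

```python
import hashlib

def assign(user_id: str, experiment: str, traffic_share: float = 0.20) -> str | None:
    """Deterministically bucket a visitor. Only `traffic_share` of all traffic
    enters the repeated test; within the test the split is 50/50. Hashing
    (experiment, user_id) keeps the assignment stable across visits."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform float in [0, 1]
    if bucket >= traffic_share:
        return None                            # visitor stays out of the test
    return "variant" if bucket < traffic_share / 2 else "control"

# Only ~20% of visitors enter the repeated experiment at all, which caps
# the damage if the old (losing) variation really is worse.
print(assign("visitor-123", "repeat-cta-color"))
```

The trade-off: less traffic means a longer runtime before the test reaches significance, so don’t dial the share down further than the buy-in actually requires.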

Re-using parts of the old test

In many cases parts of the old test can be reused: parts of the design, the HTML/JS/CSS code, or simply the setup that was used. When screenshots of the variations were saved (you’re doing that, right?), rebuilding the experiment should be vastly simpler even if no code is available. Because this saves a considerable amount of resources, getting buy-in to repeat a test is often far easier.

What results to expect?

The factors above

The results of repeated A/B tests are very difficult to predict accurately. When an experiment matches one or more of the criteria under the ‘don’t repeat’ header above, you can predict with some certainty that the test will return the same results as last time. When circumstances have changed, however, as in the ‘repeat’ cases, predicting the outcome is virtually impossible.

Set it up right this time

When repeating an experiment, make sure to set it up perfectly this time. For instance, make a deliberate choice between an A/B and a multivariate (MVT) setup, formulate a solid hypothesis, allow an adequate runtime, and configure your goals correctly. This will increase the chances of getting valuable learnings from the experiment.
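And when the repeated test finishes, read it out properly. A minimal sketch of the standard two-proportion z-test, with hypothetical numbers:

```python
import math
from statistics import NormalDist

def two_proportion_p(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of the standard two-proportion z-test, for judging
    whether the variant's conversion rate really differs from the control's."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical readout of the repeated test:
p = two_proportion_p(conv_a=402, n_a=14_900, conv_b=447, n_b=15_020)
print(f"p = {p:.3f}")  # compare against your significance threshold, e.g. 0.05
```

In this made-up readout the variant’s apparent lift doesn’t clear the 0.05 bar, which loops straight back to the runtime point made earlier.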

Conclusion

Some tests are worth repeating, while others probably aren’t. By running a prospective experiment through the checklist above, you can decide which category it falls into. Dig into your archive of experiments and find out whether any are worth repeating soon!


Author: Theo van der Zee

He is the founder of ConversionReview. He has been building and optimizing websites for 15+ years now, and doing so with great success.

On top of his digital skills, Theo is also a trained psychologist and frequent speaker at events around the world.
