Colin White

📈 A/B testing with limited data at a startup

How to do A/B testing with limited data at a startup

Practically every article you read about startup marketing stresses the importance of A/B testing. From the header on your landing page to the colour of your signup button, even the most minor thing should be tested. But almost all of these articles assume one thing: that you’re in growth mode.

But that’s an issue for a lot of companies.

Startups that haven’t hit growth just don’t have enough data to run A/B tests at scale. Testing at low numbers is a completely different ball game from testing when you are scaling. I’m not saying that it’s impossible, but I’ll break down some common misconceptions and mistakes that are easy to make when testing with small amounts of data.

Let’s quickly define A/B testing

A marketing experiment where two variations of a landing page, ad, email or other piece of online content are pitted against each other to determine which produces the highest conversion rate.

That’s the definition from a marketer’s point of view: pit two pieces of marketing material against each other to see which converts better. This definition doesn’t do justice to the statistical methods behind A/B testing.

When you run an A/B test, you are most likely doing a comparison of two binomial distributions using some type of statistical test (there are a few possibilities).

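Things like clicks and conversions are binomial outcomes: each visitor either converts or doesn’t, so the number of conversions out of a given number of visitors follows a binomial distribution. Click-through rates and conversion rates are the proportions you estimate from those counts.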
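To make that concrete, here is a minimal sketch of one of those possible tests, a pooled two-proportion z-test, using only the Python standard library. The function name and the visitor counts are made up for illustration, and this is not necessarily the method your testing software uses.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variations convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No conversions at all (or everyone converted): nothing to compare.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 50 of 1,000 visitors convert on A, 65 of 1,000 on B.
z, p_value = two_proportion_z_test(50, 1000, 65, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With these made-up counts, a 5% vs 6.5% gap looks big on a dashboard, but the p-value stays above the usual 0.05 threshold. That is the small-data problem in a nutshell.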

If you are using A/B testing software like Optimizely, a lot of these statistics are hidden from you, which can sometimes be to your detriment.

A/B testing pitfalls

Testing software makes running experiments approachable for all types of people. You really don’t have to know much about statistics to pull off an A/B test in Google Optimize or Optimizely. But not knowing much about stats can lead you into some easy-to-avoid mistakes.

Let’s look at a few.

Ending your test too early

If you’re like me, you LOVE watching the numbers go up when you’re running experiments (or ads, or anything). And one of the best numbers to watch is one that most A/B testing software now has: the “chance to beat” metric. This is a number that the software calculates on the fly from the data it has collected so far.

This number can be misleading. If your “chance to beat” is at 100% you might be thinking, “alright, let’s end this and start the next test”, but that can get you into major trouble. This is called peeking. Peeking is when you look at the data, and act on it, before you’ve gathered the sample size you planned for, which means the result hasn’t really reached statistical significance. Ending your tests early like this can lead to false positives: “wins” that won’t hold up in the long run.
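If you want to see why peeking hurts, here is a rough simulation sketch. It runs A/A tests, where both variations have the same true conversion rate, so every “significant” result is a false positive by construction. It then compares stopping at the first significant look against judging only at the planned end. The 5% rate, the ten looks of 200 visitors each, and the pooled z-test are all assumptions picked for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_tests=300, looks=10, visitors_per_look=200,
                      rate=0.05, alpha=0.05, seed=1):
    """Return (false positive rate when peeking, false positive rate at the end)."""
    rng = random.Random(seed)
    flagged_when_peeking = 0
    flagged_at_end = 0
    for _test in range(n_tests):
        conv_a = conv_b = visitors = 0
        flagged = False
        for _look in range(looks):
            # Both arms share the same true rate, so any "win" is pure noise.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            visitors += visitors_per_look
            p = p_value(conv_a, visitors, conv_b, visitors)
            if p < alpha:
                flagged = True  # a peeker would have stopped and declared a winner here
        flagged_when_peeking += flagged
        flagged_at_end += p < alpha  # p now holds the value from the final look
    return flagged_when_peeking / n_tests, flagged_at_end / n_tests


peek_rate, final_rate = simulate_aa_tests()
print(f"false positives when peeking: {peek_rate:.1%}")
print(f"false positives at planned end: {final_rate:.1%}")
```

The peeking rate can never be lower than the end-of-test rate, because the final look is one of the looks. In practice it tends to be several times higher, which is the whole problem.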

Running A/B tests takes a lot of traffic, especially as you get further into your funnel. Users drop off and you get fewer and fewer people seeing your test.

Let’s look at an example. Say you have a conversion rate on a landing page of 5% and you want to get it up to 5.5%. That’s a 10% relative increase. Seems pretty reasonable to get there. But to be confident in that change to the landing page, you’ll need each variation to get a sample size of ~30,000 visitors.

I don’t know about you, but getting ~60k visitors to a landing page can be pretty tough when you are early in your startup’s lifecycle.

Check out Evan Miller’s awesome sample size calculator to better understand the sample size you need: https://www.evanmiller.org/ab-testing/sample-size.html
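If you’d rather see the math than trust a calculator, here is a minimal sketch of a common normal-approximation formula for comparing two proportions. The two-sided 5% significance level and 80% power are my assumptions, since the example above doesn’t state them, and the function name is made up. With those settings the 5% to 5.5% example comes out at roughly 30,000 visitors per variation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline, target, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation to detect baseline -> target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    # Spread of the difference if nothing changed (both arms at the baseline rate).
    sd_null = sqrt(2 * baseline * (1 - baseline))
    # Spread of the difference if the variation really hits the target rate.
    sd_alt = sqrt(baseline * (1 - baseline) + target * (1 - target))
    n = ((z_alpha * sd_null + z_beta * sd_alt) / (target - baseline)) ** 2
    return ceil(n)


n = sample_size_per_variation(0.05, 0.055)
print(f"visitors needed per variation: {n}")
```

Different calculators use slightly different formulas and defaults, so treat the output as a ballpark rather than an exact match for any particular tool.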

Not understanding your audience

Not all visitors to your website are the same. As marketers and founders, we know motivation and intent are key to marketing effectively.

A/B testing is the same. Say you are running a test that needs 10,000 sessions per variation and all of a sudden you get 20k hits to your landing page because a blog post blew up. Is that really a good indication of what your conversion rate is going to be? Probably not. You need to make sure you understand who is visiting your site before you can make a call.
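Here is a quick back-of-the-envelope sketch of how a spike like that can drag your numbers around. Every figure in it is hypothetical: I’m assuming regular visitors convert at 5% and the blog-post crowd converts at 1%.

```python
def blended_rate(segments):
    """Overall conversion rate for a list of (visitors, conversion_rate) segments."""
    visitors = sum(count for count, _rate in segments)
    conversions = sum(count * rate for count, rate in segments)
    return conversions / visitors


# Hypothetical: 10,000 regular visitors at 5%, then 20,000 viral visitors at 1%.
usual = [(10_000, 0.05)]
with_spike = [(10_000, 0.05), (20_000, 0.01)]

print(f"usual traffic: {blended_rate(usual):.2%}")
print(f"with the spike: {blended_rate(with_spike):.2%}")
```

Nothing about your page changed, but the rate you measure did, because the mix of people changed. Breaking results down by traffic source is one way to spot this before you make a call.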

Sometimes it’s worth running a test to a higher significance level if you’re not confident your sample is diverse enough.

Testing all the small things

I hope you don’t have Blink-182 in your head now.

I’ve talked a lot about the sample size that’s needed to run tests. In some cases it’s huge. One of the biggest contributors to the sample size you need is the percent lift you want to see.

Let’s go back to the example above of our 5% -> 5.5% conversion rate. Testing to significance for that example, we would need to have around 30k visitors per variation. But say we wanted to see that conversion rate lift to 6% instead by changing the whole landing page and not just the CTA copy. That drops our visitors needed per variation down to ~12k. That’s less than half the visitors and a much more achievable number.

Making large changes to whatever you are testing is important when you can only muster a small sample. If you are changing something small, you won’t be hypothesizing a large lift in your test variable. But if you make large changes to your content and go for a bigger change in the test variable, the sample size you need will decrease dramatically.
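To get a feel for how fast the required sample shrinks as the target lift grows, here is a small sketch that sweeps a few target rates from a 5% baseline. It uses a standard normal-approximation formula with an assumed two-sided 5% significance level and 80% power. The exact figures depend on those settings and on the formula your calculator uses, so they won’t necessarily match the numbers quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_needed(baseline, target, alpha=0.05, power=0.80):
    """Approximate visitors per variation (normal approximation, two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sd_null = sqrt(2 * baseline * (1 - baseline))
    sd_alt = sqrt(baseline * (1 - baseline) + target * (1 - target))
    return ceil(((z_alpha * sd_null + z_beta * sd_alt) / (target - baseline)) ** 2)


baseline = 0.05
targets = [0.055, 0.06, 0.075, 0.10]  # 10%, 20%, 50% and 100% relative lifts
sizes = {target: visitors_needed(baseline, target) for target in targets}

for target, size in sizes.items():
    lift = (target - baseline) / baseline
    print(f"{baseline:.1%} -> {target:.1%} ({lift:.0%} lift): {size} per variation")
```

The required sample falls roughly with the square of the lift you’re trying to detect, which is why going after a bigger change pays off so quickly when traffic is scarce.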

Don’t just go and boost your goal lift to a crazy amount here. You still need to think of testing in a scientific way and choose all of these metrics based on a good hypothesis. Otherwise you are never going to hit significance and you’ll never learn anything.

Testing is hard

And it’s even harder at a startup. Tools like Optimizely and Google Optimize are making it easier and easier to run experiments, but they don’t give you all of the background math that needs to be done. Not knowing those statistical methods that are running in the background can lead you to false positives.

Make sure you’re setting your sample size before your test and sticking to it. Don’t call a test done just because it looks like it’s going to win. Remember, data trumps gut if you have it. Watch out for traffic spikes from a single source; they can make your audience less diverse and skew your results to the max. And lastly, don’t test the small things when your throughput is low. If you’re looking to see significant results at low sample sizes, make big changes. Switch out the landing page for a completely different one, or completely change up the ad you’re running, not just a few words.
