Some things can be really annoying

Posted on April 16, 2009 at around 9am PDT

You want to run a test, so you design 10 alternatives to compare against one another. You start the test and wait until you have statistically significant results. Eventually you see that version three beats the control with a significant lift of 5%. You notice that the confidence interval for this version is ±3%. Without diving into complex statistics, you understand this to mean that the actual lift probably lies somewhere between 2% and 8%.

You hope to get at least 5%, though you might settle for a bit less. Then you run the winning version again, this time against the control only, expecting it to deliver the 5% you saw before. Unfortunately, you see only 2%. Why were you so unlucky as to land at the low end of the range?

In fact, you weren’t unlucky. This phenomenon is common and is usually referred to as the selection bias problem.

We know that every test has statistical errors embedded in it. We expect them to be symmetrical: for any given version, the measured result is as likely to overstate the true performance as to understate it. The trick is that our selection affects the error. We usually select the best-performing version, and the error of the best-performing version is usually positive, which means that its performance in the test is better than its true performance. On the other hand, the error of the worst-performing version is usually negative. Sounds like a trick of the statistical Goddess. Is this true?
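A quick way to check this claim is to simulate it. The sketch below is only an illustration under assumed numbers: 10 versions that all share the same true conversion rate, 1,000 visitors per version, and 300 repeated tests. The function names and parameters are hypothetical, not part of any testing product. It compares the average error of a version picked in advance with the average error of whichever version happened to look best or worst.

```python
import random

def observed_rate(true_rate, visitors, rng):
    """Simulate one version: each visitor converts with probability true_rate."""
    conversions = sum(rng.random() < true_rate for _ in range(visitors))
    return conversions / visitors

def mean_errors(true_rate=0.04, versions=10, visitors=1000, trials=300, seed=0):
    """Average measurement error of a fixed version, the best one, and the worst one.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    fixed = best = worst = 0.0
    for _ in range(trials):
        rates = [observed_rate(true_rate, visitors, rng) for _ in range(versions)]
        fixed += rates[0] - true_rate    # version chosen before looking at results
        best += max(rates) - true_rate   # version chosen because it looked best
        worst += min(rates) - true_rate  # version chosen because it looked worst
    return fixed / trials, best / trials, worst / trials

fixed_err, best_err, worst_err = mean_errors()
print(f"fixed version error: {fixed_err:+.4f}")
print(f"best version error:  {best_err:+.4f}")
print(f"worst version error: {worst_err:+.4f}")
```

The error of the version chosen in advance should average out close to zero, while the best-looking version shows a clearly positive average error and the worst-looking version a clearly negative one. The errors themselves are symmetrical; it is the act of selecting that makes them one-sided.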

The comprehensive explanation is difficult, so let’s use a simple example to demonstrate. Suppose that all 10 versions we test have a true conversion rate of 4%. Due to statistical errors, some will show more than 4% during the test and some will show less. We select the best version in the test, which showed 5%. Now we run a second test with this version only (versus the control). In the second test, this version will probably come in at around 4%, and we witness the selection bias.
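The example can be played out in a few lines of code. This is a minimal sketch, assuming 1,000 visitors per version and 300 repetitions of the whole experiment; those figures and the function names are assumptions chosen for illustration. Each repetition runs the 10-version test, records what the winner showed, and then measures the winner again on fresh traffic.

```python
import random

def measure(true_rate, visitors, rng):
    """Observed conversion rate for one version over a batch of visitors."""
    conversions = sum(rng.random() < true_rate for _ in range(visitors))
    return conversions / visitors

def run_and_rerun(true_rate=0.04, versions=10, visitors=1000, trials=300, seed=1):
    """Average rate the winner showed in the first test versus in the rerun.

    Every version has the same true rate, so rerunning the winner is
    simply one more fresh measurement at that rate.
    """
    rng = random.Random(seed)
    first_total = rerun_total = 0.0
    for _ in range(trials):
        first_round = [measure(true_rate, visitors, rng) for _ in range(versions)]
        first_total += max(first_round)                   # what the winner showed
        rerun_total += measure(true_rate, visitors, rng)  # winner on new traffic
    return first_total / trials, rerun_total / trials

winner_first, winner_rerun = run_and_rerun()
print(f"winner in the 10-version test: {winner_first:.2%}")
print(f"winner when run again:         {winner_rerun:.2%}")
```

Under these assumptions the winner looks noticeably better than 4% in the first test and falls back to roughly 4% in the rerun, even though nothing about the version has changed.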

On the other hand, if we selected the worst version, we would get a positive surprise, but of course we rarely do that. Pretty annoying, isn’t it? We can help address the selection bias problem. More to come in later posts.

Shlomo Lahav
Chief Scientist
