# The Confidence Interval Fallacy

You will all have seen or heard statements like the following from pollsters during an election campaign:

Support for candidate Joe Bloggs now stands at 43%. The margin of error is plus or minus three percent.

But what exactly does the statement mean? Most people assume it means that the real level of support for Joe Bloggs must lie somewhere between 40 and 46%, with 43% being ‘most probable’. This is wrong, because there is always an unstated ‘level’ of confidence about the margin of error. Typically, the level is assumed to be 95% or 99%. If pushed, a statistician would therefore expand the above statement as something like:

Statement A: Support for candidate Joe Bloggs now stands at 43%. The margin of error is plus or minus three percent, with confidence at the 95% level.

This combination of the margin of error and the level of confidence about it is what statisticians mean by a confidence interval. Unfortunately, even this more complete statement about the confidence interval is highly misleading. That is because most people interpret the statement as being about probability, i.e. they mistakenly assume it means something like:

Statement B: There is a 95% probability that support for candidate Joe Bloggs lies between 40 and 46%.

Statement B is a statement about the probability of the unknown population proportion P. Most problems of statistical inference boil down to trying to find out such ‘unknowns’ given observed data. However, there is a fundamental difference between the frequentist approach and the Bayesian approach to probability. Whereas a statement about the probability of an unknown value is natural for Bayesians, it is simply not allowed (because it has no meaning) in the frequentist approach. Instead, frequentists use the confidence interval approach of Statement A, which is not a statement of probability in the sense of Statement B.
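To make the contrast concrete, here is a minimal Python sketch of the kind of calculation a Bayesian could do for Statement B. It is illustrative only and rests on assumptions not in the polling example: a hypothetical poll in which 430 of 1,000 voters back Joe Bloggs, and a uniform Beta(1, 1) prior on P. Under those assumptions the posterior for P is Beta(431, 571), and the probability that P lies between 40% and 46% is estimated by sampling from it with the standard library's `random.betavariate`. The function name `prob_support_between` is hypothetical.

```python
import random


def prob_support_between(successes, n, low, high, draws=100_000, seed=1):
    """Monte Carlo estimate of P(low < P < high | data) under a uniform prior.

    With a uniform Beta(1, 1) prior and `successes` out of `n`, the posterior
    for the unknown proportion P is Beta(successes + 1, n - successes + 1).
    """
    rng = random.Random(seed)
    alpha = successes + 1
    beta = n - successes + 1
    # Count posterior draws of P that fall strictly inside (low, high).
    inside = sum(1 for _ in range(draws) if low < rng.betavariate(alpha, beta) < high)
    return inside / draws


# Hypothetical poll: 430 of 1,000 voters support Joe Bloggs.
print(prob_support_between(430, 1000, 0.40, 0.46))
```

With these made-up inputs the estimate comes out a little under 0.95. The number exists only because a prior was specified; a different prior gives a different probability, and that prior is exactly the ingredient the frequentist approach does without.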

It turns out that confidence intervals, as in Statement A, are rather complex to define and understand properly – if you look at standard statistical textbooks on the subject you will see what I mean. So I will now attempt a proper explanation that is as non-technical as possible.

As a standard tool of frequentist statisticians, the confidence interval involves the idea of a repeated experiment, like selecting balls repeatedly from an urn. Suppose, for example, that an urn contains 100,000 balls, each of which is either blue or white. We want to find out the percentage (P) of white balls in the urn from a sample of size 1,000. The previous polling example is essentially the same – the equivalent of the ‘urn’ is the set of all voters and the equivalent of a ‘white ball’ is a voter who votes for Joe Bloggs. The frequentist approach to this problem is to imagine that we could repeat the sampling many times (that is, to determine what happens ‘in the long run’), each time counting the percentage of white balls and adding plus or minus 3 to create an interval. So imagine a long sequence of sample intervals:

[39-45], [41-47], [43-49], [44-50], [42-48], [41-47], [43-49], [39-45], [38-44], [44-50], …

The 95% confidence interval actually means that ‘in the long run’ 95% of these intervals contain the population proportion P.
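The ‘long run’ definition can be checked by simulation. The Python sketch below is illustrative rather than a standard routine. It fills a hypothetical urn of 100,000 balls with a known proportion P of white balls (an assumed 43%), repeatedly draws a sample of n balls without replacement, forms the interval ‘sample proportion ± margin’ and counts how often that interval contains P. Instead of a fixed ±3, it assumes the usual normal-approximation margin z·√(p̂(1 − p̂)/n) with z = 1.96, and an assumed n = 1,000, for which that margin comes to about 3 percentage points.

```python
import math
import random


def long_run_coverage(true_p=0.43, urn_size=100_000, n=1000, repeats=1000, z=1.96, seed=42):
    """Fraction of repeated-sample intervals that contain the true proportion.

    Each repetition draws n balls without replacement from an urn in which a
    fraction true_p of the balls are white (1) and the rest blue (0), then
    builds the interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    rng = random.Random(seed)
    white = round(true_p * urn_size)
    urn = [1] * white + [0] * (urn_size - white)
    hits = 0
    for _ in range(repeats):
        sample = rng.sample(urn, n)   # one 'poll' of n balls
        p_hat = sum(sample) / n       # sample proportion of white balls
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / repeats


print(long_run_coverage())  # a value near 0.95
```

Each individual interval either contains P or it does not; the 95% describes how the procedure behaves over many repetitions, not any single interval. Calling `long_run_coverage(z=2.576)` gives the corresponding 99% version, whose intervals are wider and so contain P more often.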

Now, while that is the technically correct definition of a confidence interval, it does not shed any light on how statisticians actually calculate confidence intervals. After all, the whole point about taking a sample is that:

1. You don’t know what P is – you want the sample to help you find out
2. You can’t afford to do ‘long runs’ of samples – the whole point of a sample is that you only take one

And this is where things get weird. It turns out that, in order to turn your sample proportion into a confidence interval about the (unknown) population proportion P, statisticians have to make certain kinds of assumptions about both the nature of the population and the value of the unknown P. This is weird because frequentists feel uncomfortable about the Bayesian approach precisely because it requires similar kinds of ‘prior’ assumptions.
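For a single poll, a common textbook recipe for a proportion is the normal-approximation (Wald) interval, whose margin is z·√(P(1 − P)/n). That formula contains the unknown P itself, so something has to be assumed: either plug in the sample proportion p̂, or take the worst case P = 0.5, which is the convention behind many published poll margins. The recipe also assumes a simple random sample large enough for the normal approximation to hold. The Python sketch below shows both choices; the function name `margin_of_error` and the example numbers are illustrative only.

```python
import math

# Two-sided standard normal quantiles for common confidence levels.
Z = {0.95: 1.959964, 0.99: 2.575829}


def margin_of_error(n, level=0.95, p_hat=None):
    """Half-width of a normal-approximation interval for a proportion.

    If p_hat is given, the unknown P in the standard error is replaced by the
    sample proportion (plug-in assumption). If p_hat is None, the worst case
    P = 0.5 is assumed, which gives the widest possible margin for this n.
    """
    p = 0.5 if p_hat is None else p_hat
    return Z[level] * math.sqrt(p * (1 - p) / n)


n = 1000
print(round(100 * margin_of_error(n), 1))              # worst case, 95%: 3.1
print(round(100 * margin_of_error(n, p_hat=0.43), 1))  # plug-in, 95%: 3.1
print(round(100 * margin_of_error(n, level=0.99), 1))  # worst case, 99%: 4.1
```

The 95% and 99% results differ only through z, which is why the confidence level has to be stated alongside the margin. Neither the plug-in value nor the worst case comes from the data alone; each is an assumption about P of the kind described above.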
